Below are two examples of running simulations in Stata. Both involve running a regression; they differ in how the data for the regression are generated. In each example, the simulate command repeats the whole procedure 1000 times and records the coefficient estimates and their standard errors from each repetition.
In the first example, the two independent variables are from an existing dataset and the dependent variable is generated based on the two independent variables plus some random error. The dependent variable is then regressed on the two independent variables.
    * Set up the steps you want to repeat for the simulation in a program
    program define myprog1
        * drop all variables to create an empty dataset, do not use clear
        drop _all
        * get dataset
        use https://stats.idre.ucla.edu/stat/stata/faq/hsb2
        * keep the independent variables (IVs)
        keep write math
        * gen dependent variable (DV) with set relationship to IVs + random error
        gen y = 7.541 + .3283*math + .5196*write + 7.281 * invnormal(uniform())
        * run the desired command
        reg y write math
    end

    * use the simulate command to rerun myprog1 1000 times
    * collect the betas (_b) and standard errors (_se) from the regression each time
    * You'll probably want to set reps(10) for testing, then set it higher for the simulation.
    simulate _b _se, reps(1000): myprog1
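Once simulate finishes, the data in memory are replaced by the simulation results, with one observation per repetition. The sketch below shows one way to make the run reproducible and inspect those results. It assumes myprog1 has already been defined as above; the seed value is arbitrary, and the wildcard patterns assume simulate's usual naming of the collected results (a `_b_` or `_se_` prefix followed by the coefficient name).

```stata
* Rerun the simulation with a fixed seed so the results can be reproduced.
* seed() is an option of simulate; 12345 is an arbitrary illustrative value.
simulate _b _se, reps(1000) seed(12345): myprog1

* List the variables that simulate created (one observation per repetition)
describe

* Summarize the collected coefficients and standard errors
summarize _b_* _se_*
```

If the simulation behaves as intended, the means of the coefficient variables should be close to the values used to generate y (.3283 for math, .5196 for write, 7.541 for the constant), and the standard deviation of each coefficient across repetitions should be close to the mean of its estimated standard error.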
The second example is similar to the first, except that the data are random draws from a multivariate normal distribution with a given correlation structure, generated with the command drawnorm. A covariance matrix can be supplied instead by specifying the cov() option in place of corr(). If neither a correlation nor a covariance structure is specified, the variables are drawn as uncorrelated, although any single sample will still show small nonzero sample correlations. The code below also specifies means and standard deviations for the variables; these are optional, and drawnorm defaults to means of 0 and standard deviations of 1.
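As a brief illustration of the two alternatives just described, the sketch below draws one sample using cov() and one with no structure specified. The variable names (a, b, u, v), the covariance values, and the seed are hypothetical and chosen only for illustration.

```stata
* Arbitrary seed so the draws can be reproduced
set seed 2024

* Hypothetical 2 x 2 covariance matrix: var(a) = 4, var(b) = 9, cov(a,b) = 1.5
clear
matrix V = (4, 1.5 \ 1.5, 9)
drawnorm a b, n(1000) cov(V)
* Sample covariances should be close to the entries of V
correlate a b, covariance

* No corr() or cov(): variables are drawn as uncorrelated,
* with the default means of 0 and standard deviations of 1
clear
drawnorm u v, n(1000)
correlate u v
```

The sample correlation between u and v will generally be small but not exactly zero, since it is computed from a finite sample.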
    * Set up the steps you want to repeat for the simulation in a program
    program define myprog2
        * drop all variables to create an empty dataset, do not use clear
        drop _all
        * create a vector that contains the equivalent of a lower triangular correlation matrix
        matrix c = (1, 0.5968, 1, 0.6623, 0.6174, 1)
        * create a vector that contains the means of the variables
        matrix m = (52.23,52.775,52.645)
        * create a vector that contains the standard deviations
        matrix sd = (10.25,9.47,9.36)
        * draw a sample of 1000 cases from a normal distribution with specified correlation structure
        * and specified means and standard deviations
        drawnorm x1 x2 y, n(1000) corr(c) cstorage(lower) means(m) sds(sd)
        * run the desired command
        reg y x1 x2
    end

    * use the simulate command to rerun myprog2 1000 times
    * collect the betas (_b) and standard errors (_se) from the regression each time
    * You'll probably want to set reps(10) for testing, then set it higher for the simulation.
    simulate _b _se, reps(1000): myprog2
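Unlike the first example, the second does not state the regression coefficients directly; they are implied by the correlations, means, and standard deviations passed to drawnorm. The sketch below is one way to work out those implied population values, as a benchmark for the simulated coefficients. It uses the standard result that the slopes equal the inverse of the predictors' covariance matrix times their covariances with y. The matrix and scalar names (C, D, V, Sxx, Sxy, beta, b0) are introduced here for illustration only.

```stata
* Full correlation matrix in the order x1, x2, y (same values as vector c)
matrix C = (1, 0.5968, 0.6623 \ 0.5968, 1, 0.6174 \ 0.6623, 0.6174, 1)
matrix m = (52.23, 52.775, 52.645)
matrix sd = (10.25, 9.47, 9.36)

* Convert correlations to covariances: V = D*C*D, with D = diag(sd)
matrix D = diag(sd)
matrix V = D*C*D

* Covariances among the predictors, and between the predictors and y
matrix Sxx = V[1..2, 1..2]
matrix Sxy = V[1..2, 3..3]

* Implied population slopes for x1 and x2
matrix beta = invsym(Sxx)*Sxy
matrix list beta

* Implied intercept: mean of y minus the slopes times the predictor means
scalar b0 = m[1,3] - m[1,1]*beta[1,1] - m[1,2]*beta[2,1]
display "implied intercept = " b0
```

The averages of the coefficients collected by simulate should land close to these implied values.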