Monte Carlo simulation can provide a useful method of assessing the power of a factorial anova design. There are many different ways that one could generate The approach that we will take is to create a dataset that summarizes the anova design at the cell level. The dataset in this example is taken from a pilot study. Here is the dataset for a 2×3 factorial design that we will use for our example.
input a b n m s 1 1 7 21.5 8.6 1 2 9 27.2 8.6 1 3 9 19.9 8 2 1 6 20.2 14.2 2 2 10 24.3 10.3 2 3 9 28.6 11 end save simdat
The variables in the dataset include a and b, the names of the two factor variables, and n, m and s which are the cell sample sizes, means and standard deviations, respectively. Please note that you must use these particular variable names.
We will graph these cell means to give you an idea of what the design looks like.
twoway (line m b if a==1)(line m b if a==2), /// ylabel(10(5)40) xlabel(1 2 3) legend(order(1 "a=1" 2 "a=2")) /// title(Plot of Cell Means) ytitle(mean) scheme(lean1)
The basic strategy for our simulation will be to draw hundreds or thousands of random samples from a normal distribution which have the same cell sizes, means and standard deviations as our sample dataset. Run an anova on each of our simulated samples and compute the proportion of times the p-value is equal to or less than your alpha level. Before we write a program to do this simulation, let’s manually generate a single random sample and run an anova.
set seed 98765
expand n
generate y=rnormal(m,s)
anova y a##b
Number of obs = 428 R-squared = 0.0967
Root MSE = 10.018 Adj R-squared = 0.0860
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
Model | 4535.05773 5 907.011546 9.04 0.0000
|
a | 473.338874 1 473.338874 4.72 0.0304
b | 1560.70625 2 780.353126 7.78 0.0005
a#b | 2246.0151 2 1123.00755 11.19 0.0000
|
Residual | 42352.1088 422 100.360447
-----------+----------------------------------------------------
Total | 46887.1665 427 109.806011
As you can see, it is a pretty straight forward process. And from the results above it would appear that we have good power for each of the effects in the model. However, this is only one simulated sample, we need to do this many times over.
Now we can move on to our rclass program factorialsim.ado. The program file needs to be placed in a location where Stata can find it. We suggest in the ado/personal/ directory. Here is the program. We will discuss the aero option a bit later.
program factorialsim, rclass
syntax using [, zero ]
drop _all
use `using'
if "`zero'"!="" {
replace m=0
}
expand n
generate y=rnormal(m,s)
anova y a##b
return scalar pab = Ftail(e(df_3),e(df_r),e(F_3))
return scalar pa = Ftail(e(df_1),e(df_r),e(F_1))
return scalar pb = Ftail(e(df_2),e(df_r),e(F_2))
end
We call factorialsim from the simulate command which will keep track of of the results of each of the simulated runs and save the results to memory. We will use set seed with our example so that we can replicate our results. We will also create a global macro reps to store the number of simulated datasets we want. For demonstration purposes we will only use 200 replications. You will probably want to use more replications, perhaps many more.
global reps=200
set seed 8765432
simulate pab=r(pab) pa=r(pa) pb=r(pb), reps($reps): ///
factorialsim using simdat
command: factorialsim using `"simdat"'
pab: r(pab)
pa: r(pa)
pb: r(pb)
Simulations (200)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
(output omitted)
.................................................. 200
count if pab<=.05
60
display "simulated power for axb = " r(N)/$reps
simulated power for axb = .3
count if pa <=.05
19
display "simulated power for a = " r(N)/$reps
simulated power for a = .095
count if pb <=.05
47
display "simulated power for b = " r(N)/$reps
simulated power for b = .235
The results above indicate that the power for all three F-ratios is very low with the a by b interaction having the highest power of 0.3. Increasing the sample size will surely increase our power. Let’s try increasing the sample size for each cell by three times its current size.
use simdat, clear
replace n=3*n
save simdat2
set seed 12345
simulate pab=r(pab) pa=r(pa) pb=r(pb), reps($reps): ///
factorialsim using simdat2
command: factorialsim using `"simdat2"'
pab: r(pab)
pa: r(pa)
pb: r(pb)
Simulations (200)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
(output omitted
.................................................. 200
count if pab<=.05
164
display "simulated power for axb = " r(N)/$reps
simulated power for axb = .82
count if pa <=.05
38
display "simulated power for a = " r(N)/$reps
simulated power for a = .19
count if pb <=.05
103
display "simulated power for b = " r(N)/$reps
simulated power for b = .515
Multiplying the cell sizes by three has yielded a satisfactory power of 0.82 for the a by b interaction. You could have changed each of the cell sizes individually, you don’t have to multiply all the cell sizes by a constant. You can continue to change the cell sizes until you find values that give the power you are looking for.
So what’s the story with the zero option? If you set the cell means to a constant and run the simulations you can investigate the robustness of you design. If the proportion of p-values is close to your alpha level then your design is robust to the differences in variances in you model. The zero option will set all the cell means to zero. We will run the simulate command with our original dataset, simdat.dta.
set seed 56789
simulate pab=r(pab) pa=r(pa) pb=r(pb), reps($reps): ///
factorialsim using simdat, zero
command: factorialsim using `"simdat"' , zero
pab: r(pab)
pa: r(pa)
pb: r(pb)
Simulations (200)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
(output omitted)
.................................................. 200
count if pab<=.05
12
display "check axb robustness = " r(N)/$reps
check axb robustness = .06
count if pa <=.05
17
display "check a robustness = " r(N)/$reps
check a robustness = .085
count if pb <=.05
9
display "check b robustness = " r(N)/$reps
check b robustness = .045
The values for the robustness check fall within an acceptable range.
The simulation method shown here can easily be adapted to more complex designs and to non-normal distributions.

