Many researchers like to run their anova as a regression with dummy coding but find it confusing when the tests of the dummy variables do not reproduce the main-effects from anova. This FAQ will show you how to get those main-effects.
Let’s begin by running the standard anova on a dataset called crf24, so that we have results to compare against.
use https://stats.idre.ucla.edu/stat/stata/faq/crf24, clear
anova y a b a*b
           Number of obs =      32     R-squared     =  0.9214
           Root MSE      = .877971     Adj R-squared =  0.8985

    Source |  Partial SS    df       MS           F     Prob > F
-----------+----------------------------------------------------
     Model |         217     7          31      40.22     0.0000
           |
         a |       3.125     1       3.125       4.05     0.0554
         b |       194.5     3  64.8333333      84.11     0.0000
       a*b |      19.375     3  6.45833333       8.38     0.0006
           |
  Residual |        18.5    24  .770833333
-----------+----------------------------------------------------
     Total |       235.5    31  7.59677419
Next, we will manually compute the various dummy variables and run the regression model.
tab a, gen(a)
tab b, gen(b)
generate ab1 = a1*b1
generate ab2 = a1*b2
generate ab3 = a1*b3
regress y a1 b1 b2 b3 ab1 ab2 ab3
      Source |       SS       df       MS              Number of obs =      32
-------------+------------------------------           F(  7,    24) =   40.22
       Model |         217     7          31           Prob > F      =  0.0000
    Residual |        18.5    24  .770833333           R-squared     =  0.9214
-------------+------------------------------           Adj R-squared =  0.8985
       Total |       235.5    31  7.59677419           Root MSE      =  .87797

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          a1 |         -2   .6208194    -3.22   0.004    -3.281308   -.7186918
          b1 |      -8.25   .6208194   -13.29   0.000    -9.531308   -6.968692
          b2 |         -7   .6208194   -11.28   0.000    -8.281308   -5.718692
          b3 |       -4.5   .6208194    -7.25   0.000    -5.781308   -3.218692
         ab1 |          4   .8779711     4.56   0.000     2.187957    5.812043
         ab2 |          3   .8779711     3.42   0.002     1.187957    4.812043
         ab3 |        3.5   .8779711     3.99   0.001     1.687957    5.312043
       _cons |         10   .4389856    22.78   0.000     9.093978    10.90602
------------------------------------------------------------------------------
For this model a2 is the reference level for a and b4 is the reference level for b, i.e., they are the omitted levels.
Here is the test of the a*b interaction.
test ab1 ab2 ab3
( 1) ab1 = 0
( 2) ab2 = 0
( 3) ab3 = 0
F( 3, 24) = 8.38
Prob > F = 0.0006
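To get the main-effect for a we will use the coefficient for the a dummy plus the a*b interaction coefficients averaged across the four levels of b. Only three interaction coefficients appear in the sum because the one for the reference level b4 is zero.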
test a1 + (ab1+ab2+ab3)/4 = 0
( 1) a1 + .25 ab1 + .25 ab2 + .25 ab3 = 0
F( 1, 24) = 4.05
Prob > F = 0.0554
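The test command reports only the F statistic. As a rough check on what is being tested, the lincom command can display the estimate of the same linear combination. This is a sketch that assumes the regress model above is still the active estimation result; lincom is not used elsewhere in this FAQ.

```stata
* Estimate the a main-effect contrast after the regress command above.
* This is the same linear combination as in the test command,
* written with explicit weights of .25 on each interaction dummy.
lincom a1 + .25*ab1 + .25*ab2 + .25*ab3
```

From the coefficients above, the estimate works out to -2 + (4 + 3 + 3.5)/4 = .625. Because this is a single degree of freedom contrast, the square of the t statistic that lincom reports should match the F from test.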
The main-effect for b is a little bit trickier because it is a 3 degree of freedom test, so we will have to run the test command three times and make use of the accumulate option.
test b1 + ab1/2 = 0
( 1) b1 + .5 ab1 = 0
F( 1, 24) = 202.70
Prob > F = 0.0000
test b2 + ab2/2 = 0, accumulate
( 1) b1 + .5 ab1 = 0
( 2) b2 + .5 ab2 = 0
F( 2, 24) = 120.86
Prob > F = 0.0000
test b3 + ab3/2 = 0, accumulate
( 1) b1 + .5 ab1 = 0
( 2) b2 + .5 ab2 = 0
( 3) b3 + .5 ab3 = 0
F( 3, 24) = 84.11
Prob > F = 0.0000
The last test command has our main-effect for b.
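The same 3 degree of freedom test can also be requested in one command, since test accepts several hypotheses in parentheses and tests them jointly. A minimal sketch, assuming the regress model above is still the active estimation result:

```stata
* Joint test of the three b main-effect contrasts in a single command,
* as an alternative to building the test up with the accumulate option.
test (b1 + ab1/2 = 0) (b2 + ab2/2 = 0) (b3 + ab3/2 = 0)
```

Because the joint hypothesis is the same, this should report the same F( 3, 24) as the final accumulate step.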
So, what’s with all of the division, by 4 in the a main-effect and by 2 in the b main-effect? The coefficient for the dummy variable a1 is actually the simple effect of a at b4, the reference level of b. To get the “true” main-effect of a we have to combine this simple effect with the average of the interaction effects across the four levels of b. Likewise, for the b main-effect we need to combine the simple effects of b at a2, the reference level of a, with the average interaction effect across the two levels of a.
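The weights can also be derived from the cell means implied by the regression model. The notation here is introduced only for illustration and is not part of the original FAQ: \mu_{jk} is the mean of the cell at level j of a and level k of b, each \beta is the coefficient on the dummy named in its subscript, and coefficients involving the reference level b4 are zero.

```latex
\begin{align*}
\mu_{1k} &= \beta_0 + \beta_{a1} + \beta_{bk} + \beta_{abk}, \qquad
\mu_{2k} = \beta_0 + \beta_{bk}, \qquad
\beta_{b4} = \beta_{ab4} = 0 \\
\bar{\mu}_{1\cdot} - \bar{\mu}_{2\cdot}
  &= \frac{1}{4}\sum_{k=1}^{4}\left(\mu_{1k} - \mu_{2k}\right)
   = \beta_{a1} + \frac{\beta_{ab1} + \beta_{ab2} + \beta_{ab3}}{4} \\
\bar{\mu}_{\cdot k} - \bar{\mu}_{\cdot 4}
  &= \frac{\mu_{1k} + \mu_{2k}}{2} - \frac{\mu_{14} + \mu_{24}}{2}
   = \beta_{bk} + \frac{\beta_{abk}}{2}, \qquad k = 1, 2, 3
\end{align*}
```

In each case the divisor is the number of levels of the other factor, since the contrast is a difference between marginal means that average over all of that factor's cells.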
Example 2
This method generalizes to more complex designs with multiple factors, so let’s consider a 3-factor completely crossed design.
use https://stats.idre.ucla.edu/stat/stata/faq/threeway, clear
anova y a b c a*b a*c b*c a*b*c
           Number of obs =      24     R-squared     =  0.9689
           Root MSE      =  1.1547     Adj R-squared =  0.9403

    Source |  Partial SS    df       MS           F     Prob > F
-----------+----------------------------------------------------
     Model |  497.833333    11  45.2575758      33.94     0.0000
           |
         a |         150     1         150     112.50     0.0000
         b |  .666666667     1  .666666667       0.50     0.4930
         c |  127.583333     2  63.7916667      47.84     0.0000
       a*b |  160.166667     1  160.166667     120.13     0.0000
       a*c |       18.25     2       9.125       6.84     0.0104
       b*c |  22.5833333     2  11.2916667       8.47     0.0051
     a*b*c |  18.5833333     2  9.29166667       6.97     0.0098
           |
  Residual |          16    12  1.33333333
-----------+----------------------------------------------------
     Total |  513.833333    23  22.3405797
Once again we will manually create the dummy variables and run the regression model.
recode a (1=0)(2=1)
recode b (1=0)(2=1)
tab c, gen(c)
gen ab=a*b
gen ac1=a*c1
gen ac2=a*c2
gen bc1=b*c1
gen bc2=b*c2
gen abc1=a*b*c1
gen abc2=a*b*c2
regress y a b c1 c2 ab ac1 ac2 bc1 bc2 abc1 abc2
      Source |       SS       df       MS              Number of obs =      24
-------------+------------------------------           F( 11,    12) =   33.94
       Model |  497.833333    11  45.2575758           Prob > F      =  0.0000
    Residual |          16    12  1.33333333           R-squared     =  0.9689
-------------+------------------------------           Adj R-squared =  0.9403
       Total |  513.833333    23  22.3405797           Root MSE      =  1.1547

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           a |        -.5   1.154701    -0.43   0.673    -3.015876    2.015876
           b |       -9.5   1.154701    -8.23   0.000    -12.01588   -6.984124
          c1 |         -8   1.154701    -6.93   0.000    -10.51588   -5.484124
          c2 |         -4   1.154701    -3.46   0.005    -6.515876   -1.484124
          ab |         15   1.632993     9.19   0.000     11.44201    18.55799
         ac1 |   6.39e-14   1.632993     0.00   1.000    -3.557986    3.557986
         ac2 |          1   1.632993     0.61   0.552    -2.557986    4.557986
         bc1 |          9   1.632993     5.51   0.000     5.442014    12.55799
         bc2 |          5   1.632993     3.06   0.010     1.442014    8.557986
        abc1 |       -8.5   2.309401    -3.68   0.003    -13.53175   -3.468247
        abc2 |       -5.5   2.309401    -2.38   0.035    -10.53175   -.4682473
       _cons |         19   .8164966    23.27   0.000     17.22101    20.77899
------------------------------------------------------------------------------
Here is the test of the three-way a*b*c interaction.
test abc1 abc2
( 1) abc1 = 0
( 2) abc2 = 0
F( 2, 12) = 6.97
Prob > F = 0.0098
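Next come the two-way interactions. The a*b interaction has 1 degree of freedom and needs a single test command, while a*c and b*c each have 2 degrees of freedom and use the accumulate option.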
/* a*b interaction */
test ab + (abc1+abc2)/3 = 0
( 1) ab + .3333333 abc1 + .3333333 abc2 = 0
F( 1, 12) = 120.13
Prob > F = 0.0000
/* a*c interaction */
test ac1 + abc1/2 = 0
( 1) ac1 + .5 abc1 = 0
F( 1, 12) = 13.55
Prob > F = 0.0031
test ac2 + abc2/2 = 0, accumulate
( 1) ac1 + .5 abc1 = 0
( 2) ac2 + .5 abc2 = 0
F( 2, 12) = 6.84
Prob > F = 0.0104
/* b*c interaction */
test bc1 + abc1/2 = 0
( 1) bc1 + .5 abc1 = 0
F( 1, 12) = 16.92
Prob > F = 0.0014
test bc2 + abc2/2 = 0, accumulate
( 1) bc1 + .5 abc1 = 0
( 2) bc2 + .5 abc2 = 0
F( 2, 12) = 8.47
Prob > F = 0.0051
Finally, we get to the main-effects.
/* a main-effect */
test a + ab/2 + (ac1+ac2)/3 + (abc1+abc2)/6 = 0
( 1) a + .5 ab + .3333333 ac1 + .3333333 ac2 + .1666667 abc1 + .1666667 abc2 = 0
F( 1, 12) = 112.50
Prob > F = 0.0000
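The weights in the a main-effect test follow the same averaging logic as in the two-factor example. As a sketch in notation that is not part of the original FAQ, let \mu_{jkl} be the cell mean with a = j and b = k (both coded 0/1 after the recode) and c at level l, with c3 as the reference level so that \beta_{ac3} = \beta_{abc3} = 0. The effect of a within one cell of b by c, and its average over the 2 x 3 = 6 cells, is:

```latex
\begin{align*}
\mu_{1kl} - \mu_{0kl}
  &= \beta_{a} + k\,\beta_{ab} + \beta_{acl} + k\,\beta_{abcl} \\
\frac{1}{6}\sum_{k=0}^{1}\sum_{l=1}^{3}\left(\mu_{1kl} - \mu_{0kl}\right)
  &= \beta_{a} + \frac{\beta_{ab}}{2}
   + \frac{\beta_{ac1} + \beta_{ac2}}{3}
   + \frac{\beta_{abc1} + \beta_{abc2}}{6}
\end{align*}
```

Plugging in the estimates above, and treating the ac1 coefficient of 6.39e-14 as zero, gives -.5 + 15/2 + (0 + 1)/3 + (-8.5 - 5.5)/6 = 5.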
/* b main-effect */
test b + ab/2 + (bc1+bc2)/3 + (abc1+abc2)/6 = 0
( 1) b + .5 ab + .3333333 bc1 + .3333333 bc2 + .1666667 abc1 + .1666667 abc2 = 0
F( 1, 12) = 0.50
Prob > F = 0.4930
/* c main-effect */
test c1 + ac1/2 + bc1/2 + abc1/4 = 0
( 1) c1 + .5 ac1 + .5 bc1 + .25 abc1 = 0
F( 1, 12) = 94.92
Prob > F = 0.0000
test c2 + ac2/2 + bc2/2 + abc2/4 = 0, accumulate
( 1) c1 + .5 ac1 + .5 bc1 + .25 abc1 = 0
( 2) c2 + .5 ac2 + .5 bc2 + .25 abc2 = 0
F( 2, 12) = 47.84
Prob > F = 0.0000
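In Stata 12 or later, factor-variable notation together with the contrast postestimation command can produce the same anova-style tests without building the dummies by hand. This is an alternative sketch rather than the method used in this FAQ. It reloads the data because a and b were recoded in place above, and it relies on Stata's default reference levels, which differ from the hand-built dummies, so the individual coefficients will not match even though the joint tests should.

```stata
* Reload the three-factor data (a and b were recoded in place above).
use https://stats.idre.ucla.edu/stat/stata/faq/threeway, clear

* Full factorial regression using Stata-generated indicator variables.
regress y i.a##i.b##i.c

* Joint Wald tests for each main effect and interaction.
contrast a b c a#b a#c b#c a#b#c
```

The F statistics from contrast should agree with the anova table at the start of this example, since the tests do not depend on which level is chosen as the reference.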
