How can I get anova main-effects with dummy coding? (Stata version 10 and earlier)

Many researchers like to do their anova using regression with dummy coding but find it confusing when the regression coefficients do not reproduce the main-effects reported by anova. This FAQ will show you how to get those main-effects.

Let’s begin with the standard anova, using a dataset called crf24 (a 2 by 4 design), so that we have something to compare against.

use https://stats.idre.ucla.edu/stat/stata/faq/crf24, clear

anova y a b a*b

                           Number of obs =      32     R-squared     =  0.9214
                           Root MSE      = .877971     Adj R-squared =  0.8985

                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |         217     7          31      40.22     0.0000
                         |
                       a |       3.125     1       3.125       4.05     0.0554
                       b |       194.5     3  64.8333333      84.11     0.0000
                     a*b |      19.375     3  6.45833333       8.38     0.0006
                         |
                Residual |        18.5    24  .770833333   
              -----------+----------------------------------------------------
                   Total |       235.5    31  7.59677419

Next, we will manually compute the various dummy variables and run the regression model.

tab a, gen(a)
tab b, gen(b)
generate ab1 = a1*b1
generate ab2 = a1*b2
generate ab3 = a1*b3

regress y a1 b1 b2 b3 ab1 ab2 ab3

      Source |       SS       df       MS              Number of obs =      32
-------------+------------------------------           F(  7,    24) =   40.22
       Model |         217     7          31           Prob > F      =  0.0000
    Residual |        18.5    24  .770833333           R-squared     =  0.9214
-------------+------------------------------           Adj R-squared =  0.8985
       Total |       235.5    31  7.59677419           Root MSE      =  .87797

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          a1 |         -2   .6208194    -3.22   0.004    -3.281308   -.7186918
          b1 |      -8.25   .6208194   -13.29   0.000    -9.531308   -6.968692
          b2 |         -7   .6208194   -11.28   0.000    -8.281308   -5.718692
          b3 |       -4.5   .6208194    -7.25   0.000    -5.781308   -3.218692
         ab1 |          4   .8779711     4.56   0.000     2.187957    5.812043
         ab2 |          3   .8779711     3.42   0.002     1.187957    4.812043
         ab3 |        3.5   .8779711     3.99   0.001     1.687957    5.312043
       _cons |         10   .4389856    22.78   0.000     9.093978    10.90602
------------------------------------------------------------------------------

For this model a2 is the reference level for a and b4 is the reference level for b, i.e., they are the omitted levels.
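One way to see what these coefficients mean is to compare them with the cell means. The sketch below relies on tab with gen() numbering the dummies in the sorted order of the levels, so that a1 marks the lowest value of a and b4 the highest value of b. Because the model has one parameter per cell, _cons (10) should equal the mean of the a2, b4 cell, and 10 - 2 - 8.25 + 4 = 3.75 should equal the mean of the a1, b1 cell.

```stata
* cell means of y for the 2 by 4 design
table a b, contents(mean y)
```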

Here is the test of the a*b interaction.

test ab1 ab2 ab3

 ( 1)  ab1 = 0
 ( 2)  ab2 = 0
 ( 3)  ab3 = 0

       F(  3,    24) =    8.38
            Prob > F =    0.0006
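
The same joint test can probably be written more compactly with testparm, which takes a variable list instead of a list of coefficients. This is only a sketch: the ab1-ab3 range assumes the three interaction dummies sit next to each other in the dataset, which holds when they are generated consecutively as above.

```stata
* joint test that the three interaction coefficients are all zero
testparm ab1-ab3
```

It should reproduce the F(3, 24) = 8.38 shown above.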

To get the main-effect for a we will use the dummy for a plus the a*b interaction dummies averaged across the four levels of b.

test a1 + (ab1+ab2+ab3)/4 = 0

 ( 1)  a1 + .25 ab1 + .25 ab2 + .25 ab3 = 0

       F(  1,    24) =    4.05
            Prob > F =    0.0554
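
If you also want the size of the a main-effect and not just its test, lincom accepts the same linear combination. This is a sketch to run right after the regress command above. From the coefficients shown, the point estimate works out to -2 + (4 + 3 + 3.5)/4 = .625, and the square of the t statistic that lincom reports should equal the F from test.

```stata
* estimate, standard error, and t for the a main-effect contrast
lincom a1 + (ab1 + ab2 + ab3)/4
```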

The main-effect for b is a little bit trickier because it is a 3 degree of freedom test, so we will run the test command three times and use the accumulate option to build up the joint hypothesis.

test b1 + ab1/2 = 0

 ( 1)  b1 + .5 ab1 = 0

       F(  1,    24) =  202.70
            Prob > F =    0.0000

test b2 + ab2/2 = 0, accumulate

 ( 1)  b1 + .5 ab1 = 0
 ( 2)  b2 + .5 ab2 = 0

       F(  2,    24) =  120.86
            Prob > F =    0.0000

test b3 + ab3/2 = 0, accumulate

 ( 1)  b1 + .5 ab1 = 0
 ( 2)  b2 + .5 ab2 = 0
 ( 3)  b3 + .5 ab3 = 0

       F(  3,    24) =   84.11
            Prob > F =    0.0000

The last test command has our main-effect for b.
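
As an alternative to accumulate, the three constraints can be given to a single test command, each in its own set of parentheses. This is a sketch that assumes your version of Stata accepts the multiple-expression form of test. Since it states the same joint hypothesis, it should give the same F(3, 24) = 84.11.

```stata
* b main-effect as one 3 degree of freedom test
test (b1 + ab1/2 = 0) (b2 + ab2/2 = 0) (b3 + ab3/2 = 0)
```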

So, what’s with all of the division, by 4 in the a main-effect and by 2 in the b main-effect? The coefficient for the dummy variable a1 is actually the simple effect of a at the reference level of b (b4), not the main-effect. To get the “true” main-effect of a we have to combine the simple effect of a with the average of the interaction effects across the four levels of b. The interaction term for the reference level b4 is zero, which is why only three coefficients appear in the sum even though we divide by 4. Likewise, for the b main-effect we need to combine the simple effects of the levels of b (estimated at a2) with the average interaction effect across the two levels of a.
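
To make the averaging concrete, the simple effect of a at each level of b can be displayed from the stored coefficients. This is a sketch to run while the regression above is still the most recent estimation. With the coefficients shown, the four simple effects are 2, 1, 1.5 and -2, and their average is .625, which is the combination tested for the a main-effect.

```stata
* simple effect of a (a1 versus a2) at b1, b2, b3 and b4
display _b[a1] + _b[ab1]
display _b[a1] + _b[ab2]
display _b[a1] + _b[ab3]
display _b[a1]

* average of the four simple effects = a1 + (ab1+ab2+ab3)/4
display _b[a1] + (_b[ab1] + _b[ab2] + _b[ab3])/4
```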

Example 2

This method generalizes to more complex designs with multiple factors, so let’s consider a 3-factor completely crossed design.

use https://stats.idre.ucla.edu/stat/stata/faq/threeway, clear

anova y a b c a*b a*c b*c a*b*c

                           Number of obs =      24     R-squared     =  0.9689
                           Root MSE      =  1.1547     Adj R-squared =  0.9403

                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |  497.833333    11  45.2575758      33.94     0.0000
                         |
                       a |         150     1         150     112.50     0.0000
                       b |  .666666667     1  .666666667       0.50     0.4930
                       c |  127.583333     2  63.7916667      47.84     0.0000
                     a*b |  160.166667     1  160.166667     120.13     0.0000
                     a*c |       18.25     2       9.125       6.84     0.0104
                     b*c |  22.5833333     2  11.2916667       8.47     0.0051
                   a*b*c |  18.5833333     2  9.29166667       6.97     0.0098
                         |
                Residual |          16    12  1.33333333   
              -----------+----------------------------------------------------
                   Total |  513.833333    23  22.3405797

Once again we will manually create the dummy variables and run the regression model.

recode a (1=0)(2=1)
recode b (1=0)(2=1)
tab c, gen(c)
gen ab=a*b
gen ac1=a*c1
gen ac2=a*c2
gen bc1=b*c1
gen bc2=b*c2
gen abc1=a*b*c1
gen abc2=a*b*c2

regress y a b c1 c2 ab ac1 ac2 bc1 bc2 abc1 abc2

      Source |       SS       df       MS              Number of obs =      24
-------------+------------------------------           F( 11,    12) =   33.94
       Model |  497.833333    11  45.2575758           Prob > F      =  0.0000
    Residual |          16    12  1.33333333           R-squared     =  0.9689
-------------+------------------------------           Adj R-squared =  0.9403
       Total |  513.833333    23  22.3405797           Root MSE      =  1.1547

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           a |        -.5   1.154701    -0.43   0.673    -3.015876    2.015876
           b |       -9.5   1.154701    -8.23   0.000    -12.01588   -6.984124
          c1 |         -8   1.154701    -6.93   0.000    -10.51588   -5.484124
          c2 |         -4   1.154701    -3.46   0.005    -6.515876   -1.484124
          ab |         15   1.632993     9.19   0.000     11.44201    18.55799
         ac1 |   6.39e-14   1.632993     0.00   1.000    -3.557986    3.557986
         ac2 |          1   1.632993     0.61   0.552    -2.557986    4.557986
         bc1 |          9   1.632993     5.51   0.000     5.442014    12.55799
         bc2 |          5   1.632993     3.06   0.010     1.442014    8.557986
        abc1 |       -8.5   2.309401    -3.68   0.003    -13.53175   -3.468247
        abc2 |       -5.5   2.309401    -2.38   0.035    -10.53175   -.4682473
       _cons |         19   .8164966    23.27   0.000     17.22101    20.77899
------------------------------------------------------------------------------

Here is the test of the three-way a*b*c interaction.

test abc1 abc2

 ( 1)  abc1 = 0
 ( 2)  abc2 = 0

       F(  2,    12) =    6.97
            Prob > F =    0.0098

Next come the two-way interactions. The a*c and b*c interactions each have 2 degrees of freedom, so both use the accumulate option.

/* a*b interaction */

test ab + (abc1+abc2)/3 = 0

 ( 1)  ab + .3333333 abc1 + .3333333 abc2 = 0

       F(  1,    12) =  120.13
            Prob > F =    0.0000
 
/* a*c interaction */

test ac1 + abc1/2 = 0

 ( 1)  ac1 + .5 abc1 = 0

       F(  1,    12) =   13.55
            Prob > F =    0.0031

test ac2 + abc2/2 = 0, accumulate

 ( 1)  ac1 + .5 abc1 = 0
 ( 2)  ac2 + .5 abc2 = 0

       F(  2,    12) =    6.84
            Prob > F =    0.0104

/* b*c interaction */

test bc1 + abc1/2 = 0

 ( 1)  bc1 + .5 abc1 = 0

       F(  1,    12) =   16.92
            Prob > F =    0.0014

test bc2 + abc2/2 = 0, accumulate

 ( 1)  bc1 + .5 abc1 = 0
 ( 2)  bc2 + .5 abc2 = 0

       F(  2,    12) =    8.47
            Prob > F =    0.0051

Finally, we get to the main-effects.

/* a main-effect */

test a + ab/2 + (ac1+ac2)/3 + (abc1+abc2)/6 = 0

 ( 1)  a + .5 ab + .3333333 ac1 + .3333333 ac2 + .1666667 abc1 + .1666667 abc2 = 0

       F(  1,    12) =  112.50
            Prob > F =    0.0000

/* b main-effect */

test b + ab/2 + (bc1+bc2)/3 + (abc1+abc2)/6 = 0

 ( 1)  b + .5 ab + .3333333 bc1 + .3333333 bc2 + .1666667 abc1 + .1666667 abc2 = 0

       F(  1,    12) =    0.50
            Prob > F =    0.4930

/* c main-effect */

test c1 + ac1/2 + bc1/2 + abc1/4 = 0

 ( 1)  c1 + .5 ac1 + .5 bc1 + .25 abc1 = 0

       F(  1,    12) =   94.92
            Prob > F =    0.0000

test c2 + ac2/2 + bc2/2 + abc2/4 = 0, accumulate

 ( 1)  c1 + .5 ac1 + .5 bc1 + .25 abc1 = 0
 ( 2)  c2 + .5 ac2 + .5 bc2 + .25 abc2 = 0

       F(  2,    12) =   47.84
            Prob > F =    0.0000
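
The divisors follow one rule: each interaction coefficient is divided by the product of the numbers of levels of the other factors in that term. For the a main-effect, ab is divided by 2 (levels of b), ac1 and ac2 by 3 (levels of c), and abc1 and abc2 by 6 (2 times 3). As a check, sketched here under the assumption that the regression above is still the active estimation, lincom gives the estimate itself. From the coefficients shown it works out to -.5 + 15/2 + (0 + 1)/3 + (-8.5 - 5.5)/6 = 5, and because the design is balanced this should equal the difference between the two marginal means of y by a.

```stata
* a main-effect as an estimated contrast
lincom a + ab/2 + (ac1 + ac2)/3 + (abc1 + abc2)/6

* marginal means of y by a (a is coded 0/1 after the recode above)
tabulate a, summarize(y)
```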