Regression with Stata Chapter 6: More on interactions of categorical variables Draft version

This is a draft version of this chapter. Comments and suggestions to improve this draft are welcome.

Chapter outline
    6.1. Analysis with two categorical variables
    6.2. Simple effects
      6.2.1 Analyzing simple effects using xi3 and regress
      6.2.2 Coding of simple effects
    6.3. Simple comparisons
      6.3.1 Analyzing simple comparisons using xi3 and regress
      6.3.2 Coding of simple comparisons
    6.4. Partial interaction
      6.4.1 Analyzing partial interactions using xi3 and regress
      6.4.2 Coding of partial interactions
    6.5. Interaction contrasts
      6.5.1 Analyzing interaction contrasts using xi3 and regress
      6.5.2 Coding of interaction contrasts
    6.6. Computing adjusted means
      6.6.1 Computing adjusted means via anova
      6.6.2 Computing adjusted means via regress
    6.7. More details on meaning of coefficients
    6.8. Simple effects via dummy coding versus effect coding
      6.8.1 Example 1. Simple effects of yr_rnd at levels of mealcat
      6.8.2 Example 2. Simple effects of mealcat at levels of yr_rnd

Please note: This page makes use of the programs xi3 and postgr3, which are no longer being maintained and have been removed from our archives. References to xi3 and postgr3 will be left on this page because they illustrate specific principles of coding categorical variables.

For this chapter we will use the elemapi2 data file that we have been using in prior chapters. We will focus on the variables mealcat and collcat as they relate to the outcome variable api00 (performance on the api in the year 2000). The variable mealcat is the variable meals broken up into three categories, and the variable collcat is the variable some_col broken up into three categories. We can think of mealcat as the percentage of students receiving free meals, broken up into low, middle and high. The variable collcat can be thought of as the number of parents with some college education, broken up into low, medium and high. For our analysis, we think that both mealcat and collcat may be related to api00, but it is also possible that the impact of mealcat depends on the level of collcat. In other words, we think that there might be an interaction of these two categorical variables. In this chapter we will look at how these two categorical variables are related to api performance in the school, and we will look at their interaction as well. We will see that there is an interaction of these categorical variables, and we will focus on different ways of further exploring it.

We will first use the elemapi2 data file.

use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi2, clear

We will modify the label for mealcat in order to more clearly see some of the points we will be demonstrating later in this chapter.

label define mealcat 1 "1" 2 "2" 3 "3", modify

6.1. Analysis with two categorical variables

One traditional way to analyze these data would be to perform a 3 by 3 factorial analysis of variance using the anova command, as shown below. The results show a main effect of collcat (F=4.50, p=0.0117), a main effect of mealcat (F=509.04, p=0.0000) and an interaction of collcat by mealcat (F=6.63, p=0.0000).

anova api00 collcat mealcat collcat*mealcat

                           Number of obs =     400     R-squared     =  0.7733
                           Root MSE      =  68.412     Adj R-squared =  0.7687

                  Source |  Partial SS    df       MS           F     Prob > F
         ----------------+----------------------------------------------------
                   Model |  6243714.81     8  780464.351     166.76     0.0000
                         |
                 collcat |  42140.5662     2  21070.2831       4.50     0.0117
                 mealcat |  4764843.56     2  2382421.78     509.04     0.0000
         collcat*mealcat |  124167.809     4  31041.9522       6.63     0.0000
                         |
                Residual |  1829957.19   391  4680.19741   
         ----------------+----------------------------------------------------
                   Total |  8073672.00   399  20234.7669

We can use the adjust command to show the adjusted means broken down by collcat and mealcat.

adjust, by(collcat mealcat)

----------------------------------------------------------
     Dependent variable: api00     Command: anova
----------------------------------------------------------

-------------------------------------
          |Percentage free meals in 3
          |        categories        
  collcat |       1        2        3
----------+--------------------------
        1 | 816.914   589.35  493.919
        2 | 825.651  636.605  508.833
        3 | 782.151  655.638  541.733
-------------------------------------
     Key:  Linear Prediction
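
The anova syntax and the adjust command shown above come from older Stata releases. As a hedged sketch, assuming Stata 11 or later (where factor-variable notation and the margins command are available), the same model and table of adjusted means could be requested as follows. This is a swapped-in alternative and not the commands used to produce the output on this page.

```stata
* Sketch for Stata 11 or later; not the original commands from this chapter
use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi2, clear

* ## requests both main effects and their interaction
anova api00 collcat##mealcat

* Predicted cell means for every combination of collcat and mealcat
margins collcat#mealcat
```

Because the model contains the full interaction, the nine margins should agree with the cell means in the adjust table above (for example, 816.914 for collcat 1 and mealcat 1).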

We can show a graph of the adjusted means as shown below. We use the separate command to make three variables corresponding to the three levels of collcat (i.e., yhat1 corresponds to the predicted value when collcat is low). We can then show the graph with the three levels of collcat represented as three separate lines.

predict yhat
separate yhat, by(collcat)

              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
yhat1           float  %9.0g                  yhat, collcat == 1
yhat2           float  %9.0g                  yhat, collcat == 2
yhat3           float  %9.0g                  yhat, collcat == 3

graph twoway scatter yhat1 yhat2 yhat3 mealcat, connect(l l l) xlabel(1 2 3) sort

Now we drop the variables yhat yhat1 yhat2 yhat3 in case we wish to use these variable names later.

drop yhat yhat1 yhat2 yhat3
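
A similar graph can be drawn without creating and dropping temporary variables. The following is a minimal sketch, assuming Stata 12 or later, where marginsplot is available; it is an alternative route and not the method used for the graph on this page.

```stata
* Sketch for Stata 12 or later; an alternative to predict, separate and graph
use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi2, clear
regress api00 i.collcat##i.mealcat

* Cell means for each collcat-by-mealcat combination
margins collcat#mealcat

* mealcat on the x-axis, one line per level of collcat, no confidence bars
marginsplot, xdimension(mealcat) noci
```

Dropping the noci option adds confidence intervals around each predicted mean.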

We can do these same analyses using the regress command. Below we use the regress command with xi3 to look at the effect of collcat, mealcat and the interaction of these two variables.

xi3: regress api00 g.collcat*g.mealcat

g.collcat         _Icollcat_1-3       (naturally coded; _Icollcat_1 omitted)
g.mealcat         _Imealcat_1-3       (naturally coded; _Imealcat_1 omitted)

      Source |       SS       df       MS              Number of obs =     400
-------------+------------------------------           F(  8,   391) =  166.76
       Model |  6243714.81     8  780464.351           Prob > F      =  0.0000
    Residual |  1829957.19   391  4680.19741           R-squared     =  0.7733
-------------+------------------------------           Adj R-squared =  0.7687
       Total |     8073672   399  20234.7669           Root MSE      =  68.412

------------------------------------------------------------------------------
       api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 _Icollcat_2 |   23.63531   9.105331     2.60   0.010     5.733782    41.53685
 _Icollcat_3 |   26.44625   9.995129     2.65   0.008     6.795331    46.09717
 _Imealcat_2 |  -181.0414   9.077126   -19.94   0.000    -198.8874   -163.1953
 _Imealcat_3 |  -293.4103   9.449459   -31.05   0.000    -311.9884   -274.8322
   _Ico2Xme2 |   38.51777   24.19532     1.59   0.112    -9.051422    86.08697
   _Ico2Xme3 |   6.177537   20.08262     0.31   0.759     -33.3059    45.66097
   _Ico3Xme2 |    101.051   22.88808     4.42   0.000     56.05191    146.0501
   _Ico3Xme3 |   82.57776   24.43941     3.38   0.001     34.52867    130.6268
       _cons |   650.0883   3.871885   167.90   0.000     642.4759    657.7006
------------------------------------------------------------------------------

We use the test command to test the two terms associated with collcat to get the main effect of collcat.

test _Icollcat_2 _Icollcat_3

 ( 1)  _Icollcat_2 = 0.0
 ( 2)  _Icollcat_3 = 0.0

       F(  2,   391) =    4.50
            Prob > F =    0.0117

Likewise we use the test command to get the overall test of mealcat.

test _Imealcat_2 _Imealcat_3

 ( 1)  _Imealcat_2 = 0.0
 ( 2)  _Imealcat_3 = 0.0

       F(  2,   391) =  509.04
            Prob > F =    0.0000

Finally, we use the test command to test the interaction of collcat by mealcat.

test _Ico2Xme2 _Ico2Xme3 _Ico3Xme2 _Ico3Xme3

 ( 1)  _Ico2Xme2 = 0
 ( 2)  _Ico2Xme3 = 0
 ( 3)  _Ico3Xme2 = 0
 ( 4)  _Ico3Xme3 = 0

       F(  4,   391) =    6.63
            Prob > F =    0.0000

First, note that the results of the test commands correspond to those from the anova command above. This is because collcat and mealcat were coded using simple effect coding, a coding scheme where the contrasts sum to 0. We indicated that we wanted simple effect coding by using g.collcat and g.mealcat on the regress command with xi3 (see Chapter 5 for more information about coding schemes available via the xi3 command). If we had used dummy coding instead, e.g., i.collcat, then the results of the test commands for mealcat and collcat from the regress command would not have corresponded to the anova results. In addition to simple effect coding, we could have used e., h., r., a., b., or o. and the results of the test commands would have matched the anova command, although the meaning of the individual tests would have been different. This point will be explored in more detail later in this chapter.
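
The effect of dummy coding on these tests can be explored with Stata's built-in factor-variable notation, which uses dummy (indicator) coding. The following is a hedged sketch assuming Stata 12 or later; testparm and contrast are standard commands, but they are swapped in here for xi3 and were not used to produce the output on this page.

```stata
* Sketch for Stata 12 or later; i. requests dummy (indicator) coding
use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi2, clear
regress api00 i.collcat##i.mealcat

* Joint test of the two collcat dummies: under dummy coding this is the
* effect of collcat at the reference level of mealcat, not the main effect
testparm i.collcat

* Joint test of the four interaction terms
testparm i.collcat#i.mealcat

* Main-effect and interaction tests that do not depend on the dummy coding
contrast collcat##mealcat
```

If the reasoning above holds, the first testparm should differ from the anova main effect of collcat, while the interaction test should still give F(4, 391) = 6.63, since the interaction test is the same under any full-rank coding. The contrast command treats the design as balanced by default, so its main-effect tests should correspond to the anova results.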

We can obtain the adjusted means by using the predict command to get the predicted values, calling them pred, and then looking at the mean of pred broken down by collcat and mealcat.

predict pred
table collcat mealcat, contents(mean pred)

        Means, Standard Deviations and Frequencies of Fitted values

           |  Percentage free meals in 3
           |          categories
   collcat |         1          2          3 |     Total
-----------+---------------------------------+----------
         1 | 816.91431  589.34998  493.91891 | 596.34884
         2 | 825.65118  636.60468  508.83334 | 651.50002
         3 | 782.15094   655.6377  541.73334 |  692.1095
-----------+---------------------------------+----------
     Total | 805.71757  639.39395  504.37956 | 647.62251

We can show a graph of cell means as shown below. We use the same strategy as we did in making the graph above.

separate pred, by(collcat)

              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
pred1           float  %9.0g                  pred, collcat == 1
pred2           float  %9.0g                  pred, collcat == 2
pred3           float  %9.0g                  pred, collcat == 3

graph twoway scatter pred1 pred2 pred3 mealcat, c(l l l) xlabel(1 2 3) sort

Now we drop the variables pred pred1 pred2 pred3 in case we wish to use these variable names later.

drop pred pred1 pred2 pred3

Note that we could have produced the same graph and table of predicted values using the postgr3 command.

postgr3 mealcat, by(collcat) table2 clpattern(solid dash dot)

Variables left asis: _Imealcat_2 _Imealcat_3 _Icollcat_2 _Icollcat_3 _IcolXmea_2_2 
  _IcolXmea_2_3 _IcolXmea_3_2 _IcolXmea_3_3
(option xb assumed; fitted values)

                          Means of Fitted values

           |  Percentage free meals in 3
           |          categories
   collcat |         1          2          3 |     Total
-----------+---------------------------------+----------
         1 | 816.91431  589.34998  493.91891 | 596.34884
         2 | 825.65118  636.60468  508.83334 | 651.50002
         3 | 782.15094   655.6377  541.73334 |  692.1095
-----------+---------------------------------+----------
     Total | 805.71757  639.39395  504.37956 | 647.62251

[Figure statareg6_3: predicted api00 by mealcat, with one line for each level of collcat]

The graph of the cell means illustrates the interaction between collcat and mealcat. The graph shows the three levels of collcat as three different lines, and the three levels of mealcat as the three values on the x-axis of the graph. We can see that the effect of collcat differs based on the level of mealcat. For example, when mealcat is low, schools where collcat is 3 have the lowest api00 scores, as compared to schools that are medium or high on mealcat, where schools with collcat of 3 have the highest api00 scores.

Let’s investigate this interaction further by looking at the simple effects of collcat at each level of mealcat.

6.2. Simple effects

We found that the main effect of collcat was significant, but because we have an interaction the effect of collcat depends on the level of mealcat. We might want to ask whether the effect of collcat is significant at each level of mealcat.

6.2.1 Analyzing simple effects using xi3 and regress

In order to look at the simple effects of collcat at the different levels of mealcat, we will use the @ symbol instead of * to indicate that we want the interaction terms to reflect the simple effects of collcat at each level of mealcat. We will use helmert coding for collcat, which will be discussed further later.

xi3: regress api00 h.collcat@g.mealcat

h.collcat         _Icollcat_1-3       (naturally coded; _Icollcat_3 omitted)
g.mealcat         _Imealcat_1-3       (naturally coded; _Imealcat_1 omitted)

      Source |       SS       df       MS              Number of obs =     400
-------------+------------------------------           F(  8,   391) =  166.76
       Model |  6243714.81     8  780464.351           Prob > F      =  0.0000
    Residual |  1829957.19   391  4680.19741           R-squared     =  0.7733
-------------+------------------------------           Adj R-squared =  0.7687
       Total |     8073672   399  20234.7669           Root MSE      =  68.412

------------------------------------------------------------------------------
       api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 _Imealcat_2 |  -181.0414   9.077126   -19.94   0.000    -198.8874   -163.1953
 _Imealcat_3 |  -293.4103   9.449459   -31.05   0.000    -311.9884   -274.8322
   _Ico1Wme1 |   13.01323     13.528     0.96   0.337    -13.58349    39.60995
   _Ico1Wme2 |  -56.77117   16.67866    -3.40   0.001    -89.56223    -23.9801
   _Ico1Wme3 |  -31.36441   12.86955    -2.44   0.015    -56.66658   -6.062246
   _Ico2Wme1 |   43.50022   14.04092     3.10   0.002     15.89507    71.10536
   _Ico2Wme2 |  -19.03303   13.29175    -1.43   0.153    -45.16528     7.09922
   _Ico2Wme3 |      -32.9   20.23653    -1.63   0.105    -72.68603    6.886029
       _cons |   650.0883   3.871885   167.90   0.000     642.4759    657.7006
------------------------------------------------------------------------------

We can obtain the simple effect of collcat when mealcat is low (i.e., 1) via the test command below. This shows that the effect of collcat when mealcat is low is significant.

test _Ico1Wme1 _Ico2Wme1

 ( 1)  _Ico1Wme1 = 0.0
 ( 2)  _Ico2Wme1 = 0.0

       F(  2,   391) =    5.44
            Prob > F =    0.0047

We use the describe command below to see the meaning of these terms and see that these two terms represent the two comparisons on collcat when mealcat is 1. For example, in the term _Ico2Wme1, the 2 means that this is the second comparison on collcat and the 1 means that it is when mealcat is 1.

describe _Ico1Wme1 _Ico2Wme1

              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
_Ico1Wme1       double %10.0g                 collcat(1 vs. 2+) @ mealcat==1
_Ico2Wme1       double %10.0g                 collcat(2 vs. 3) @ mealcat==1

We can test the simple effect of collcat when mealcat is 2 via the test command below. This shows that collcat is significant when mealcat is 2.

test  _Ico1Wme2 _Ico2Wme2

 ( 1)  _Ico1Wme2 = 0.0
 ( 2)  _Ico2Wme2 = 0.0

       F(  2,   391) =    7.33
            Prob > F =    0.0007

We can also test the simple effect of collcat when mealcat is 3 via the test command below. This shows that collcat is significant when mealcat is 3, if we use an alpha level of 0.05. Because we are performing a number of additional tests, you might want to consider a post hoc correction, such as a Bonferroni correction, to control the Type I error rate.

test  _Ico1Wme3 _Ico2Wme3

 ( 1)  _Ico1Wme3 = 0.0
 ( 2)  _Ico2Wme3 = 0.0

       F(  2,   391) =    3.20
            Prob > F =    0.0417

In summary, all three of the simple effects of collcat at each level of mealcat were significant at the 0.05 level. However, the effect of collcat when mealcat was 3 (p=0.0417) might not be significant under a post hoc criterion. For example, a Bonferroni correction for three tests would require p < 0.05/3, or about 0.0167.
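
Since xi3 is no longer distributed, it may help to see how the same simple effects could be requested with built-in commands. This is a sketch under the assumption of Stata 12 or later, using the contrast command with the @ operator; it is a substitute for the xi3 approach and not the method that produced the output above.

```stata
* Sketch only: contrast (Stata 12 or later) used in place of xi3
use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi2, clear
regress api00 i.collcat##i.mealcat

* Joint 2-df test of collcat within each level of mealcat
contrast collcat@mealcat

* Bonferroni-adjusted alpha level for three simple-effects tests
display 0.05/3
```

The three tests reported by contrast should correspond to the F values of 5.44, 7.33 and 3.20 obtained above, and each p-value can then be compared against the adjusted alpha level.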

6.2.2 Coding of simple effects

While xi3 creates the coding for you, it is useful to see the coding it creates for these simple effects. The variable mealcat used simple coding, and its coding is just as we saw in chapter 5. Below we use the tablist command to show the coding for mealcat. You can download tablist from within Stata by typing search tablist (see How can I use the search command to search for programs and get additional help? for more information about using search).

We see that the coding of mealcat is just as we would expect from chapter 5.

tablist mealcat  _Imealcat_2 _Imealcat_3, sort(v)

    mealcat   _Imealca~2   _Imealca~3   Freq  
          1   -.33333333   -.33333333    131  
          2    .66666667   -.33333333    132  
          3   -.33333333    .66666667    137

We requested helmert coding for collcat, and we can look at the coding of collcat to see that the terms _Icollcat_1 _Icollcat_2 are indeed coded using helmert coding. We should note that these terms are not used in the analysis, but are used by xi3 for creating the simple effects shown in the next section.

tablist collcat  _Icollcat_1 _Icollcat_2, sort(v)

  collcat  _Icollca~1  _Icollca~2   Freq
        1   .66666667           0    129
        2  -.33333333          .5    134
        3  -.33333333         -.5    137

Now that we have seen the helmert coding for collcat, we can see how this is used to create the simple effects of collcat at each level of mealcat. First, we look at the two comparisons of collcat at mealcat of 1. Note that the coding is the same as we saw above, but only when mealcat is 1, otherwise these variables are coded 0.

tablist  mealcat collcat _Ico1Wme1 _Ico2Wme1, sort(v)

    mealcat   collcat    _Ico1Wme1   _Ico2W~1   Freq  
          1         1    .66666667          0     35  
          1         2   -.33333333         .5     43  
          1         3   -.33333333        -.5     53  
          2         1            0          0     20  
          2         2            0          0     43  
          2         3            0          0     69  
          3         1            0          0     74  
          3         2            0          0     48  
          3         3            0          0     15

Likewise, we look at the terms that form the effects of collcat when mealcat is 2, and we see that the variables are coded the same way when mealcat is 2, and otherwise 0.

tablist  mealcat collcat _Ico1Wme2 _Ico2Wme2, sort(v)

    mealcat   collcat    _Ico1Wme2   _Ico2W~2   Freq  
          1         1            0          0     35  
          1         2            0          0     43  
          1         3            0          0     53  
          2         1    .66666667          0     20  
          2         2   -.33333333         .5     43  
          2         3   -.33333333        -.5     69  
          3         1            0          0     74  
          3         2            0          0     48  
          3         3            0          0     15

Finally, we see the same pattern for the terms that form the effect of collcat when mealcat is 3.

tablist  mealcat collcat _Ico1Wme3 _Ico2Wme3, sort(v)

    mealcat   collcat    _Ico1Wme3   _Ico2W~3   Freq  
          1         1            0          0     35  
          1         2            0          0     43  
          1         3            0          0     53  
          2         1            0          0     20  
          2         2            0          0     43  
          2         3            0          0     69  
          3         1    .66666667          0     74  
          3         2   -.33333333         .5     48  
          3         3   -.33333333        -.5     15

This illustrates how xi3 codes the variables to allow the simple effects analysis. If you wished, you could manually create variables according to this strategy to perform a simple effects analysis.
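
As an illustration of creating these variables by hand, the sketch below builds the same coding with generate and fits the model with regress alone. The variable names (m2, m3, h1, h2, h1_m1 and so on) are hypothetical names chosen for this example, and the code assumes that collcat and mealcat have no missing values, as is the case for the 400 observations used here.

```stata
* Sketch: hand-built version of the coding shown in the tablist output above
use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi2, clear

* Simple coding for mealcat (2 vs. 1 and 3 vs. 1)
generate double m2 = cond(mealcat == 2, 2/3, -1/3)
generate double m3 = cond(mealcat == 3, 2/3, -1/3)

* Helmert coding for collcat (1 vs. 2 and 3, then 2 vs. 3)
generate double h1 = cond(collcat == 1, 2/3, -1/3)
generate double h2 = cond(collcat == 2, 1/2, cond(collcat == 3, -1/2, 0))

* Helmert contrasts within each level of mealcat, and 0 elsewhere
forvalues k = 1/3 {
    generate double h1_m`k' = h1 * (mealcat == `k')
    generate double h2_m`k' = h2 * (mealcat == `k')
}

regress api00 m2 m3 h1_m1 h1_m2 h1_m3 h2_m1 h2_m2 h2_m3

* Simple effect of collcat when mealcat is 1
test h1_m1 h2_m1
```

Because this reproduces the coding displayed above, the coefficients and the test should match those reported for _Ico1Wme1 and _Ico2Wme1.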

6.3. Simple comparisons

In the analyses above we looked at the simple effect of collcat at each level of mealcat. For example, we looked at the overall effect of collcat when mealcat was 1. This is the simple effect of collcat at mealcat=1. Because collcat has more than two levels, we may wish to make further comparisons among the three levels of collcat within mealcat=1. Simple comparisons allow us to make such comparisons.

6.3.1 Analyzing simple comparisons using xi3 and regress

In the analyses above we used helmert coding for collcat. We chose this coding so we could compare group 1 with groups 2 and 3 and then compare groups 2 and 3. For example, if we wanted to compare collcat 1 versus 2 and 3 when mealcat is 1, we would look at the effect _Ico1Wme1, and if we wanted to compare collcat groups 2 and 3 when mealcat is 1, we would look at the effect _Ico2Wme1. Because xi3 creates labels for each term that it creates, we can use the describe command to verify that we are using the correct terms. Indeed, we see that these terms are as we expected.

describe _Ico1Wme1 _Ico2Wme1

              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
_Ico1Wme1       double %10.0g                 collcat(1 vs. 2+) @ mealcat==1
_Ico2Wme1       double %10.0g                 collcat(2 vs. 3) @ mealcat==1

We can type regress without any arguments to redisplay the results of the last regression and see the effects for these terms.

regress

      Source |       SS       df       MS              Number of obs =     400
-------------+------------------------------           F(  8,   391) =  166.76
       Model |  6243714.81     8  780464.351           Prob > F      =  0.0000
    Residual |  1829957.19   391  4680.19741           R-squared     =  0.7733
-------------+------------------------------           Adj R-squared =  0.7687
       Total |     8073672   399  20234.7669           Root MSE      =  68.412

------------------------------------------------------------------------------
       api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 _Imealcat_2 |  -181.0414   9.077126   -19.94   0.000    -198.8874   -163.1953
 _Imealcat_3 |  -293.4103   9.449459   -31.05   0.000    -311.9884   -274.8322
   _Ico1Wme1 |   13.01323     13.528     0.96   0.337    -13.58349    39.60995
   _Ico1Wme2 |  -56.77117   16.67866    -3.40   0.001    -89.56223    -23.9801
   _Ico1Wme3 |  -31.36441   12.86955    -2.44   0.015    -56.66658   -6.062246
   _Ico2Wme1 |   43.50022   14.04092     3.10   0.002     15.89507    71.10536
   _Ico2Wme2 |  -19.03303   13.29175    -1.43   0.153    -45.16528     7.09922
   _Ico2Wme3 |      -32.9   20.23653    -1.63   0.105    -72.68603    6.886029
       _cons |   650.0883   3.871885   167.90   0.000     642.4759    657.7006
------------------------------------------------------------------------------

We see that collcat 1 is not significantly different from collcat 2 and 3 at mealcat 1 (t=0.96, p=0.337), but collcat 2 is significantly different from collcat 3 at mealcat 1 (t=3.10, p=0.002).
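
These two coefficients are differences among the cell means at mealcat 1, which can be checked by hand from the table of adjusted means in section 6.1. The sketch below does this arithmetic with display and then, assuming Stata 12 or later, requests the same Helmert comparisons with the built-in contrast command. The contrast call is a substitute for xi3 and was not used to produce the output above.

```stata
* Arithmetic check using the cell means for mealcat 1 (816.914, 825.651, 782.151)
* collcat 1 versus the average of collcat 2 and 3
display 816.914 - (825.651 + 782.151)/2
* collcat 2 versus collcat 3
display 825.651 - 782.151

* Sketch for Stata 12 or later: Helmert comparisons on collcat within mealcat
use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi2, clear
regress api00 i.collcat##i.mealcat
contrast h.collcat@mealcat, effects
```

The two display results are approximately 13.01 and 43.50, in line with the coefficients for _Ico1Wme1 and _Ico2Wme1. The effects option asks contrast for a table of individual estimates with standard errors and p-values, which should be comparable to the regress output above.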

6.3.2 Coding of simple comparisons

We can see that the coding of simple comparisons is the same as the coding of simple effects. For example, we can see that _Icollcat_1 and _Icollcat_2 are coded using helmert coding.

tablist collcat _Icollcat_1 _Icollcat_2, sort(v)

  collcat  _Icollca~1  _Icollca~2   Freq
        1   .66666667           0    129
        2  -.33333333          .5    134
        3  -.33333333         -.5    137

The term _Ico1Wme1 then represents the comparison of collcat 1 versus collcat 2 and 3 when mealcat is 1. Hence, its coding is the same as the coding for _Icollcat_1 when mealcat is 1, and 0 otherwise, as shown below.

tablist  mealcat collcat _Ico1Wme1, sort(v)

    mealcat   collcat    _Ico1Wme1   Freq  
          1         1    .66666667     35  
          1         2   -.33333333     43  
          1         3   -.33333333     53  
          2         1            0     20  
          2         2            0     43  
          2         3            0     69  
          3         1            0     74  
          3         2            0     48  
          3         3            0     15

6.4. Partial interaction

A partial interaction allows you to apply contrasts to one of the effects in an interaction term. For example, we can draw the interaction of collcat by mealcat like this below.

	Collcat low	Collcat Med	Collcat High
Mealcat Low
Mealcat Med
Mealcat High

Say that we wanted to compare, in the context of this interaction, group 1 for collcat versus groups 2 and 3. The table of this partial interaction would look like this. The contrast coefficients of -2 1 1 applied to collcat indicate the comparison of group 1 for collcat versus groups 2 and 3.

	-2	1	1
	Collcat low	Collcat Med	Collcat High
Mealcat Low
Mealcat Med
Mealcat High

Likewise, we also might want to compare groups 2 and 3 of collcat by mealcat, and the table of this interaction would look like this.

	0	-1	1
	Collcat low	Collcat Med	Collcat High
Mealcat Low
Mealcat Med
Mealcat High

These are called partial interactions because contrast coefficients are applied to one of the terms involved in the interaction.

6.4.1 Analyzing partial interactions using xi3 and regress

As shown above, we wish to compare groups 1 versus 2 and 3 on collcat, and then compare groups 2 and 3 on collcat. This implies helmert coding on collcat, as shown below. The coding for mealcat is chosen as forward difference coding (for the purposes of later analyses) but could have been any form of effect coding.

xi3: regress api00 h.collcat*a.mealcat

h.collcat         _Icollcat_1-3       (naturally coded; _Icollcat_3 omitted)
a.mealcat         _Imealcat_1-3       (naturally coded; _Imealcat_3 omitted)

      Source |       SS       df       MS              Number of obs =     400
-------------+------------------------------           F(  8,   391) =  166.76
       Model |  6243714.81     8  780464.351           Prob > F      =  0.0000
    Residual |  1829957.19   391  4680.19741           R-squared     =  0.7733
-------------+------------------------------           Adj R-squared =  0.7687
       Total |     8073672   399  20234.7669           Root MSE      =  68.412

------------------------------------------------------------------------------
       api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 _Icollcat_1 |  -25.04078   8.345388    -3.00   0.003    -41.44823   -8.633334
 _Icollcat_2 |  -2.810937   9.329377    -0.30   0.763    -21.15296    15.53108
 _Imealcat_1 |   181.0414   9.077126    19.94   0.000     163.1953    198.8874
 _Imealcat_2 |   112.3689   9.907594    11.34   0.000     92.89009    131.8477
   _Ico1Xme1 |    69.7844    21.4752     3.25   0.001     27.56308    112.0057
   _Ico1Xme2 |  -25.40675   21.06663    -1.21   0.229    -66.82479    16.01128
   _Ico2Xme1 |   62.53325   19.33438     3.23   0.001      24.5209    100.5456
   _Ico2Xme2 |   13.86697   24.21132     0.57   0.567    -33.73369    61.46763
       _cons |   650.0883   3.871885   167.90   0.000     642.4759    657.7006
------------------------------------------------------------------------------

Let’s look at all of the terms created by the xi3 command using the describe command.

describe _I*

              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
_Icollcat_1     double %10.0g                 collcat(1 vs. 2+)
_Icollcat_2     double %10.0g                 collcat(2 vs. 3)
_Imealcat_1     double %10.0g                 mealcat(1 vs. 2)
_Imealcat_2     double %10.0g                 mealcat(2 vs. 3)
_Ico1Xme1       float  %9.0g                  collcat(1 vs. 2+)*mealcat(1 vs. 2)
_Ico1Xme2       float  %9.0g                  collcat(1 vs. 2+)*mealcat(2 vs. 3)
_Ico2Xme1       float  %9.0g                  collcat(2 vs. 3)*mealcat(1 vs. 2)
_Ico2Xme2       float  %9.0g                  collcat(2 vs. 3)*mealcat(2 vs. 3)

The partial interaction of collcat comparing groups 1 versus 2 and 3 by mealcat is composed of the interaction terms _Ico1Xme1 and _Ico1Xme2, because these are the terms from the interaction that compare groups 1 versus 2 and 3 on collcat. Below we use the test command to test this partial interaction. We find that this interaction is significant.

test _Ico1Xme1 _Ico1Xme2

 ( 1)  _Ico1Xme1 = 0.0
 ( 2)  _Ico1Xme2 = 0.0

       F(  2,   391) =    5.78
            Prob > F =    0.0033

Likewise, to compare groups 2 and 3 on collcat by mealcat, we test the two terms of the interaction that involve the comparison of groups 2 and 3 on collcat. We find that this partial interaction is also significant.

test _Ico2Xme1 _Ico2Xme2

 ( 1)  _Ico2Xme1 = 0.0
 ( 2)  _Ico2Xme2 = 0.0

       F(  2,   391) =    7.11
            Prob > F =    0.0009
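
The same partial interactions could in principle be requested with built-in commands. The following is a hedged sketch, assuming Stata 12 or later, in which the h. operator of the contrast command applies Helmert contrasts to collcat while mealcat is left as a whole factor. It is an alternative to xi3 and was not used to produce the output above.

```stata
* Sketch for Stata 12 or later; contrast used in place of xi3
use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi2, clear
regress api00 i.collcat##i.mealcat

* Helmert contrasts on collcat, each crossed with all levels of mealcat
contrast h.collcat#mealcat
```

If the operators map as assumed, contrast should report one joint test for collcat 1 versus 2 and 3 by mealcat and one for collcat 2 versus 3 by mealcat, comparable to the F values of 5.78 and 7.11 above.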

6.4.2 Coding of partial interactions

The terms _Ico1Xme1 and _Ico1Xme2 are just the product of their respective main effects. The coding for mealcat is really irrelevant, as long as some form of coding is used that sums to 0. Below you can see that _Ico1Xme1 is just _Icollcat_1 * _Imealcat_1.

tablist collcat mealcat _Icollcat_1 _Imealcat_1 _Ico1Xme1, sort(v)

  collcat             mealcat  _Icollca~1  _Imealca~1   _Ico1Xme1   Freq
        1                   1   .66666667   .66666667   .44444444     35
        1                   2   .66666667  -.33333333  -.22222222     20
        1                   3   .66666667  -.33333333  -.22222222     74
        2                   1  -.33333333   .66666667  -.22222222     43
        2                   2  -.33333333  -.33333333   .11111111     43
        2                   3  -.33333333  -.33333333   .11111111     48
        3                   1  -.33333333   .66666667  -.22222222     53
        3                   2  -.33333333  -.33333333   .11111111     69
        3                   3  -.33333333  -.33333333   .11111111     15

And you can see that _Ico1Xme2 is just _Icollcat_1 * _Imealcat_2.

tablist collcat mealcat _Icollcat_1 _Imealcat_2 _Ico1Xme2, s(v)

  collcat             mealcat  _Icollca~1  _Imealca~2  _IcolXme~2   Freq
        1                   1   .66666667   .33333333   .22222222     35
        1                   2   .66666667   .33333333   .22222222     20
        1                   3   .66666667  -.66666667  -.44444444     74
        2                   1  -.33333333   .33333333  -.11111111     43
        2                   2  -.33333333   .33333333  -.11111111     43
        2                   3  -.33333333  -.66666667   .22222222     48
        3                   1  -.33333333   .33333333  -.11111111     53
        3                   2  -.33333333   .33333333  -.11111111     69
        3                   3  -.33333333  -.66666667   .22222222     15
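
To make the product rule concrete, the sketch below builds the helmert and forward difference codes by hand, multiplies them, and fits the model without xi3. The variable names (c1, c2, a1, a2 and the c*Xa* products) are hypothetical names chosen for this example, and the code assumes no missing values on collcat or mealcat.

```stata
* Sketch: hand-built partial interaction terms as products of main effects
use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi2, clear

* Helmert coding for collcat (1 vs. 2 and 3, then 2 vs. 3)
generate double c1 = cond(collcat == 1, 2/3, -1/3)
generate double c2 = cond(collcat == 2, 1/2, cond(collcat == 3, -1/2, 0))

* Forward difference coding for mealcat (1 vs. 2, then 2 vs. 3)
generate double a1 = cond(mealcat == 1, 2/3, -1/3)
generate double a2 = cond(mealcat == 3, -2/3, 1/3)

* Interaction terms are the products of the main-effect codes
generate double c1Xa1 = c1 * a1
generate double c1Xa2 = c1 * a2
generate double c2Xa1 = c2 * a1
generate double c2Xa2 = c2 * a2

regress api00 c1 c2 a1 a2 c1Xa1 c1Xa2 c2Xa1 c2Xa2

* Partial interaction: collcat 1 versus 2 and 3, by mealcat
test c1Xa1 c1Xa2
```

Since the codes match the tablist output above, the coefficients and the test should reproduce those shown for _Ico1Xme1 and _Ico1Xme2 in section 6.4.1.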

6.5. Interaction contrasts

Above we saw that a partial interaction allows you to apply contrast coefficients to one of the terms in a two-way interaction. An interaction contrast allows you to apply contrast coefficients to both of the terms in a two-way interaction.

For example, with respect to collcat, say that we wish to compare groups 2 and 3, and with respect to mealcat we wish to compare groups 1 and 2. The contrast coefficients for this comparison are shown in the table below.

		0	-1	1
		Collcat low	Collcat Med	Collcat High
-1	Mealcat Low
1	Mealcat Med
0	Mealcat High

We also would like to form a second interaction contrast that also compares groups 2 and 3 with respect to collcat, and compares groups 2 and 3 on mealcat. A table of this comparison is shown below.

		0	-1	1
		Collcat low	Collcat Med	Collcat High
0	Mealcat Low
-1	Mealcat Med
1	Mealcat High

If we look at the graph of the predicted values (repeated below) we constructed before, it compares the dashed and dotted lines (collcat 2 versus 3) by mealcat 1 versus 2, and then again by mealcat 2 versus 3.

Image statareg6_4

6.5.1 Analyzing interaction contrasts using xi3 and regress

Because we would like to compare groups 1 versus 2, and then groups 2 versus 3 on mealcat, this implies forward difference coding for mealcat (which will compare 1 versus 2, then 2 versus 3). For collcat we wish to compare groups 2 and 3, so we can use Helmert coding, as we did above (this compares group 1 versus groups 2 and 3, then group 2 versus group 3).

xi3: regress api00 h.collcat*a.mealcat

h.collcat         _Icollcat_1-3       (naturally coded; _Icollcat_3 omitted)
a.mealcat         _Imealcat_1-3       (naturally coded; _Imealcat_3 omitted)

      Source |       SS       df       MS              Number of obs =     400
-------------+------------------------------           F(  8,   391) =  166.76
       Model |  6243714.81     8  780464.351           Prob > F      =  0.0000
    Residual |  1829957.19   391  4680.19741           R-squared     =  0.7733
-------------+------------------------------           Adj R-squared =  0.7687
       Total |     8073672   399  20234.7669           Root MSE      =  68.412

------------------------------------------------------------------------------
       api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 _Icollcat_1 |  -25.04078   8.345388    -3.00   0.003    -41.44823   -8.633334
 _Icollcat_2 |  -2.810937   9.329377    -0.30   0.763    -21.15296    15.53108
 _Imealcat_1 |   181.0414   9.077126    19.94   0.000     163.1953    198.8874
 _Imealcat_2 |   112.3689   9.907594    11.34   0.000     92.89009    131.8477
   _Ico1Xme1 |    69.7844    21.4752     3.25   0.001     27.56308    112.0057
   _Ico1Xme2 |  -25.40675   21.06663    -1.21   0.229    -66.82479    16.01128
   _Ico2Xme1 |   62.53325   19.33438     3.23   0.001      24.5209    100.5456
   _Ico2Xme2 |   13.86697   24.21132     0.57   0.567    -33.73369    61.46763
       _cons |   650.0883   3.871885   167.90   0.000     642.4759    657.7006
------------------------------------------------------------------------------

If we are not sure what term we want to use, we can use the describe command to show the labels for the interaction terms.

describe _Ico1Xme* _Ico2Xme*

              storage  display   value
variable name   type   format    label   variable label
-------------------------------------------------------------------------------
_Ico1Xme1       double %10.0g            collcat(1 vs. 2+) & mealcat(1 vs. 2)
_Ico1Xme2       double %10.0g            collcat(1 vs. 2+) & mealcat(2 vs. 3)
_Ico2Xme1       double %10.0g            collcat(2 vs. 3) & mealcat(1 vs. 2)
_Ico2Xme2       double %10.0g            collcat(2 vs. 3) & mealcat(2 vs. 3)

The first interaction comparison of interest is tested by _Ico2Xme1, and this term is significant. As we expect, the lines for collcat 2 and collcat 3 are not parallel when we compare mealcat 1 and 2.

The second interaction comparison of interest is tested by _Ico2Xme2, and this term is not significant. Looking at the graph, we can see that the lines for collcat 2 and collcat 3 are mostly parallel between mealcat 2 and 3.
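
To see where these two coefficients come from, the sketch below (in Python, outside Stata) applies the contrast weights directly to the nine fitted cell means. The cell means are copied from the table of fitted values in section 6.7; the function name interaction_contrast and the unit weights are our own illustration rather than anything xi3 produces.

```python
# Fitted cell means (section 6.7), indexed [collcat - 1][mealcat - 1].
cell_means = [
    [816.914, 589.350, 493.919],  # collcat 1
    [825.651, 636.605, 508.833],  # collcat 2
    [782.151, 655.638, 541.733],  # collcat 3
]


def interaction_contrast(collcat_w, mealcat_w, means):
    """Sum of cell means weighted by the product of the two sets of weights."""
    return sum(collcat_w[c] * mealcat_w[m] * means[c][m]
               for c in range(3) for m in range(3))


collcat_2_vs_3 = [0, -1, 1]
mealcat_1_vs_2 = [-1, 1, 0]
mealcat_2_vs_3 = [0, -1, 1]

co2_x_me1 = interaction_contrast(collcat_2_vs_3, mealcat_1_vs_2, cell_means)
co2_x_me2 = interaction_contrast(collcat_2_vs_3, mealcat_2_vs_3, cell_means)

print(round(co2_x_me1, 3), round(co2_x_me2, 3))
```

The two values should agree, up to rounding of the cell means, with the coefficients for _Ico2Xme1 and _Ico2Xme2 in the regression output above. Flipping the sign of both sets of weights leaves each product, and therefore each contrast, unchanged.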

6.5.2 Coding of interaction contrasts

The term _Ico2Xme1 is just the product of the respective main effects, as shown below.

tablist collcat mealcat _Icollcat_2 _Imealcat_1 _Ico2Xme1, sort(v)

    collcat   mealcat   _Icoll~2   _Imealca~1   _Ico2Xme1   Freq  
          1         1          0    .66666667           0     35  
          1         2          0   -.33333333           0     20  
          1         3          0   -.33333333           0     74  
          2         1         .5    .66666667    .3333333     43  
          2         2         .5   -.33333333   -.1666667     43  
          2         3         .5   -.33333333   -.1666667     48  
          3         1        -.5    .66666667   -.3333333     53  
          3         2        -.5   -.33333333    .1666667     69  
          3         3        -.5   -.33333333    .1666667     15

6.6 Computing adjusted means

6.6.1 Computing adjusted means via anova

First, we show how you can compute adjusted means using the anova command. We use the same model that we have been using, including mealcat, collcat and the interaction of these two variables, and we add emer as a continuous covariate (via the contin option).

anova api00 collcat mealcat collcat*mealcat emer, contin(emer)

                           Number of obs =     400     R-squared     =  0.7930
                           Root MSE      = 65.4617     Adj R-squared =  0.7882

                  Source |  Partial SS    df       MS           F     Prob > F
         ----------------+----------------------------------------------------
                   Model |  6402428.26     9  711380.918     166.01     0.0000
                         |
                 collcat |  34730.0899     2  17365.0449       4.05     0.0181
                 mealcat |  3017331.85     2  1508665.92     352.06     0.0000
         collcat*mealcat |  96789.1156     4  24197.2789       5.65     0.0002
                    emer |  158713.455     1  158713.455      37.04     0.0000
                         |
                Residual |  1671243.73   390  4285.24034   
         ----------------+----------------------------------------------------
                   Total |  8073672.00   399  20234.7669

After performing the anova, we can then use the adjust command to get adjusted means broken down by collcat and mealcat. These adjusted means are the means that would be expected if every school in the sample were at the mean of the variable emer. Note that it is possible to compute adjusted means with emer held at values other than the mean; for example, if we had specified emer=50, adjust would have computed the means as though every school had a value of 50 on emer.

adjust emer , by(collcat mealcat)

--------------------------------------------------------------------------
     Dependent variable: api00     Command: anova
  Covariate set to mean: emer = 12.6575
--------------------------------------------------------------------------

-------------------------------------
          |Percentage free meals in 3
          |        categories        
  collcat |       1        2        3
----------+--------------------------
        1 |  797.56  596.973  509.872
        2 |  812.55  636.405  523.885
        3 | 767.935  652.976  550.462
-------------------------------------
     Key:  Linear Prediction

6.6.2 Computing adjusted means via regress

Now we illustrate how to get the same adjusted means if you were to do the analysis via the regress command. First, we perform the regression analysis that is equivalent to the anova command above.

xi3: regress api00 g.collcat*g.mealcat emer

g.collcat         _Icollcat_1-3       (naturally coded; _Icollcat_1 omitted)
g.mealcat         _Imealcat_1-3       (naturally coded; _Imealcat_1 omitted)

      Source |       SS       df       MS              Number of obs =     400
-------------+------------------------------           F(  9,   390) =  166.01
       Model |  6402428.26     9  711380.918           Prob > F      =  0.0000
    Residual |  1671243.73   390  4285.24034           R-squared     =  0.7930
-------------+------------------------------           Adj R-squared =  0.7882
       Total |     8073672   399  20234.7669           Root MSE      =  65.462

------------------------------------------------------------------------------
       api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 _Icollcat_2 |   22.81146   8.713721     2.62   0.009     5.679711     39.9432
 _Icollcat_3 |   22.32251   9.588069     2.33   0.020     3.471742    41.17328
 _Imealcat_2 |  -163.8973   9.131088   -17.95   0.000    -181.8497    -145.945
 _Imealcat_3 |  -264.6091   10.20556   -25.93   0.000    -284.6739   -244.5443
   _Ico2Xme2 |   24.44231   23.26715     1.05   0.294    -21.30242    70.18704
   _Ico2Xme3 |  -.9774027    19.2525    -0.05   0.960    -38.82908    36.87428
   _Ico3Xme2 |   85.62852   22.04718     3.88   0.000     42.28233    128.9747
   _Ico3Xme3 |   70.21457   23.47354     2.99   0.003     24.06405    116.3651
        emer |   -2.00997   .3302709    -6.09   0.000    -2.659304   -1.360636
       _cons |   675.2877    5.55622   121.54   0.000     664.3638    686.2116
------------------------------------------------------------------------------

To create the adjusted means we wish to assume that all of the schools are at the average on the variable emer. We do this by assigning the average of emer to the variable emer, but we first rename the original emer to temer so we don’t destroy the contents of this variable.

rename emer temer
egen emer = mean(temer)

Now we create yhat as the predicted value. Since the value of emer is set to the mean of emer, this will be the predicted value assuming that all schools are at the average for emer.

predict yhat

Now, we can look at the average of yhat broken down by collcat and mealcat, which you can see corresponds to the adjusted means that we found with the adjust command following the anova command above.

table collcat mealcat, contents(mean yhat) row col

        Means of Fitted values

           |  Percentage free meals in 3
           |          categories
   collcat |         1          2          3 |     Total
-----------+---------------------------------+----------
         1 | 797.56042  596.97284  509.87225 | 601.43115
         2 | 812.55023  636.40497  523.88464 | 652.62341
         3 | 767.93524  652.97614  550.46161 | 686.22515
-----------+---------------------------------+----------
     Total | 790.49498   639.0926  519.22579 |  647.6225

We then drop the variables emer and yhat since we no longer need them, and rename temer back to emer so the emer variable is back to the way it was before this process.

drop yhat emer
rename temer emer

6.6.3 Computing adjusted means via postgr3

The postgr3 command can be used to simplify the process of computing adjusted means (i.e., predicted values when holding other variables constant). Let’s assume that you have run the same regression as shown above.

. xi3: regress api00 g.collcat*g.mealcat emer 
<output omitted to save space>

You can then show the graph of adjusted means and the table of adjusted means using postgr3 as shown below. Below we show just the table of adjusted means, and you can see that they correspond to those computed above. We should stress that it is important to use the xi3 command (rather than xi) before using postgr3, because then postgr3 knows which variables should be held constant (in this example emer) and which variables should not be held constant (in this example, _Imealcat_2 through _Ico3Xme3).

. postgr3 mealcat, by(collcat) connect(solid dash dot) table2

Variables left asis: _Imealcat_2 _Imealcat_3 _Icollcat_2 _Icollcat_3
> _Ico2Xme2 _Ico2Xme3 _Ico3Xme2 _Ico3Xme3
Holding emer constant at 12.6575

----------------------------------------------------------------------
          |           Percentage free meals in 3 categories           
  collcat |   0-46% free meals   47-80% free meals  81-100% free meals
----------+-----------------------------------------------------------
        1 |           797.5604            596.9728            509.8723
        2 |           812.5502             636.405            523.8846
        3 |           767.9352            652.9761            550.4616
----------------------------------------------------------------------

6.7 More details on meaning of coefficients

So far we have discussed a variety of techniques that you can use to help interpret interactions of categorical variables in regression, but we have not gone into great detail about the meaning of the coefficients in these analyses. Let’s consider this further. Consider the analysis below using collcat and mealcat, using simple contrasts on both of these variables.

xi3: regress api00 g.collcat*g.mealcat

g.collcat         _Icollcat_1-3       (naturally coded; _Icollcat_1 omitted)
g.mealcat         _Imealcat_1-3       (naturally coded; _Imealcat_1 omitted)

      Source |       SS       df       MS              Number of obs =     400
-------------+------------------------------           F(  8,   391) =  166.76
       Model |  6243714.81     8  780464.351           Prob > F      =  0.0000
    Residual |  1829957.19   391  4680.19741           R-squared     =  0.7733
-------------+------------------------------           Adj R-squared =  0.7687
       Total |     8073672   399  20234.7669           Root MSE      =  68.412

------------------------------------------------------------------------------
       api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 _Icollcat_2 |   23.63531   9.105331     2.60   0.010     5.733782    41.53685
 _Icollcat_3 |   26.44625   9.995129     2.65   0.008     6.795331    46.09717
 _Imealcat_2 |  -181.0414   9.077126   -19.94   0.000    -198.8874   -163.1953
 _Imealcat_3 |  -293.4103   9.449459   -31.05   0.000    -311.9884   -274.8322
   _Ico2Xme2 |   38.51777   24.19532     1.59   0.112    -9.051422    86.08697
   _Ico2Xme3 |   6.177537   20.08262     0.31   0.759     -33.3059    45.66097
   _Ico3Xme2 |    101.051   22.88808     4.42   0.000     56.05191    146.0501
   _Ico3Xme3 |   82.57776   24.43941     3.38   0.001     34.52867    130.6268
       _cons |   650.0883   3.871885   167.90   0.000     642.4759    657.7006
------------------------------------------------------------------------------

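We can produce the predicted cell means as shown below. (Because this model has no covariates, these are simply the means of the nine cells.) These will be useful for interpreting the meaning of the coefficients.

predict yhat
table collcat mealcat, contents(mean yhat) row col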

                          Means of Fitted values

           |  Percentage free meals in 3
           |          categories
   collcat |         1          2          3 |     Total
-----------+---------------------------------+----------
         1 | 816.91431  589.34998  493.91891 | 596.34884
         2 | 825.65118  636.60468  508.83334 | 651.50002
         3 | 782.15094   655.6377  541.73334 |  692.1095
-----------+---------------------------------+----------
     Total | 805.71757  639.39395  504.37956 | 647.62251

We drop the variable yhat since we no longer need it, which also frees the variable name in case we wish to use it again.

drop yhat

Let’s consider the meaning of the coefficient for _Icollcat_2. The coding for this variable compares group 2 versus group 1; hence, this coefficient corresponds to mean(collcat2) – mean(collcat1). Note that these are the unweighted means, so we compute the mean for collcat2 as the mean of the three cells corresponding to collcat2, i.e., (825.651+636.605+508.833)/3 . If we compare the result below to the coefficient for _Icollcat_2 we see that they are the same.

display (825.651+636.605+508.833)/3 - (816.914+589.35+493.919)/3

23.635333

Likewise, the coefficient for _Icollcat_3 is mean(collcat3) – mean(collcat1), computed below. The value below corresponds to the coefficient for _Icollcat_3.

display (782.151+655.638+541.733)/3 - (816.914+589.35+493.919)/3

26.446333

Likewise, the coefficient for _Imealcat_2 works out to be mean(mealcat2) – mean(mealcat1), see below.

display (589.35+636.605+655.638)/3 - (816.914+825.651+782.151)/3

-181.041

And the coefficient for _Imealcat_3 is mean(mealcat3) – mean(mealcat1), see below.

display (493.919+508.833+541.733)/3 - (816.914+825.651+782.151)/3

-293.41033

To get the meaning of the coefficients for the interaction terms, we need to multiply the contrast coding of the main effects that created the interaction terms. For example, the term _Ico2Xme2 is the product of _Icollcat_2 and _Imealcat_2. We can form a 3 by 3 table showing the contrast coefficients for _Icollcat_2 along the top and those for _Imealcat_2 on the left, and then multiply these terms together and place the products in the cells of the table, as shown below.

		-1	1	0
		Collcat low	Collcat Med	Collcat High
-1	Mealcat Low	1	-1	0
1	Mealcat Med	-1	1	0
0	Mealcat High	0	0	0

We can then multiply the terms in the cells by the cell means and sum the results to get the coefficient for _Ico2Xme2. In other words, this coefficient corresponds to the sum of the means of cells (1,1) and (2,2) minus the sum of the means of cells (1,2) and (2,1).

display ( 816.914 - 589.35 -  825.651 +  636.605 )

38.518

We can go through the same process to verify the meaning of the coefficients for the other three interaction terms. We verify that _Ico2Xme3 is 6.177.

display ( 816.914 - 493.919 -  825.651 + 508.833)

6.177

We also verify that _Ico3Xme2 is 101.051.

display ( 816.914 - 589.35 -  782.151 +  655.638 )

101.051

And we verify that _Ico3Xme3 is 82.577.

display ( 816.914 - 493.919 -  782.151 + 541.733 )

82.577
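
All of the hand calculations in this section follow one pattern, which the Python sketch below tries to make explicit: each main effect is a weighted sum of unweighted marginal means, and each interaction is a weighted sum of cell means using the product of the two sets of weights. The cell means are typed in from the table of fitted values above; the function names are ours and are not part of xi3.

```python
# Fitted cell means from the table above, keyed by (collcat, mealcat).
cell_means = {
    (1, 1): 816.914, (1, 2): 589.350, (1, 3): 493.919,
    (2, 1): 825.651, (2, 2): 636.605, (2, 3): 508.833,
    (3, 1): 782.151, (3, 2): 655.638, (3, 3): 541.733,
}
LEVELS = (1, 2, 3)


def weights(k):
    """Contrast weights for 'group k minus group 1' (k is 2 or 3)."""
    return {lev: (1 if lev == k else -1 if lev == 1 else 0) for lev in LEVELS}


def collcat_effect(k):
    """Difference in unweighted collcat means: group k minus group 1."""
    w = weights(k)
    return sum(w[c] * cell_means[(c, m)] for c in LEVELS for m in LEVELS) / 3


def mealcat_effect(k):
    """Difference in unweighted mealcat means: group k minus group 1."""
    w = weights(k)
    return sum(w[m] * cell_means[(c, m)] for c in LEVELS for m in LEVELS) / 3


def interaction(kc, km):
    """Cell means weighted by the product of the two sets of weights."""
    wc, wm = weights(kc), weights(km)
    return sum(wc[c] * wm[m] * cell_means[(c, m)]
               for c in LEVELS for m in LEVELS)


for k in (2, 3):
    print(f"_Icollcat_{k}", round(collcat_effect(k), 3))
    print(f"_Imealcat_{k}", round(mealcat_effect(k), 3))
for kc in (2, 3):
    for km in (2, 3):
        print(f"_Ico{kc}Xme{km}", round(interaction(kc, km), 3))
```

The printed values should match the eight coefficients in the regression output, up to the rounding of the cell means.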

6.8 Simple effects via dummy coding versus effect coding

You may wonder why we have gone to the effort of using xi3 for creating and testing these effects instead of just using dummy coding like we would get with the xi command. Let’s compare how to get simple effects using the xi3 command via effect coding to how we would get simple effects using xi with dummy coding. We hope to show that it is much easier to use effect coding via xi3 and that the interpretation of the coefficients is much more intuitive.

6.8.1 Example 1. Simple effects of yr_rnd at levels of mealcat

Let’s use an example from Chapter 3 (section 3.5). In that example we looked at an analysis using mealcat and yr_rnd and the interaction of these two variables. First, we look at how to do a simple effects analysis looking at the simple effects of yr_rnd at each level of mealcat using the xi3 command with effect coding. To make our results correspond to those from Chapter 3, we will make group 3 of mealcat the reference category.

char mealcat[omit] 3
xi3 : regress api00 g.yr_rnd@g.mealcat

g.yr_rnd          _Iyr_rnd_0-1        (naturally coded; _Iyr_rnd_0 omitted)
g.mealcat         _Imealcat_1-3       (naturally coded; _Imealcat_3 omitted)

      Source |       SS       df       MS              Number of obs =     400
-------------+------------------------------           F(  5,   394) =  261.61
       Model |  6204727.82     5  1240945.56           Prob > F      =  0.0000
    Residual |  1868944.18   394  4743.51314           R-squared     =  0.7685
-------------+------------------------------           Adj R-squared =  0.7656
       Total |     8073672   399  20234.7669           Root MSE      =  68.873

------------------------------------------------------------------------------
       api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 _Imealcat_1 |   267.8108   14.61559    18.32   0.000     239.0765    296.5451
 _Imealcat_2 |   114.6572   11.12812    10.30   0.000     92.77923    136.5351
   _Iyr1Wme1 |  -74.25691   26.75629    -2.78   0.006    -126.8599   -21.65397
   _Iyr1Wme2 |  -51.74017   18.88854    -2.74   0.006    -88.87511   -14.60523
   _Iyr1Wme3 |  -33.49254   11.77129    -2.85   0.005    -56.63492   -10.35015
       _cons |   632.2356   5.800477   109.00   0.000     620.8318    643.6393
------------------------------------------------------------------------------

Now we can obtain the simple effect of yr_rnd at mealcat=1 by inspecting the coefficient for _Iyr1Wme1, the simple effect of yr_rnd at mealcat=2 by inspecting the coefficient for _Iyr1Wme2 and the simple effect of yr_rnd at mealcat=3 by inspecting the coefficient for _Iyr1Wme3.

Now let’s perform the same analysis using xi with dummy coding. Again, we will explicitly make the third group of mealcat the omitted category.

char mealcat[omit] 3
xi : regress api00 i.mealcat*yr_rnd

i.mealcat         _Imealcat_1-3       (naturally coded; _Imealcat_3 omitted)
i.meal~t*yr_rnd   _ImeaXyr_rn_#       (coded as above)

      Source |       SS       df       MS              Number of obs =     400
-------------+------------------------------           F(  5,   394) =  261.61
       Model |  6204727.82     5  1240945.56           Prob > F      =  0.0000
    Residual |  1868944.18   394  4743.51314           R-squared     =  0.7685
-------------+------------------------------           Adj R-squared =  0.7656
       Total |  8073672.00   399  20234.7669           Root MSE      =  68.873

------------------------------------------------------------------------------
       api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 _Imealcat_1 |   288.1929   10.44284    27.60   0.000     267.6623    308.7236
 _Imealcat_2 |    123.781   10.55185    11.73   0.000      103.036    144.5259
      yr_rnd |  -33.49254   11.77129    -2.85   0.005    -56.63492   -10.35015
_ImeaXyr_r~1 |  -40.76438   29.23118    -1.39   0.164    -98.23297    16.70422
_ImeaXyr_r~2 |  -18.24763   22.25624    -0.82   0.413    -62.00347     25.5082
       _cons |   521.4925   8.414197    61.98   0.000     504.9502    538.0349
------------------------------------------------------------------------------

In order to form a test of simple main effects we need to make a table like the one shown below that relates the means of the cells to the coefficients in the regression. Please see Chapter 3, section 3.5 for information on how this table was constructed.

            mealcat=1           mealcat=2         mealcat=3
            -------------------------------------------------
  yr_rnd=0  _cons               _cons             _cons    
            +BImealcat1         +BImealcat2 
            -------------------------------------------------
  yr_rnd=1  _cons               _cons             _cons    
            +Byr_rnd            +Byr_rnd          +Byr_rnd
            +BImealcat1         +BImealcat2           
            +B_ImeaXyr_rn_1     +B_ImeaXyr_rn_2

Let’s start by looking at how to get the simple effect of yr_rnd when mealcat is 3. Looking at the table above, we can see that we would want to compare _cons with _cons + Byr_rnd. We can do this with the lincom command as shown below.

lincom _cons - (_cons + yr_rnd)

 ( 1) - yr_rnd = 0.0

------------------------------------------------------------------------------
       api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   33.49254   11.77129     2.85   0.005     10.35015    56.63492
------------------------------------------------------------------------------

We see that _cons drops out, leaving just -yr_rnd. Equivalently, we can use the test command to test whether the coefficient for yr_rnd is 0. Note that this result corresponds to the result we found with the xi3 command, which also tested the simple effect of yr_rnd when mealcat is 3.

test yr_rnd=0

 ( 1)  yr_rnd = 0.0

       F(  1,   394) =    8.10
            Prob > F =    0.0047

Note that the coefficient for yr_rnd corresponds to the test of the effect of yr_rnd when all other variables are set to 0 (the reference category), in other words, when mealcat is set to the reference category. You may be tempted to interpret the coefficient for yr_rnd as the overall difference between year round schools and non-year round schools, but in this example we see that it really corresponds to the simple effect of yr_rnd. When using dummy coding people commonly misinterpret the lower order effects to refer to overall effects rather than simple effects.

Now let’s look at the simple effect of yr_rnd when mealcat=1. Looking at the table above we see that this involves the comparison of the coefficients for yr_rnd=1 versus yr_rnd=0 when mealcat=1, i.e., comparing _cons + yr_rnd + _Imealcat_1 + _ImeaXyr_rn_1 versus _cons + _Imealcat_1. Removing the terms that drop out we can do the test command below.

test yr_rnd + _ImeaXyr_rn_1=0

 ( 1)  yr_rnd + _ImeaXyr_rn_1 = 0.0

       F(  1,   394) =    7.70
            Prob > F =    0.0058

We can likewise obtain the effect of yr_rnd when mealcat is 2, as shown below.

test yr_rnd + _ImeaXyr_rn_2=0

 ( 1)  yr_rnd + _ImeaXyr_rn_2 = 0.0

       F(  1,   394) =    7.50
            Prob > F =    0.0064

These examples illustrate that it is more complicated to form simple effects when using dummy coding, and also that the interpretation of lower order effects when using dummy coding may not have the meaning that you would expect.
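
To make the bookkeeping in the table of cell means concrete, the Python sketch below rebuilds each cell mean from the dummy-coded coefficients and then takes the difference between yr_rnd=1 and yr_rnd=0 within each level of mealcat. The coefficients are copied from the xi output above; the dictionary keys and the function cell_mean are just our labels for this illustration.

```python
# Coefficients copied from the xi : regress output (mealcat 3 omitted).
b = {
    "_cons": 521.4925,
    "_Imealcat_1": 288.1929,
    "_Imealcat_2": 123.781,
    "yr_rnd": -33.49254,
    "_ImeaXyr_rn_1": -40.76438,
    "_ImeaXyr_rn_2": -18.24763,
}


def cell_mean(yr_rnd, mealcat):
    """Predicted mean for one cell, following the table of coefficients."""
    mean = b["_cons"]
    if mealcat in (1, 2):
        mean += b[f"_Imealcat_{mealcat}"]
    if yr_rnd == 1:
        mean += b["yr_rnd"]
        if mealcat in (1, 2):
            mean += b[f"_ImeaXyr_rn_{mealcat}"]
    return mean


# Simple effect of yr_rnd at each level of mealcat.
simple_effects = {m: cell_mean(1, m) - cell_mean(0, m) for m in (1, 2, 3)}

for m, effect in simple_effects.items():
    print(f"yr_rnd at mealcat={m}: {effect:.5f}")
```

These three differences should agree, up to rounding, with the coefficients for _Iyr1Wme1, _Iyr1Wme2 and _Iyr1Wme3 that xi3 reported directly.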

6.8.2 Example 2. Simple effects of mealcat at levels of yr_rnd

Example 1 looked at simple effects for yr_rnd, a variable with only two levels. In this example, let’s consider the simple effects of mealcat at each level of yr_rnd. Because mealcat has more than two levels, we can see what is required for doing tests of simple effects for variables with more than two levels.

First, let’s show how to get these simple effects using the xi3 command using effect coding.

xi3 : regress api00 g.mealcat@g.yr_rnd

g.mealcat         _Imealcat_1-3       (naturally coded; _Imealcat_3 omitted)
g.yr_rnd          _Iyr_rnd_0-1        (naturally coded; _Iyr_rnd_0 omitted)

      Source |       SS       df       MS              Number of obs =     400
-------------+------------------------------           F(  5,   394) =  261.61
       Model |  6204727.82     5  1240945.56           Prob > F      =  0.0000
    Residual |  1868944.18   394  4743.51314           R-squared     =  0.7685
-------------+------------------------------           Adj R-squared =  0.7656
       Total |     8073672   399  20234.7669           Root MSE      =  68.873

------------------------------------------------------------------------------
       api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _Iyr_rnd_1 |  -53.16321   11.60095    -4.58   0.000    -75.97072    -30.3557
   _Ime1Wyr0 |   288.1929   10.44284    27.60   0.000     267.6623    308.7236
   _Ime1Wyr1 |   247.4286   27.30218     9.06   0.000     193.7524    301.1048
   _Ime2Wyr0 |    123.781   10.55185    11.73   0.000      103.036    144.5259
   _Ime2Wyr1 |   105.5333   19.59588     5.39   0.000     67.00776    144.0589
       _cons |   632.2356   5.800477   109.00   0.000     620.8318    643.6393
------------------------------------------------------------------------------

We can get the simple effect of mealcat at yr_rnd = 0 just as we did earlier in this chapter.

test _Ime1Wyr0 _Ime2Wyr0

 ( 1)  _Ime1Wyr0 = 0
 ( 2)  _Ime2Wyr0 = 0

       F(  2,   394) =  411.46
            Prob > F =    0.0000

And we likewise get the simple effect of mealcat at yr_rnd = 1 as shown below.

test _Ime1Wyr1 _Ime2Wyr1

 ( 1)  _Ime1Wyr1 = 0
 ( 2)  _Ime2Wyr1 = 0

       F(  2,   394) =   50.19
            Prob > F =    0.0000

We can now test the simple effects of mealcat at each level of yr_rnd via dummy coding.

xi : regress api00 i.mealcat*yr_rnd

i.mealcat         _Imealcat_1-3       (naturally coded; _Imealcat_3 omitted)
i.meal~t*yr_rnd   _ImeaXyr_rn_#       (coded as above)

      Source |       SS       df       MS              Number of obs =     400
-------------+------------------------------           F(  5,   394) =  261.61
       Model |  6204727.82     5  1240945.56           Prob > F      =  0.0000
    Residual |  1868944.18   394  4743.51314           R-squared     =  0.7685
-------------+------------------------------           Adj R-squared =  0.7656
       Total |  8073672.00   399  20234.7669           Root MSE      =  68.873

------------------------------------------------------------------------------
       api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 _Imealcat_1 |   288.1929   10.44284    27.60   0.000     267.6623    308.7236
 _Imealcat_2 |    123.781   10.55185    11.73   0.000      103.036    144.5259
      yr_rnd |  -33.49254   11.77129    -2.85   0.005    -56.63492   -10.35015
_ImeaXyr_r~1 |  -40.76438   29.23118    -1.39   0.164    -98.23297    16.70422
_ImeaXyr_r~2 |  -18.24763   22.25624    -0.82   0.413    -62.00347     25.5082
       _cons |   521.4925   8.414197    61.98   0.000     504.9502    538.0349
------------------------------------------------------------------------------

The simple effect of mealcat when yr_rnd is 0 requires two test statements since it is a 2 degree of freedom test. We can do this by testing mean(mealcat1) = mean(mealcat2) and also testing mean(mealcat2) = mean(mealcat3). We can look at the table above and see that mean(mealcat1) = mean(mealcat2) corresponds to _Imealcat_1 - _Imealcat_2 = 0 (after _cons drops out) and mean(mealcat2) = mean(mealcat3) corresponds to _Imealcat_2 = 0 (again after _cons drops out). So, we can perform this test using the two test commands below.

test  _Imealcat_1- _Imealcat_2=0

 ( 1)  _Imealcat_1 - _Imealcat_2 = 0.0

       F(  1,   394) =  343.05
            Prob > F =    0.0000

test  _Imealcat_2, accum

 ( 1)  _Imealcat_1 - _Imealcat_2 = 0.0
 ( 2)  _Imealcat_2 = 0.0

       F(  2,   394) =  411.46
            Prob > F =    0.0000

Note that the effects _Imealcat_1 and _Imealcat_2 do not correspond to overall effects of the variable mealcat but are the simple effects when yr_rnd is set to 0, the reference level. Again we see that the terms that we might be tempted to call main effects and think of as overall effects really are simple effects when dummy coding is used.

The second test command uses the accum option to accumulate the tests to get the 2 degree of freedom test that corresponds to the simple effect of mealcat when yr_rnd is 0.

Likewise, we can look at the table above to form the comparisons needed to obtain the simple effects of mealcat when yr_rnd is 1.

test _Imealcat_1+ _ImeaXyr_rn_1- _Imealcat_2- _ImeaXyr_rn_2=0

 ( 1)  _Imealcat_1 - _Imealcat_2 + _ImeaXyr_rn_1 - _ImeaXyr_rn_2 = 0.0

       F(  1,   394) =   20.26
            Prob > F =    0.0000

test  _Imealcat_2+ _ImeaXyr_rn_2=0, accum

 ( 1)  _Imealcat_1 - _Imealcat_2 + _ImeaXyr_rn_1 - _ImeaXyr_rn_2 = 0.0
 ( 2)  _Imealcat_2 + _ImeaXyr_rn_2 = 0.0

       F(  2,   394) =   50.19
            Prob > F =    0.0000

With this example we hoped to illustrate that testing simple effects for a variable with more than two levels can be quite tricky and requires constructing multiple test commands, one for every degree of freedom in the simple effect. As you can see, constructing these terms is error prone. Without a method for double-checking results, it is easy to make a mistake when constructing terms and form the wrong comparison. By comparison, when using effect coding with xi3, forming comparisons can be much easier and the interpretation of the lower order effects is much more intuitive. The lower order effects do correspond to the overall effects of the variable; for example, the effect of yr_rnd, when using effect coding, corresponds to the difference between the unweighted means for the year round schools and the non-year round schools.
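
The relationship between the overall effects and the simple effects can be checked numerically. In the Python sketch below (an arithmetic check only, using coefficients copied from the two xi3 outputs in this section), each lower order effect is compared with the plain average of the corresponding simple effects.

```python
# Simple effects of yr_rnd at mealcat 1, 2, 3 (Example 1, xi3 output).
yr_rnd_simple = [-74.25691, -51.74017, -33.49254]
# Overall effect of yr_rnd reported as _Iyr_rnd_1 in Example 2.
yr_rnd_overall = -53.16321

# Simple effects of mealcat at yr_rnd 0 and 1 (Example 2, xi3 output).
mealcat_simple = {
    "_Imealcat_1": [288.1929, 247.4286],  # _Ime1Wyr0, _Ime1Wyr1
    "_Imealcat_2": [123.781, 105.5333],   # _Ime2Wyr0, _Ime2Wyr1
}
# Overall effects of mealcat reported in Example 1.
mealcat_overall = {"_Imealcat_1": 267.8108, "_Imealcat_2": 114.6572}


def average(values):
    """Unweighted average of a list of numbers."""
    return sum(values) / len(values)


yr_rnd_avg = average(yr_rnd_simple)
mealcat_avg = {name: average(vals) for name, vals in mealcat_simple.items()}

print("yr_rnd:", round(yr_rnd_avg, 4), "vs", yr_rnd_overall)
for name, value in mealcat_avg.items():
    print(name + ":", round(value, 4), "vs", mealcat_overall[name])
```

In each case the unweighted average of the simple effects should match the lower order coefficient up to rounding, which is what it means for these terms to be overall effects rather than simple effects at a reference level.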