This is a draft version of this chapter. Comments and suggestions to improve this draft are welcome.

**Chapter outline**

6.1. Analysis with two categorical variables
6.2. Simple effects
6.2.1 Analyzing simple effects using xi3 and regress
6.2.2 Coding of simple effects
6.3. Simple comparisons
6.3.1 Analyzing simple comparisons using xi3 and regress
6.3.2 Coding of simple comparisons
6.4. Partial interaction
6.4.1 Analyzing partial interactions using xi3 and regress
6.4.2 Coding of partial interactions
6.5. Interaction contrasts
6.5.1 Analyzing interaction contrasts using xi3 and regress
6.5.2 Coding of interaction contrasts
6.6. Computing adjusted means
6.6.1 Computing adjusted means via anova
6.6.2 Computing adjusted means via regress
6.7. More details on meaning of coefficients
6.8. Simple effects via dummy coding versus effect coding
6.8.1 Example 1. Simple effects of yr_rnd at levels of mealcat
6.8.2 Example 2. Simple effects of mealcat at levels of yr_rnd

**Please note:** This page makes use of the programs **xi3** and **postgr3**, which are no longer being maintained and have been removed from our archives. References to **xi3** and **postgr3** will be left on this page because they illustrate specific principles of coding categorical variables.

For this chapter we will use the **elemapi2** data file that we have been using in prior chapters. We will focus on the variables **mealcat** and **collcat** as they relate to the outcome variable **api00** (performance on the api in the year 2000). The variable **mealcat** is the variable **meals** broken up into three categories, and the variable **collcat** is the variable **some_col** broken up into three categories. We can think of **mealcat** as the percentage of students receiving free meals, broken up into **low**, **medium** and **high**. Likewise, **collcat** can be thought of as the share of parents with some college education, broken up into **low**, **medium** and **high**. For our analysis, we think that both **mealcat** and **collcat** may be related to **api00**, but it is also possible that the impact of **mealcat** depends on the level of **collcat**. In other words, we think that there might be an interaction of these two categorical variables. In this chapter we will look at how these two categorical variables are related to api performance in the school, and we will look at their interaction as well. We will see that there is an interaction of these categorical variables, and will focus on different ways of further exploring it.

We will first use the **elemapi2** data file.

use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi2, clear

We will modify the label for **mealcat** in order to more clearly see some of the points we will be demonstrating later
in this chapter.

label define mealcat 1 "1" 2 "2" 3 "3", modify

**6.1. Analysis with 2 categorical variables**

One traditional way to analyze this would be to perform a 3 by 3 factorial analysis of variance using the **anova** command, as shown below. The results show a main effect of **collcat** (F=4.50, p=0.0117), a main effect of **mealcat** (F=509.04, p=0.0000) and an interaction of **collcat** by **mealcat** (F=6.63, p=0.0000).

anova api00 collcat mealcat collcat*mealcat

                       Number of obs =     400     R-squared     =  0.7733
                       Root MSE      =  68.412     Adj R-squared =  0.7687

              Source |  Partial SS    df       MS           F     Prob > F
     ----------------+----------------------------------------------------
               Model |  6243714.81     8  780464.351     166.76     0.0000
                     |
             collcat |  42140.5662     2  21070.2831       4.50     0.0117
             mealcat |  4764843.56     2  2382421.78     509.04     0.0000
     collcat*mealcat |  124167.809     4  31041.9522       6.63     0.0000
                     |
            Residual |  1829957.19   391  4680.19741
     ----------------+----------------------------------------------------
               Total |  8073672.00   399  20234.7669

We can use the **adjust** command to show the adjusted means broken down by **collcat** and **mealcat**.

adjust, by(collcat mealcat)

    ----------------------------------------------------------
         Dependent variable: api00     Command: anova
    ----------------------------------------------------------

    -------------------------------------
              |Percentage free meals in 3
              |        categories
      collcat |       1        2        3
    ----------+--------------------------
            1 | 816.914   589.35  493.919
            2 | 825.651  636.605  508.833
            3 | 782.151  655.638  541.733
    -------------------------------------
         Key:  Linear Prediction

We can show a graph of the adjusted means as shown below. We use the **separate** command to make
three variables corresponding to the three levels of **collcat** (i.e., **yhat1** corresponds to the predicted value when **collcat** is low). We can then show the graph with the three levels of **collcat** represented as
three separate lines.

predict yhat
separate yhat, by(collcat)

                  storage  display     value
    variable name   type   format      label      variable label
    -------------------------------------------------------------------------------
    yhat1           float  %9.0g                  yhat, collcat == 1
    yhat2           float  %9.0g                  yhat, collcat == 2
    yhat3           float  %9.0g                  yhat, collcat == 3

graph twoway scatter yhat1 yhat2 yhat3 mealcat, connect(l l l) xlabel(1 2 3) sort

Now we drop the variables **yhat yhat1 yhat2 yhat3** in case we wish to use these variable names later.

drop yhat yhat1 yhat2 yhat3

We can do these same analyses using the **regress** command. Below we use the **regress** command with **xi3** to look at the effect of **collcat**, **mealcat** and the interaction of these two variables.

xi3: regress api00 g.collcat*g.mealcat

    g.collcat         _Icollcat_1-3       (naturally coded; _Icollcat_1 omitted)
    g.mealcat         _Imealcat_1-3       (naturally coded; _Imealcat_1 omitted)

          Source |       SS       df       MS              Number of obs =     400
    -------------+------------------------------           F(  8,   391) =  166.76
           Model |  6243714.81     8  780464.351           Prob > F      =  0.0000
        Residual |  1829957.19   391  4680.19741           R-squared     =  0.7733
    -------------+------------------------------           Adj R-squared =  0.7687
           Total |     8073672   399  20234.7669           Root MSE      =  68.412

    ------------------------------------------------------------------------------
           api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
     _Icollcat_2 |   23.63531   9.105331     2.60   0.010     5.733782    41.53685
     _Icollcat_3 |   26.44625   9.995129     2.65   0.008     6.795331    46.09717
     _Imealcat_2 |  -181.0414   9.077126   -19.94   0.000    -198.8874   -163.1953
     _Imealcat_3 |  -293.4103   9.449459   -31.05   0.000    -311.9884   -274.8322
       _Ico2Xme2 |   38.51777   24.19532     1.59   0.112    -9.051422    86.08697
       _Ico2Xme3 |   6.177537   20.08262     0.31   0.759     -33.3059    45.66097
       _Ico3Xme2 |    101.051   22.88808     4.42   0.000     56.05191    146.0501
       _Ico3Xme3 |   82.57776   24.43941     3.38   0.001     34.52867    130.6268
           _cons |   650.0883   3.871885   167.90   0.000     642.4759    657.7006
    ------------------------------------------------------------------------------

We use the **test** command to test the two terms associated with **collcat** to get the main effect of **collcat**.

test _Icollcat_2 _Icollcat_3

     ( 1)  _Icollcat_2 = 0.0
     ( 2)  _Icollcat_3 = 0.0

           F(  2,   391) =    4.50
                Prob > F =    0.0117

Likewise we use the **test** command to get the overall test of **mealcat**.

test _Imealcat_2 _Imealcat_3

     ( 1)  _Imealcat_2 = 0.0
     ( 2)  _Imealcat_3 = 0.0

           F(  2,   391) =  509.04
                Prob > F =    0.0000

Finally, we use the **test** command to test the interaction of **collcat** by **mealcat**.

test _Ico2Xme2 _Ico2Xme3 _Ico3Xme2 _Ico3Xme3

     ( 1)  _Ico2Xme2 = 0
     ( 2)  _Ico2Xme3 = 0
     ( 3)  _Ico3Xme2 = 0
     ( 4)  _Ico3Xme3 = 0

           F(  4,   391) =    6.63
                Prob > F =    0.0000

First, note that the results of the **test** commands correspond to those from the **anova** command above. This is because **collcat** and **mealcat** were coded using simple effect coding, a coding scheme where the contrasts sum to 0. We indicated that we wanted simple effect coding by using **g.collcat** and **g.mealcat** on the **regress** command with **xi3** (see Chapter 5 for more information about the coding schemes available via the **xi3** command). If this had been coded using **dummy** coding, e.g., **i.collcat**, then the results of the **test** commands for **collcat** and **mealcat** from the **regress** command would not have corresponded to the **anova** results. In addition to simple effect coding, we could have used **e.**, **h.**, **r.**, **a.**, **b.**, or **o.** and the results of the **test** commands would have matched the **anova** command, although the meaning of the individual tests would have been different. This point will be explored in more detail later in this chapter.
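Because **xi3** is no longer distributed, the point about dummy coding can be sketched with Stata's built-in factor-variable notation and the **contrast** command. This is an illustration that assumes Stata 12 or later and is not part of the original analysis. With indicator coding, the **test** of the **collcat** coefficients refers to the effect of **collcat** at the reference level of **mealcat**, so it need not match the **anova** main effect, whereas **contrast** should reproduce the ANOVA-style tests.

```stata
* Sketch with built-in factor variables instead of xi3 (assumes Stata 12+)
use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi2, clear
regress api00 i.collcat##i.mealcat

* Dummy coding: this tests collcat only at the reference level of mealcat,
* so the F statistic need not equal the anova main effect of collcat
test 2.collcat 3.collcat

* ANOVA-style tests of both main effects and the interaction
contrast collcat mealcat collcat#mealcat
```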

We can obtain the adjusted means by using the **predict** command to get the predicted values, calling them **pred**, and then looking at the mean of **pred** broken down by **collcat** and **mealcat**.

predict pred
table collcat mealcat, contents(mean pred)

    Means, Standard Deviations and Frequencies of Fitted values

               |    Percentage free meals in 3
               |            categories
       collcat |         1          2          3 |     Total
    -----------+---------------------------------+----------
             1 | 816.91431  589.34998  493.91891 | 596.34884
             2 | 825.65118  636.60468  508.83334 | 651.50002
             3 | 782.15094   655.6377  541.73334 |  692.1095
    -----------+---------------------------------+----------
         Total | 805.71757  639.39395  504.37956 | 647.62251

We can show a graph of cell means as shown below. We use the same strategy as we did in making the graph above.

separate pred, by(collcat)

                  storage  display     value
    variable name   type   format      label      variable label
    -------------------------------------------------------------------------------
    pred1           float  %9.0g                  pred, collcat == 1
    pred2           float  %9.0g                  pred, collcat == 2
    pred3           float  %9.0g                  pred, collcat == 3

graph twoway scatter pred1 pred2 pred3 mealcat, c(l l l) xlabel(1 2 3) sort

Now we drop the variables **pred pred1 pred2 pred3** in case we wish to use these
variable names later.

drop pred pred1 pred2 pred3

Note that we could have produced the same graph and table of predicted values
using the **postgr3** command.

postgr3 mealcat, by(collcat) table2 clpattern(solid dash dot)

    Variables left asis: _Imealcat_2 _Imealcat_3 _Icollcat_2 _Icollcat_3
    _IcolXmea_2_2 _IcolXmea_2_3 _IcolXmea_3_2 _IcolXmea_3_3
    (option xb assumed; fitted values)

    Means of Fitted values

               |    Percentage free meals in 3
               |            categories
       collcat |         1          2          3 |     Total
    -----------+---------------------------------+----------
             1 | 816.91431  589.34998  493.91891 | 596.34884
             2 | 825.65118  636.60468  508.83334 | 651.50002
             3 | 782.15094   655.6377  541.73334 |  692.1095
    -----------+---------------------------------+----------
         Total | 805.71757  639.39395  504.37956 | 647.62251
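Since **postgr3** is no longer available, a similar table and graph of cell means can probably be obtained with the built-in **margins** and **marginsplot** commands. The sketch below assumes Stata 12 or later and refits the model with factor-variable notation; it is an alternative route, not the method used to produce the output above.

```stata
* Sketch: cell means and profile plot without postgr3 (assumes Stata 12+)
use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi2, clear
regress api00 i.collcat##i.mealcat

* Predicted mean of api00 in each collcat by mealcat cell
margins collcat#mealcat

* Plot the cell means with mealcat on the x-axis, one line per collcat level
marginsplot, xdimension(mealcat)
```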

The graph of the cell means illustrates the interaction between **collcat** and **mealcat**. The graph shows the three levels of **collcat** as three different lines, and the three levels of **mealcat** as the three values on the x-axis. We can see that the effect of **collcat** differs based on the level of **mealcat**. For example, when **mealcat** is low (1), schools where **collcat** is 3 have the lowest **api00** scores, whereas among schools that are medium or high on **mealcat** (2 or 3), schools where **collcat** is 3 have the highest **api00** scores.

Let’s investigate this interaction further by looking at the simple effects of **collcat** at each level of **mealcat**.

**6.2. Simple effects**

We found that the main effect of **collcat** was significant, but because we have an interaction the effect of **collcat** depends on the level of **mealcat**. We might want to ask whether the effect of **collcat** is significant at each level of **mealcat**.

**6.2.1 Analyzing simple effects using xi3 and regress**

In order to look at the simple effects of **collcat** at the different levels of **mealcat**, we will use the **@** symbol instead of **\*** to indicate that we want the interaction terms to reflect the simple effects of **collcat** at each level of **mealcat**. We will use helmert coding for **collcat**, which is discussed further below.

xi3: regress api00 h.collcat@g.mealcat

    h.collcat         _Icollcat_1-3       (naturally coded; _Icollcat_3 omitted)
    g.mealcat         _Imealcat_1-3       (naturally coded; _Imealcat_1 omitted)

          Source |       SS       df       MS              Number of obs =     400
    -------------+------------------------------           F(  8,   391) =  166.76
           Model |  6243714.81     8  780464.351           Prob > F      =  0.0000
        Residual |  1829957.19   391  4680.19741           R-squared     =  0.7733
    -------------+------------------------------           Adj R-squared =  0.7687
           Total |     8073672   399  20234.7669           Root MSE      =  68.412

    ------------------------------------------------------------------------------
           api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
     _Imealcat_2 |  -181.0414   9.077126   -19.94   0.000    -198.8874   -163.1953
     _Imealcat_3 |  -293.4103   9.449459   -31.05   0.000    -311.9884   -274.8322
       _Ico1Wme1 |   13.01323     13.528     0.96   0.337    -13.58349    39.60995
       _Ico1Wme2 |  -56.77117   16.67866    -3.40   0.001    -89.56223    -23.9801
       _Ico1Wme3 |  -31.36441   12.86955    -2.44   0.015    -56.66658   -6.062246
       _Ico2Wme1 |   43.50022   14.04092     3.10   0.002     15.89507    71.10536
       _Ico2Wme2 |  -19.03303   13.29175    -1.43   0.153    -45.16528     7.09922
       _Ico2Wme3 |      -32.9   20.23653    -1.63   0.105    -72.68603    6.886029
           _cons |   650.0883   3.871885   167.90   0.000     642.4759    657.7006
    ------------------------------------------------------------------------------

We can obtain the simple effect of **collcat** when **mealcat** is low (i.e., 1) via the **test** command below. This shows that the effect of **collcat** when **mealcat** is low is significant.

test _Ico1Wme1 _Ico2Wme1

     ( 1)  _Ico1Wme1 = 0.0
     ( 2)  _Ico2Wme1 = 0.0

           F(  2,   391) =    5.44
                Prob > F =    0.0047

We use the **describe** command below to see the meaning of these terms and see that these two terms represent the two comparisons on **collcat** when **mealcat** is 1. For example, in the term **_Ico2Wme1**, the
**2** means that this is the second comparison on **collcat** and the
**1** means that it is when **mealcat** is 1.

describe _Ico1Wme1 _Ico2Wme1

                  storage  display     value
    variable name   type   format      label      variable label
    -------------------------------------------------------------------------------
    _Ico1Wme1       double %10.0g                 collcat(1 vs. 2+) @ mealcat==1
    _Ico2Wme1       double %10.0g                 collcat(2 vs. 3) @ mealcat==1

We can test the simple effect of **collcat** when **mealcat** is 2 via the **test** command below. This shows that **collcat** is significant when **mealcat** is 2.

test _Ico1Wme2 _Ico2Wme2

     ( 1)  _Ico1Wme2 = 0.0
     ( 2)  _Ico2Wme2 = 0.0

           F(  2,   391) =    7.33
                Prob > F =    0.0007

We can also test the simple effect of **collcat** when **mealcat** is 3 via the **test** command below. This shows that **collcat** is significant when **mealcat** is 3, if we use an alpha level of 0.05. Because we are performing a number of additional tests, you might want to consider a post hoc correction, such as a Bonferroni correction, to guard against Type I errors.

test _Ico1Wme3 _Ico2Wme3

     ( 1)  _Ico1Wme3 = 0.0
     ( 2)  _Ico2Wme3 = 0.0

           F(  2,   391) =    3.20
                Prob > F =    0.0417

In summary, all three of the simple effects of **collcat** at each level of **mealcat** were significant. However, the effect of **collcat** when **mealcat** was 3 might not be significant if we used a post hoc criterion for evaluating its significance.
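As a rough sketch of the same simple effects without **xi3**, the built-in **contrast** command (assumed here to be available, i.e., Stata 12 or later) accepts the same **@** idea. The last line simply computes a Bonferroni-adjusted alpha for three tests; comparing the p-values reported above (0.0047, 0.0007 and 0.0417) against it shows why the effect at **mealcat** 3 would not survive the correction.

```stata
* Sketch: simple effects of collcat at each level of mealcat (assumes Stata 12+)
use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi2, clear
regress api00 i.collcat##i.mealcat

* One joint (2 df) test of collcat within each level of mealcat
contrast collcat@mealcat

* Bonferroni-adjusted alpha for three simple-effect tests (about .0167)
display 0.05/3
```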

**6.2.2 Coding of simple effects**

While **xi3** creates the coding for you, it is useful to see the coding it creates for these simple effects. The variable **mealcat** was coded using simple coding, and its coding is just as we saw in Chapter 5. Below we use the **tablist** command to show the coding for **mealcat**. You can download **tablist** from within Stata by typing **search tablist** (see "How can I use the search command to search for programs and get additional help?" for more information about using **search**).

We see that the coding of **mealcat** is just as we would expect from
chapter 5.

tablist mealcat _Imealcat_2 _Imealcat_3, sort(v)

      mealcat   _Imealca~2   _Imealca~3   Freq
            1   -.33333333   -.33333333    131
            2    .66666667   -.33333333    132
            3   -.33333333    .66666667    137

We requested helmert coding for **collcat**, and we can look at the coding of **collcat** to see that the terms **_Icollcat_1** and **_Icollcat_2** are indeed coded using helmert coding. We should note that these terms are not themselves used in the analysis, but are used by **xi3** for creating the simple effects terms shown below.

tablist collcat _Icollcat_1 _Icollcat_2, sort(v)

      collcat   _Icollca~1   _Icollca~2   Freq
            1    .66666667            0    129
            2   -.33333333           .5    134
            3   -.33333333          -.5    137

Now that we have seen the helmert coding for **collcat**, we can see how this is used to create the simple effects of **collcat** at each level of **mealcat**. First, we look at the two comparisons of **collcat** at **mealcat** of 1. Note that the coding is the same as we saw above, but only when **mealcat** is 1; otherwise these variables are coded 0.

tablist mealcat collcat _Ico1Wme1 _Ico2Wme1, sort(v)

      mealcat   collcat    _Ico1Wme1   _Ico2W~1   Freq
            1         1    .66666667          0     35
            1         2   -.33333333         .5     43
            1         3   -.33333333        -.5     53
            2         1            0          0     20
            2         2            0          0     43
            2         3            0          0     69
            3         1            0          0     74
            3         2            0          0     48
            3         3            0          0     15

Likewise, we look at the terms that form the effects of **collcat** when **mealcat** is 2, and we see that the variables are coded the same way when **mealcat** is 2, and otherwise 0.

tablist mealcat collcat _Ico1Wme2 _Ico2Wme2, sort(v)

      mealcat   collcat    _Ico1Wme2   _Ico2W~2   Freq
            1         1            0          0     35
            1         2            0          0     43
            1         3            0          0     53
            2         1    .66666667          0     20
            2         2   -.33333333         .5     43
            2         3   -.33333333        -.5     69
            3         1            0          0     74
            3         2            0          0     48
            3         3            0          0     15

Finally, we see the same pattern for the terms that form the effect of **collcat** when **mealcat** is 3.

tablist mealcat collcat _Ico1Wme3 _Ico2Wme3, sort(v)

      mealcat   collcat    _Ico1Wme3   _Ico2W~3   Freq
            1         1            0          0     35
            1         2            0          0     43
            1         3            0          0     53
            2         1            0          0     20
            2         2            0          0     43
            2         3            0          0     69
            3         1    .66666667          0     74
            3         2   -.33333333         .5     48
            3         3   -.33333333        -.5     15

This illustrates how **xi3** codes the variables to allow the simple effects analysis. If you wished, you could manually create variables according to this strategy to perform a simple effects analysis.
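The manual strategy mentioned above might look like the following sketch. The variable names (**h1**, **h2**, **m2**, **m3**, **c1m1** and so on) are hypothetical, and the code assumes **collcat** and **mealcat** have no missing values. If the coding matches the tables above, the regression and the final **test** should reproduce the simple effect of **collcat** at **mealcat** 1.

```stata
* Sketch: hand-coding the simple effects terms (hypothetical variable names)
use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi2, clear

* Helmert codes for collcat: 1 vs. 2+, then 2 vs. 3
generate h1 = cond(collcat == 1, 2/3, -1/3)
generate h2 = cond(collcat == 2, 1/2, cond(collcat == 3, -1/2, 0))

* Simple coding for mealcat (reference level 1)
generate m2 = (mealcat == 2) - 1/3
generate m3 = (mealcat == 3) - 1/3

* Helmert codes switched on only within one level of mealcat, 0 elsewhere
forvalues m = 1/3 {
    generate c1m`m' = h1 * (mealcat == `m')
    generate c2m`m' = h2 * (mealcat == `m')
}

regress api00 m2 m3 c1m1 c1m2 c1m3 c2m1 c2m2 c2m3

* Simple effect of collcat when mealcat is 1
test c1m1 c2m1
```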

**6.3. Simple comparisons**

In the analyses above we looked at the simple effect of **collcat** at each level of **mealcat**. For example, we looked at the overall effect of **collcat** when **mealcat** was 1. This is the simple effect of **collcat** at **mealcat**=1. Because **collcat** has more than
two levels, we may wish to make further comparisons among the three levels of **collcat** within **mealcat**=1. Simple comparisons allow us to make such comparisons.

**6.3.1 Analyzing simple comparisons using xi3 and regress**

In the analyses above we used helmert coding for **collcat**. We chose this coding so we could compare group 1 with groups 2 and 3, and then compare groups 2 and 3. For example, if we wanted to compare **collcat** 1 versus 2 and 3 when **mealcat** is 1, we would look at the effect **_Ico1Wme1**, and if we wanted to compare **collcat** groups 2 and 3 when **mealcat** is 1, we would look at the effect **_Ico2Wme1**. Because **xi3** creates labels for each term that it creates, we can use the **describe** command to verify that we are using the correct terms. Indeed, we see that these terms are as we expected.

describe _Ico1Wme1 _Ico2Wme1

                  storage  display     value
    variable name   type   format      label      variable label
    -------------------------------------------------------------------------------
    _Ico1Wme1       double %10.0g                 collcat(1 vs. 2+) @ mealcat==1
    _Ico2Wme1       double %10.0g                 collcat(2 vs. 3) @ mealcat==1

We can use the **regress** command to see the effects for these terms.

regress

          Source |       SS       df       MS              Number of obs =     400
    -------------+------------------------------           F(  8,   391) =  166.76
           Model |  6243714.81     8  780464.351           Prob > F      =  0.0000
        Residual |  1829957.19   391  4680.19741           R-squared     =  0.7733
    -------------+------------------------------           Adj R-squared =  0.7687
           Total |     8073672   399  20234.7669           Root MSE      =  68.412

    ------------------------------------------------------------------------------
           api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
     _Imealcat_2 |  -181.0414   9.077126   -19.94   0.000    -198.8874   -163.1953
     _Imealcat_3 |  -293.4103   9.449459   -31.05   0.000    -311.9884   -274.8322
       _Ico1Wme1 |   13.01323     13.528     0.96   0.337    -13.58349    39.60995
       _Ico1Wme2 |  -56.77117   16.67866    -3.40   0.001    -89.56223    -23.9801
       _Ico1Wme3 |  -31.36441   12.86955    -2.44   0.015    -56.66658   -6.062246
       _Ico2Wme1 |   43.50022   14.04092     3.10   0.002     15.89507    71.10536
       _Ico2Wme2 |  -19.03303   13.29175    -1.43   0.153    -45.16528     7.09922
       _Ico2Wme3 |      -32.9   20.23653    -1.63   0.105    -72.68603    6.886029
           _cons |   650.0883   3.871885   167.90   0.000     642.4759    657.7006
    ------------------------------------------------------------------------------

We see that **collcat** 1 is not significantly different from **collcat** 2 and 3 at **mealcat** 1 (t=0.96, p=0.337), but **collcat** 2 is significantly different from **collcat** 3 at **mealcat** 1 (t=3.10, p=0.002).
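One way to see what these two coefficients mean is to rebuild them from the cell means in the table of predicted values shown earlier. The sketch below only does arithmetic with **display**, using the three cell means for **mealcat** 1 copied from that table; the results should agree with the coefficients for **_Ico1Wme1** and **_Ico2Wme1** up to rounding.

```stata
* Cell means of api00 at mealcat == 1 for collcat 1, 2, 3 (from the table above)
* collcat 1 versus the average of collcat 2 and 3: roughly 13.01
display 816.91431 - (825.65118 + 782.15094)/2

* collcat 2 versus collcat 3: roughly 43.50
display 825.65118 - 782.15094
```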

**6.3.2 Coding of simple comparisons**

The coding of simple comparisons is the same as the coding of simple effects. For example, we can see that **_Icollcat_1** and **_Icollcat_2** are coded using helmert coding.

tablist collcat _Icollcat_1 _Icollcat_2, sort(v)

      collcat   _Icollca~1   _Icollca~2   Freq
            1    .66666667            0    129
            2   -.33333333           .5    134
            3   -.33333333          -.5    137

Then the term **_Ico1Wme1** represents the comparison of **collcat** 1 versus **collcat** 2 and 3 when **mealcat** is 1. Hence, the coding is the same as the coding for **_Icollcat_1** when **mealcat** is 1, and 0 otherwise, as shown below.

tablist mealcat collcat _Ico1Wme1, sort(v)

      mealcat   collcat    _Ico1Wme1   Freq
            1         1    .66666667     35
            1         2   -.33333333     43
            1         3   -.33333333     53
            2         1            0     20
            2         2            0     43
            2         3            0     69
            3         1            0     74
            3         2            0     48
            3         3            0     15

**6.4. Partial interaction**

A partial interaction allows you to apply contrasts to one of the effects in an interaction term. For example, we can draw the interaction of **collcat** by **mealcat** like this below.

|              | Collcat Low | Collcat Med | Collcat High |
|--------------|-------------|-------------|--------------|
| Mealcat Low  |             |             |              |
| Mealcat Med  |             |             |              |
| Mealcat High |             |             |              |

Say that we wanted to compare, in the context of this interaction, group 1 for **collcat**
versus groups 2 and 3. The table of this partial interaction would look like this.
The contrast coefficients of -2 1 1 applied to **collcat** indicate the
comparison of group 1 for **collcat** versus groups 2 and 3.

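|                      | Collcat Low | Collcat Med | Collcat High |
|----------------------|-------------|-------------|--------------|
| Contrast coefficient | -2          | 1           | 1            |
| Mealcat Low          |             |             |              |
| Mealcat Med          |             |             |              |
| Mealcat High         |             |             |              |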

Likewise, we also might want to compare groups 2 and 3 of **collcat** by **mealcat**, and the table of this interaction would look like this.

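|                      | Collcat Low | Collcat Med | Collcat High |
|----------------------|-------------|-------------|--------------|
| Contrast coefficient | 0           | -1          | 1            |
| Mealcat Low          |             |             |              |
| Mealcat Med          |             |             |              |
| Mealcat High         |             |             |              |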

These are called partial interactions because contrast coefficients are applied to one of the terms involved in the interaction.

**6.4.1 Analyzing partial interactions using xi3 and regress**

As shown above, we wish to compare groups 1 versus 2 and 3 on **collcat**, and then compare groups 2 and 3 on **collcat**. This implies
helmert coding on **collcat**, as shown below. The coding for **mealcat** is chosen as
forward difference coding (for the purposes of later analyses) but could have been any form of effect coding.

xi3: regress api00 h.collcat*a.mealcat

    h.collcat         _Icollcat_1-3       (naturally coded; _Icollcat_3 omitted)
    a.mealcat         _Imealcat_1-3       (naturally coded; _Imealcat_3 omitted)

          Source |       SS       df       MS              Number of obs =     400
    -------------+------------------------------           F(  8,   391) =  166.76
           Model |  6243714.81     8  780464.351           Prob > F      =  0.0000
        Residual |  1829957.19   391  4680.19741           R-squared     =  0.7733
    -------------+------------------------------           Adj R-squared =  0.7687
           Total |     8073672   399  20234.7669           Root MSE      =  68.412

    ------------------------------------------------------------------------------
           api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
     _Icollcat_1 |  -25.04078   8.345388    -3.00   0.003    -41.44823   -8.633334
     _Icollcat_2 |  -2.810937   9.329377    -0.30   0.763    -21.15296    15.53108
     _Imealcat_1 |   181.0414   9.077126    19.94   0.000     163.1953    198.8874
     _Imealcat_2 |   112.3689   9.907594    11.34   0.000     92.89009    131.8477
       _Ico1Xme1 |    69.7844    21.4752     3.25   0.001     27.56308    112.0057
       _Ico1Xme2 |  -25.40675   21.06663    -1.21   0.229    -66.82479    16.01128
       _Ico2Xme1 |   62.53325   19.33438     3.23   0.001      24.5209    100.5456
       _Ico2Xme2 |   13.86697   24.21132     0.57   0.567    -33.73369    61.46763
           _cons |   650.0883   3.871885   167.90   0.000     642.4759    657.7006
    ------------------------------------------------------------------------------

Let’s look at all of the terms created by the **xi3** command using the **describe** command.

describe _I*

                  storage  display     value
    variable name   type   format      label      variable label
    -------------------------------------------------------------------------------
    _Icollcat_1     double %10.0g                 collcat(1 vs. 2+)
    _Icollcat_2     double %10.0g                 collcat(2 vs. 3)
    _Imealcat_1     double %10.0g                 mealcat(1 vs. 2)
    _Imealcat_2     double %10.0g                 mealcat(2 vs. 3)
    _Ico1Xme1       float  %9.0g                  collcat(1 vs. 2+)*mealcat(1 vs. 2)
    _Ico1Xme2       float  %9.0g                  collcat(1 vs. 2+)*mealcat(2 vs. 3)
    _Ico2Xme1       float  %9.0g                  collcat(2 vs. 3)*mealcat(1 vs. 2)
    _Ico2Xme2       float  %9.0g                  collcat(2 vs. 3)*mealcat(2 vs. 3)

The partial interaction of **collcat** comparing groups 1 versus 2 and 3 by **mealcat** is composed of the interaction terms **_Ico1Xme1** and **_Ico1Xme2**, because these are the terms from the interaction that compare groups 1 versus 2 and 3 on **collcat**. Below we use the **test** command to test this partial interaction. We find that this interaction is significant.

test _Ico1Xme1 _Ico1Xme2

     ( 1)  _Ico1Xme1 = 0.0
     ( 2)  _Ico1Xme2 = 0.0

           F(  2,   391) =    5.78
                Prob > F =    0.0033

Likewise to compare groups 2 and 3 on **collcat** by **mealcat**, we test the two terms of the interaction that involve the comparison of groups 2 and 3 on **collcat**. We find that this comparison is also significant.

test _Ico2Xme1 _Ico2Xme2

     ( 1)  _Ico2Xme1 = 0.0
     ( 2)  _Ico2Xme2 = 0.0

           F(  2,   391) =    7.11
                Prob > F =    0.0009
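For readers without **xi3**, the two partial interactions tested above can likely be requested from the built-in **contrast** command by applying the Helmert operator **h.** to **collcat** only. This sketch assumes Stata 12 or later; it is an equivalent route rather than the original method, and should give one 2 df test for each **collcat** comparison.

```stata
* Sketch: partial interactions of collcat contrasts by mealcat (assumes Stata 12+)
use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi2, clear
regress api00 i.collcat##i.mealcat

* h. applies Helmert contrasts to collcat (1 vs. 2+, 2 vs. 3);
* mealcat is left uncontrasted, so each test has 2 df
contrast h.collcat#mealcat
```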

**6.4.2 Coding of partial interactions**

The terms **_Ico1Xme1** and **_Ico1Xme2** are just the products of their respective main effects. The coding for **mealcat** is really irrelevant, as long as some form of coding is used that sums to 0. Below you can see that **_Ico1Xme1** is just **_Icollcat_1** * **_Imealcat_1**.

tablist collcat mealcat _Icollcat_1 _Imealcat_1 _Ico1Xme1, sort(v)

      collcat   mealcat   _Icollca~1   _Imealca~1    _Ico1Xme1   Freq
            1         1    .66666667    .66666667    .44444444     35
            1         2    .66666667   -.33333333   -.22222222     20
            1         3    .66666667   -.33333333   -.22222222     74
            2         1   -.33333333    .66666667   -.22222222     43
            2         2   -.33333333   -.33333333    .11111111     43
            2         3   -.33333333   -.33333333    .11111111     48
            3         1   -.33333333    .66666667   -.22222222     53
            3         2   -.33333333   -.33333333    .11111111     69
            3         3   -.33333333   -.33333333    .11111111     15

And you can see that **_Ico1Xme2** is just **_Icollcat_1** * **_Imealcat_2**.

tablist collcat mealcat _Icollcat_1 _Imealcat_2 _Ico1Xme2, s(v)

      collcat   mealcat   _Icollca~1   _Imealca~2   _IcolXme~2   Freq
            1         1    .66666667    .33333333    .22222222     35
            1         2    .66666667    .33333333    .22222222     20
            1         3    .66666667   -.66666667   -.44444444     74
            2         1   -.33333333    .33333333   -.11111111     43
            2         2   -.33333333    .33333333   -.11111111     43
            2         3   -.33333333   -.66666667    .22222222     48
            3         1   -.33333333    .33333333   -.11111111     53
            3         2   -.33333333    .33333333   -.11111111     69
            3         3   -.33333333   -.66666667    .22222222     15

**6.5. Interaction contrasts**

Above we saw that a partial interaction allows you to apply contrast coefficients to one of the terms in a two-way interaction. An interaction contrast allows you to apply contrast coefficients to both of the terms in a two-way interaction.

For example, with respect to **collcat** say that we wish to compare groups 2 and 3, and with respect to **mealcat** we wish to compare groups 1 and 2. The table of this looks like this below.

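| Contrast coefficient |              | Collcat Low | Collcat Med | Collcat High |
|----------------------|--------------|-------------|-------------|--------------|
|                      |              | 0           | -1          | 1            |
| -1                   | Mealcat Low  |             |             |              |
| 1                    | Mealcat Med  |             |             |              |
| 0                    | Mealcat High |             |             |              |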

We also would like to form a second interaction contrast that also compares groups 2 and 3 with respect to **collcat**, and compares groups 2 and 3 on **mealcat**. A table of this comparison is shown below.

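| Contrast coefficient |              | Collcat Low | Collcat Med | Collcat High |
|----------------------|--------------|-------------|-------------|--------------|
|                      |              | 0           | -1          | 1            |
| 0                    | Mealcat Low  |             |             |              |
| -1                   | Mealcat Med  |             |             |              |
| 1                    | Mealcat High |             |             |              |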

If we look at the graph of the predicted values we constructed before, these two contrasts compare the dashed and dotted lines (**collcat** 2 versus 3) across **mealcat** 1 versus 2, and then again across **mealcat** 2 versus 3.

**6.5.1 Analyzing interaction contrasts using xi3 and regress**

Because we would like to compare groups 1 versus 2, and then groups 2 versus 3 on **mealcat**, this implies forward difference coding for **mealcat** (which will compare 1
versus 2, then 2 versus 3). For **collcat** we wish to compare groups 2 and 3, so we can use
helmert coding for that comparison as we did above (since this will compare 1
versus 2 and 3, then 2 versus 3).

xi3: regress api00 h.collcat*a.mealcat

    h.collcat         _Icollcat_1-3       (naturally coded; _Icollcat_3 omitted)
    a.mealcat         _Imealcat_1-3       (naturally coded; _Imealcat_3 omitted)

          Source |       SS       df       MS              Number of obs =     400
    -------------+------------------------------           F(  8,   391) =  166.76
           Model |  6243714.81     8  780464.351           Prob > F      =  0.0000
        Residual |  1829957.19   391  4680.19741           R-squared     =  0.7733
    -------------+------------------------------           Adj R-squared =  0.7687
           Total |     8073672   399  20234.7669           Root MSE      =  68.412

    ------------------------------------------------------------------------------
           api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
     _Icollcat_1 |  -25.04078   8.345388    -3.00   0.003    -41.44823   -8.633334
     _Icollcat_2 |  -2.810937   9.329377    -0.30   0.763    -21.15296    15.53108
     _Imealcat_1 |   181.0414   9.077126    19.94   0.000     163.1953    198.8874
     _Imealcat_2 |   112.3689   9.907594    11.34   0.000     92.89009    131.8477
       _Ico1Xme1 |    69.7844    21.4752     3.25   0.001     27.56308    112.0057
       _Ico1Xme2 |  -25.40675   21.06663    -1.21   0.229    -66.82479    16.01128
       _Ico2Xme1 |   62.53325   19.33438     3.23   0.001      24.5209    100.5456
       _Ico2Xme2 |   13.86697   24.21132     0.57   0.567    -33.73369    61.46763
           _cons |   650.0883   3.871885   167.90   0.000     642.4759    657.7006
    ------------------------------------------------------------------------------

If we are not sure what term we want to use, we can use the **describe** command to show
the labels for the interaction terms.

describe _Ico1Xme* _Ico2Xme*

                  storage  display     value
    variable name   type   format      label      variable label
    -------------------------------------------------------------------------------
    _Ico1Xme1       double %10.0g                 collcat(1 vs. 2+) & mealcat(1 vs. 2)
    _Ico1Xme2       double %10.0g                 collcat(1 vs. 2+) & mealcat(2 vs. 3)
    _Ico2Xme1       double %10.0g                 collcat(2 vs. 3) & mealcat(1 vs. 2)
    _Ico2Xme2       double %10.0g                 collcat(2 vs. 3) & mealcat(2 vs. 3)

The first interaction comparison of interest is tested by **_Ico2Xme1**, and this term is significant (t=3.23, p=0.001). As we would expect from the graph, the lines for **collcat** 2 and 3 are not parallel between **mealcat** 1 and 2.

The second interaction comparison of interest is tested by **_Ico2Xme2**, and this term is not significant (t=0.57, p=0.567). Looking at the graph, we can see that the lines for **collcat** 2 and 3 are mostly parallel between **mealcat** 2 and 3.
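An interaction contrast is a difference of differences, which can be checked by hand from the table of predicted cell means shown earlier. The sketch below first rebuilds the **_Ico2Xme1** coefficient with **display**, using cell means copied from that table, and then shows how the same contrasts could probably be requested from the built-in **contrast** command (assuming Stata 12 or later, with **h.** for Helmert and **a.** for adjacent contrasts); neither step is part of the original **xi3** workflow.

```stata
* (collcat 2 - collcat 3 at mealcat 1) minus (collcat 2 - collcat 3 at mealcat 2)
* should be roughly 62.53, the coefficient for _Ico2Xme1
display (825.65118 - 782.15094) - (636.60468 - 655.6377)

* Sketch: the same interaction contrasts via contrast (assumes Stata 12+)
use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi2, clear
regress api00 i.collcat##i.mealcat
contrast h.collcat#a.mealcat, effects
```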

**6.5.2 Coding of interaction contrasts**

The term **_Ico2Xme1** is just the product of the respective main effects, as shown below.

tablist collcat mealcat _Icollcat_2 _Imealcat_1 _Ico2Xme1, sort(v)

      collcat   mealcat   _Icoll~2   _Imealca~1   _Ico2Xme1   Freq
            1         1          0    .66666667           0     35
            1         2          0   -.33333333           0     20
            1         3          0   -.33333333           0     74
            2         1         .5    .66666667    .3333333     43
            2         2         .5   -.33333333   -.1666667     43
            2         3         .5   -.33333333   -.1666667     48
            3         1        -.5    .66666667   -.3333333     53
            3         2        -.5   -.33333333    .1666667     69
            3         3        -.5   -.33333333    .1666667     15

**6.6 Computing adjusted means**

**6.6.1 Computing adjusted means via anova**

First, we show how you can compute adjusted means using the **anova** command. We use the same model that we have been using, including **mealcat**, **collcat** and the interaction of these two variables, and we add the continuous covariate **emer**.

anova api00 collcat mealcat collcat*mealcat emer, contin(emer)

                               Number of obs =     400     R-squared     =  0.7930
                               Root MSE      = 65.4617     Adj R-squared =  0.7882

                      Source |  Partial SS    df       MS           F     Prob > F
             ----------------+----------------------------------------------------
                       Model |  6402428.26     9  711380.918     166.01     0.0000
                             |
                     collcat |  34730.0899     2  17365.0449       4.05     0.0181
                     mealcat |  3017331.85     2  1508665.92     352.06     0.0000
             collcat*mealcat |  96789.1156     4  24197.2789       5.65     0.0002
                        emer |  158713.455     1  158713.455      37.04     0.0000
                             |
                    Residual |  1671243.73   390  4285.24034
             ----------------+----------------------------------------------------
                       Total |  8073672.00   399  20234.7669

After performing the anova, we can use the **adjust** command to get adjusted means broken down by **collcat** and **mealcat**. These adjusted means are the means that would be expected if every school in the sample were at the mean of the variable **emer**. Note that it is possible to compute adjusted means with **emer** at values other than the mean; for example, if we had specified **emer=50**, the means would have been computed as though every school had a value of 50 for **emer**.

adjust emer , by(collcat mealcat)

    --------------------------------------------------------------------------
         Dependent variable: api00     Command: anova
         Covariate set to mean: emer = 12.6575
    --------------------------------------------------------------------------

    -------------------------------------
              |Percentage free meals in 3
              |        categories
      collcat |       1        2        3
    ----------+--------------------------
            1 |  797.56  596.973  509.872
            2 |  812.55  636.405  523.885
            3 | 767.935  652.976  550.462
    -------------------------------------
         Key:  Linear Prediction

**6.6.2 Computing adjusted means via regress**

Now we illustrate how to get the same adjusted means if you were to do the analysis via the **regress** command. First, we perform the regression analysis that is equivalent to the **anova** command above.

xi3: regress api00 g.collcat*g.mealcat emer

    g.collcat         _Icollcat_1-3       (naturally coded; _Icollcat_1 omitted)
    g.mealcat         _Imealcat_1-3       (naturally coded; _Imealcat_1 omitted)

          Source |       SS       df       MS              Number of obs =     400
    -------------+------------------------------           F(  9,   390) =  166.01
           Model |  6402428.26     9  711380.918           Prob > F      =  0.0000
        Residual |  1671243.73   390  4285.24034           R-squared     =  0.7930
    -------------+------------------------------           Adj R-squared =  0.7882
           Total |     8073672   399  20234.7669           Root MSE      =  65.462

    ------------------------------------------------------------------------------
           api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
     _Icollcat_2 |   22.81146   8.713721     2.62   0.009     5.679711     39.9432
     _Icollcat_3 |   22.32251   9.588069     2.33   0.020     3.471742    41.17328
     _Imealcat_2 |  -163.8973   9.131088   -17.95   0.000    -181.8497    -145.945
     _Imealcat_3 |  -264.6091   10.20556   -25.93   0.000    -284.6739   -244.5443
       _Ico2Xme2 |   24.44231   23.26715     1.05   0.294    -21.30242    70.18704
       _Ico2Xme3 |  -.9774027    19.2525    -0.05   0.960    -38.82908    36.87428
       _Ico3Xme2 |   85.62852   22.04718     3.88   0.000     42.28233    128.9747
       _Ico3Xme3 |   70.21457   23.47354     2.99   0.003     24.06405    116.3651
            emer |   -2.00997   .3302709    -6.09   0.000    -2.659304   -1.360636
           _cons |   675.2877    5.55622   121.54   0.000     664.3638    686.2116
    ------------------------------------------------------------------------------

To create the adjusted means we wish to assume that all of the schools are at the average on the variable **emer**. We do this by assigning the average of **emer** to the variable **emer**, but first making a copy of **emer** as **temer** so we don’t destroy the contents of this variable.

rename emer temer

egen emer = mean(temer)

Now we create **yhat** as the predicted value. Since the value of **emer** is set to the mean of **emer**, this will be the predicted value assuming that all schools are at the average for **emer**.

predict yhat

Now, we can look at the average of **yhat** broken down by **collcat** and **mealcat**, which you can see corresponds to the adjusted means that we found with the **adjust** command following the **anova** command above.

table collcat mealcat, contents(mean yhat)

    Means of Fitted values

               |    Percentage free meals in 3   |
               |            categories           |
       collcat |         1          2          3 |     Total
    -----------+---------------------------------+----------
             1 | 797.56042  596.97284  509.87225 | 601.43115
             2 | 812.55023  636.40497  523.88464 | 652.62341
             3 | 767.93524  652.97614  550.46161 | 686.22515
    -----------+---------------------------------+----------
         Total | 790.49498   639.0926  519.22579 |  647.6225
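Because **emer** enters this model linearly and does not interact with **collcat** or **mealcat**, an adjusted mean at some other value of **emer** differs from the one at the mean only by the **emer** coefficient times the shift in **emer**. The Python sketch below illustrates this under that assumption; the numbers are copied from the output above, the value 50 is just the example mentioned earlier, and the function name `adjusted_at` is hypothetical.

```python
# Values copied from the regress output and the table of adjusted means.
B_EMER = -2.00997        # coefficient for emer
EMER_MEAN = 12.6575      # sample mean of emer reported by adjust

adjusted_at_mean = {     # (collcat, mealcat) -> adjusted mean at emer = mean
    (1, 1): 797.56042, (1, 2): 596.97284, (1, 3): 509.87225,
    (2, 1): 812.55023, (2, 2): 636.40497, (2, 3): 523.88464,
    (3, 1): 767.93524, (3, 2): 652.97614, (3, 3): 550.46161,
}

def adjusted_at(emer_value):
    """Shift every cell's adjusted mean to a different value of emer."""
    shift = B_EMER * (emer_value - EMER_MEAN)
    return {cell: mean + shift for cell, mean in adjusted_at_mean.items()}

at_50 = adjusted_at(50)
for cell in sorted(at_50):
    print(cell, round(at_50[cell], 3))
```

Every cell moves by the same amount, so the pattern of differences among the cells is unchanged; only the overall level shifts.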

We then drop the variables **emer** and **yhat** since we no longer need them, and rename **temer** back to **emer** so the **emer** variable is back to the way it was before this process.

drop yhat emer

rename temer emer

**6.6.3 Computing adjusted means via postgr3**

The **postgr3** command can be used to simplify the process of computing adjusted means (i.e., predicted values when holding other variables constant). Let’s assume that you have run the same regression as shown above.

. xi3: regress api00 g.collcat*g.mealcat emer

(output omitted to save space)

You can then show the graph of adjusted means and the table of adjusted means using **postgr3** as shown below. Below we show just the table of adjusted means, and you can see that they correspond to those computed above. We should stress that it is important to use the **xi3** command (rather than **xi**) before using **postgr3** because then **postgr3** knows which variables should be held constant (in this example, **emer**) and which variables should not be held constant (in this example, **_Imealcat_2** through **_Ico3Xme3**).

. postgr3 mealcat, by(collcat) connect(solid dash dot) table2

    Variables left asis: _Imealcat_2 _Imealcat_3 _Icollcat_2 _Icollcat_3
    > _Ico2Xme2 _Ico2Xme3 _Ico3Xme2 _Ico3Xme3
    Holding emer constant at 12.6575

    ----------------------------------------------------------------------
              |            Percentage free meals in 3 categories
      collcat |  0-46% free meals  47-80% free meals  81-100% free meals
    ----------+-----------------------------------------------------------
            1 |          797.5604           596.9728            509.8723
            2 |          812.5502            636.405            523.8846
            3 |          767.9352           652.9761            550.4616
    ----------------------------------------------------------------------

**6.7 More details on meaning of coefficients**

So far we have discussed a variety of techniques that you can use to help interpret interactions of categorical variables in regression, but we have not gone into great detail about the meaning of the coefficients in these analyses. Let’s consider this further. Consider the analysis below using **collcat** and **mealcat**, using simple contrasts on both of these variables.

xi3: regress api00 g.collcat*g.mealcat

    g.collcat         _Icollcat_1-3       (naturally coded; _Icollcat_1 omitted)
    g.mealcat         _Imealcat_1-3       (naturally coded; _Imealcat_1 omitted)

          Source |       SS       df       MS              Number of obs =     400
    -------------+------------------------------           F(  8,   391) =  166.76
           Model |  6243714.81     8  780464.351           Prob > F      =  0.0000
        Residual |  1829957.19   391  4680.19741           R-squared     =  0.7733
    -------------+------------------------------           Adj R-squared =  0.7687
           Total |     8073672   399  20234.7669           Root MSE      =  68.412

    ------------------------------------------------------------------------------
           api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
     _Icollcat_2 |   23.63531   9.105331     2.60   0.010     5.733782    41.53685
     _Icollcat_3 |   26.44625   9.995129     2.65   0.008     6.795331    46.09717
     _Imealcat_2 |  -181.0414   9.077126   -19.94   0.000    -198.8874   -163.1953
     _Imealcat_3 |  -293.4103   9.449459   -31.05   0.000    -311.9884   -274.8322
       _Ico2Xme2 |   38.51777   24.19532     1.59   0.112    -9.051422    86.08697
       _Ico2Xme3 |   6.177537   20.08262     0.31   0.759     -33.3059    45.66097
       _Ico3Xme2 |    101.051   22.88808     4.42   0.000     56.05191    146.0501
       _Ico3Xme3 |   82.57776   24.43941     3.38   0.001     34.52867    130.6268
           _cons |   650.0883   3.871885   167.90   0.000     642.4759    657.7006
    ------------------------------------------------------------------------------

We can produce the adjusted means as shown below. These will be useful for interpreting the meaning of the coefficients.

predict yhat

table collcat mealcat, contents(mean yhat)

    Means of Fitted values

               |    Percentage free meals in 3   |
               |            categories           |
       collcat |         1          2          3 |     Total
    -----------+---------------------------------+----------
             1 | 816.91431  589.34998  493.91891 | 596.34884
             2 | 825.65118  636.60468  508.83334 | 651.50002
             3 | 782.15094   655.6377  541.73334 |  692.1095
    -----------+---------------------------------+----------
         Total | 805.71757  639.39395  504.37956 | 647.62251

We drop the variable **yhat** since we no longer need it, which also frees the name in case we wish to use it again.

drop yhat

Let’s consider the meaning of the coefficient for **_Icollcat_2**. The coding for this variable compares group 2
versus group 1; hence, this coefficient corresponds to mean(collcat2) – mean(collcat1). Note that these are the unweighted means, so we compute the mean for **collcat2** as the mean of the
three cells corresponding to **collcat2**, i.e., (825.651+636.605+508.833)/3 . If we compare the result below to the coefficient for **_Icollcat_2** we see that they are the same.

display (825.651+636.605+508.833)/3 - (816.914+589.35+493.919)/3

23.635333

Likewise, the coefficient for **_Icollcat_3** is mean(collcat3) – mean(collcat1), computed below. The value below corresponds to the coefficient for **_Icollcat_3**.

display (782.151+655.638+541.733)/3 - (816.914+589.35+493.919)/3

26.446333

Likewise, the coefficient for **_Imealcat_2** works out to be mean(mealcat2) – mean(mealcat1), see below.

display (589.35+636.605+655.638)/3 - (816.914+825.651+782.151)/3

-181.041

And the coefficient for **_Imealcat_3** is mean(mealcat3) – mean(mealcat1), see below.

display (493.919+508.833+541.733)/3 - (816.914+825.651+782.151)/3

-293.41033
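The four **display** calculations above can be collected into one small check. The following Python sketch is only an illustration (the function names are ours); it takes the cell means from the table of fitted values and computes the unweighted marginal means and their differences from level 1. As an additional check that is not part of the text above, it also computes the unweighted mean of all nine cells, which can be compared with **_cons**.

```python
# Cell means copied from the table of fitted values,
# indexed as cell[collcat][mealcat].
cell = {
    1: {1: 816.91431, 2: 589.34998, 3: 493.91891},
    2: {1: 825.65118, 2: 636.60468, 3: 508.83334},
    3: {1: 782.15094, 2: 655.63770, 3: 541.73334},
}
levels = (1, 2, 3)

def collcat_mean(c):
    """Unweighted mean of the three mealcat cells at collcat level c."""
    return sum(cell[c][m] for m in levels) / 3

def mealcat_mean(m):
    """Unweighted mean of the three collcat cells at mealcat level m."""
    return sum(cell[c][m] for c in levels) / 3

main_effects = {
    "_Icollcat_2": collcat_mean(2) - collcat_mean(1),
    "_Icollcat_3": collcat_mean(3) - collcat_mean(1),
    "_Imealcat_2": mealcat_mean(2) - mealcat_mean(1),
    "_Imealcat_3": mealcat_mean(3) - mealcat_mean(1),
}
# Unweighted mean of all nine cells (compare with _cons in the output).
grand_mean = sum(cell[c][m] for c in levels for m in levels) / 9

for name, value in main_effects.items():
    print(name, round(value, 4))
print("grand mean", round(grand_mean, 4))
```

Because each marginal mean is a simple average of three cell means, cells with few schools count just as much as cells with many schools.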

To get the meaning of the coefficients for the interaction terms, we need to multiply the contrast coding of the main effects that created the interaction terms. For example, the term **_Ico2Xme2** is the product of **_Icollcat_2** and **_Imealcat_2**. We can form a 3 by 3 table showing the coding for **_Icollcat_2** along the top and the coding for **_Imealcat_2** on the left, and then multiply these terms together and place the products in the cells of the table, as shown below.

| | | Collcat Low | Collcat Med | Collcat High |
| --- | --- | --- | --- | --- |
| | | -1 | 1 | 0 |
| -1 | Mealcat Low | 1 | -1 | 0 |
| 1 | Mealcat Med | -1 | 1 | 0 |
| 0 | Mealcat High | 0 | 0 | 0 |

We can then multiply the terms in the cells by the corresponding cell means and sum the results to get the coefficient for **_Ico2Xme2**. In other words, we see that this coefficient corresponds to the means of cells (1,1) and (2,2) minus the means of cells (1,2) and (2,1).

display ( 816.914 - 589.35 - 825.651 + 636.605 )

38.518

We can go through the same process to verify the meaning of the coefficients for the other
three interaction terms. We verify that **_Ico2Xme3** is 6.177.

display ( 816.914 - 493.919 - 825.651 + 508.833)

6.177

We also verify that **_Ico3Xme2** is 101.051.

display ( 816.914 - 589.35 - 782.151 + 655.638 )

101.051

And we verify that **_Ico3Xme3** is 82.577.

display ( 816.914 - 493.919 - 782.151 + 541.733 )

82.577
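The same multiply-and-sum recipe gives all four interaction coefficients at once. The Python sketch below is an illustration rather than part of the original analysis; it uses the schematic codes from the 3 by 3 table (level 2 or level 3 versus level 1), and the names `contrast` and `interaction` are hypothetical.

```python
# Cell means copied from the table of fitted values,
# indexed as cell[collcat][mealcat].
cell = {
    1: {1: 816.91431, 2: 589.34998, 3: 493.91891},
    2: {1: 825.65118, 2: 636.60468, 3: 508.83334},
    3: {1: 782.15094, 2: 655.63770, 3: 541.73334},
}

# Schematic contrast codes: contrast[k][level] compares level k with level 1.
contrast = {
    2: {1: -1, 2: 1, 3: 0},
    3: {1: -1, 2: 0, 3: 1},
}

def interaction(c_level, m_level):
    """Sum over all cells of (collcat code * mealcat code * cell mean)."""
    return sum(
        contrast[c_level][c] * contrast[m_level][m] * cell[c][m]
        for c in (1, 2, 3)
        for m in (1, 2, 3)
    )

for c_level in (2, 3):
    for m_level in (2, 3):
        name = f"_Ico{c_level}Xme{m_level}"
        print(name, round(interaction(c_level, m_level), 3))
```

Only four cells receive a nonzero weight in each contrast, which is why each **display** command above involves exactly four cell means.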

**6.8 Simple effects via dummy coding versus effect coding**

You may wonder why we have gone to the effort of using **xi3** for creating and testing these effects instead of just using dummy coding like we would get with the **xi** command. Let’s compare how to get
simple effects using the **xi3** command via effect coding to how we would get simple effects using **xi** with dummy coding. We hope to show that it is much easier to use effect coding via **xi3** and that the interpretation of the coefficients is much more intuitive.

**6.8.1 Example 1. Simple effects of yr_rnd at levels of mealcat**

Let’s use an example from Chapter 3
(section 3.5). In that example we looked at an analysis using **mealcat** and **yr_rnd** and the interaction of these two variables. First, we look at how to do a simple effects analysis looking at the simple effects of **yr_rnd** at each level of **mealcat** using the **xi3** command with effect coding. To make our results correspond to those from Chapter 3, we will make
group 3 of **mealcat** the reference category.

char mealcat[omit] 3

xi3 : regress api00 g.yr_rnd@g.mealcat

    g.yr_rnd          _Iyr_rnd_0-1        (naturally coded; _Iyr_rnd_0 omitted)
    g.mealcat         _Imealcat_1-3       (naturally coded; _Imealcat_3 omitted)

          Source |       SS       df       MS              Number of obs =     400
    -------------+------------------------------           F(  5,   394) =  261.61
           Model |  6204727.82     5  1240945.56           Prob > F      =  0.0000
        Residual |  1868944.18   394  4743.51314           R-squared     =  0.7685
    -------------+------------------------------           Adj R-squared =  0.7656
           Total |     8073672   399  20234.7669           Root MSE      =  68.873

    ------------------------------------------------------------------------------
           api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
     _Imealcat_1 |   267.8108   14.61559    18.32   0.000     239.0765    296.5451
     _Imealcat_2 |   114.6572   11.12812    10.30   0.000     92.77923    136.5351
       _Iyr1Wme1 |  -74.25691   26.75629    -2.78   0.006    -126.8599   -21.65397
       _Iyr1Wme2 |  -51.74017   18.88854    -2.74   0.006    -88.87511   -14.60523
       _Iyr1Wme3 |  -33.49254   11.77129    -2.85   0.005    -56.63492   -10.35015
           _cons |   632.2356   5.800477   109.00   0.000     620.8318    643.6393
    ------------------------------------------------------------------------------

Now we can obtain the simple effect of **yr_rnd** at **mealcat**=1 by inspecting the coefficient for **_Iyr1Wme1**, the simple effect of **yr_rnd** at **mealcat**=2 by inspecting the coefficient for **_Iyr1Wme2**, and the simple effect of **yr_rnd** at **mealcat**=3 by inspecting the coefficient for **_Iyr1Wme3**.

Now let’s perform the same analysis using **xi** with dummy coding. Again, we will explicitly make the third group of **mealcat** the omitted category.

char mealcat[omit] 3

xi : regress api00 i.mealcat*yr_rnd

    i.mealcat         _Imealcat_1-3       (naturally coded; _Imealcat_3 omitted)
    i.meal~t*yr_rnd   _ImeaXyr_rn_#       (coded as above)

          Source |       SS       df       MS              Number of obs =     400
    -------------+------------------------------           F(  5,   394) =  261.61
           Model |  6204727.82     5  1240945.56           Prob > F      =  0.0000
        Residual |  1868944.18   394  4743.51314           R-squared     =  0.7685
    -------------+------------------------------           Adj R-squared =  0.7656
           Total |  8073672.00   399  20234.7669           Root MSE      =  68.873

    ------------------------------------------------------------------------------
           api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
     _Imealcat_1 |   288.1929   10.44284    27.60   0.000     267.6623    308.7236
     _Imealcat_2 |    123.781   10.55185    11.73   0.000      103.036    144.5259
          yr_rnd |  -33.49254   11.77129    -2.85   0.005    -56.63492   -10.35015
    _ImeaXyr_r~1 |  -40.76438   29.23118    -1.39   0.164    -98.23297    16.70422
    _ImeaXyr_r~2 |  -18.24763   22.25624    -0.82   0.413    -62.00347     25.5082
           _cons |   521.4925   8.414197    61.98   0.000     504.9502    538.0349
    ------------------------------------------------------------------------------

In order to form a test of simple main effects we need to make a table like the one shown below that relates the means of the cells to the coefficients in the regression. Please see Chapter 3, section 3.5 for information on how this table was constructed.

                  mealcat=1           mealcat=2           mealcat=3
    ---------------------------------------------------------------
    yr_rnd=0      _cons               _cons               _cons
                  +BImealcat1         +BImealcat2
    ---------------------------------------------------------------
    yr_rnd=1      _cons               _cons               _cons
                  +Byr_rnd            +Byr_rnd            +Byr_rnd
                  +BImealcat1         +BImealcat2
                  +B_ImeaXyr_rn_1     +B_ImeaXyr_rn_2

Let’s start by looking at how to get the simple effect of **yr_rnd** when **mealcat** is 3. Looking at the table above, we can see that we would want to compare **_cons** with **_cons + Byr_rnd**. We can do this with the **lincom** command as shown below.

lincom _cons - (_cons + yr_rnd)

     ( 1)  - yr_rnd = 0.0

    ------------------------------------------------------------------------------
           api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             (1) |   33.49254   11.77129     2.85   0.005     10.35015    56.63492
    ------------------------------------------------------------------------------

We see that **_cons** drops out, leaving just **yr_rnd** with its sign reversed (because we subtracted the **yr_rnd**=1 cell from the **yr_rnd**=0 cell). Alternatively, we can use the **test** command to test whether the coefficient for **yr_rnd** is 0. Note that this result corresponds to the result we found with the **xi3** command, which also tested the simple effect of **yr_rnd** when **mealcat** is 3 (the coefficient for **_Iyr1Wme3**).

test yr_rnd=0

     ( 1)  yr_rnd = 0.0

           F(  1,   394) =    8.10
                Prob > F =    0.0047
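One way to see the correspondence with the **xi3** result is that, for a test with 1 numerator degree of freedom, the F statistic equals the square of the t statistic. The short Python sketch below is just an arithmetic check using the coefficient and standard error reported for **_Iyr1Wme3** (and for **yr_rnd** in the dummy coded model); the variable names are illustrative.

```python
# Coefficient and standard error copied from the regression output
# (_Iyr1Wme3 in the xi3 model, yr_rnd in the dummy coded model).
coef = -33.49254
std_err = 11.77129

t_stat = coef / std_err      # t statistic reported by regress
f_stat = t_stat ** 2         # F statistic for the 1 df test

print(round(t_stat, 2), round(f_stat, 2))
```

The squared t statistic can be compared with the F value reported by the **test** command.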

Note that the coefficient for **yr_rnd** corresponds to the test of the effect of **yr_rnd** when all other variables are set to 0 (the reference category),
in other words, when **mealcat** is set to the reference category. You may be tempted to interpret the coefficient for **yr_rnd** as the overall difference between year round schools and non-year round schools, but in this example we see that it really corresponds to the simple effect of **yr_rnd**. When using dummy coding people commonly misinterpret the lower order effects to refer to overall effects rather than simple effects.

Now let’s look at the simple effect of **yr_rnd** when **mealcat**=1. Looking at the table above we see that this involves comparing the cell for **yr_rnd**=1 with the cell for **yr_rnd**=0 when **mealcat**=1, i.e., comparing **_cons** + **yr_rnd** + **_Imealcat_1** + **_ImeaXyr_rn_1** versus **_cons** + **_Imealcat_1**. Removing the terms that drop out, we can use the **test** command below.

test yr_rnd + _ImeaXyr_rn_1=0

     ( 1)  yr_rnd + _ImeaXyr_rn_1 = 0.0

           F(  1,   394) =    7.70
                Prob > F =    0.0058

We can likewise obtain the effect of **yr_rnd** when **mealcat** is 2, as shown below.

test yr_rnd + _ImeaXyr_rn_2=0

     ( 1)  yr_rnd + _ImeaXyr_rn_2 = 0.0

           F(  1,   394) =    7.50
                Prob > F =    0.0064
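To double check these comparisons, the table of coefficients can be turned into cell means and the simple effects computed directly. The following Python sketch does this with the dummy coded coefficients from the **xi** output; it is an illustration only, and the helper name `cell_mean` is hypothetical.

```python
# Dummy coded coefficients copied from the xi: regress output.
b = {
    "_cons": 521.4925,
    "_Imealcat_1": 288.1929,
    "_Imealcat_2": 123.781,
    "yr_rnd": -33.49254,
    "_ImeaXyr_rn_1": -40.76438,
    "_ImeaXyr_rn_2": -18.24763,
}

def cell_mean(yr_rnd, mealcat):
    """Predicted cell mean, built up as in the table of coefficients."""
    mean = b["_cons"]
    if mealcat in (1, 2):                 # mealcat 3 is the reference group
        mean += b[f"_Imealcat_{mealcat}"]
    if yr_rnd == 1:
        mean += b["yr_rnd"]
        if mealcat in (1, 2):
            mean += b[f"_ImeaXyr_rn_{mealcat}"]
    return mean

# Simple effect of yr_rnd at each level of mealcat.
simple_effect = {m: cell_mean(1, m) - cell_mean(0, m) for m in (1, 2, 3)}
for m, effect in simple_effect.items():
    print("mealcat", m, round(effect, 4))
```

The three values can be compared with the coefficients for **_Iyr1Wme1**, **_Iyr1Wme2** and **_Iyr1Wme3** from the **xi3** analysis.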

These examples illustrate that it is more complicated to form simple effects when using dummy coding, and also that the interpretation of lower order effects when using dummy coding may not have the meaning that you would expect.

**6.8.2 Example 2. Simple effects of mealcat at levels of yr_rnd**

Example 1 looked at simple effects for **yr_rnd**, a variable with only two levels. In this example, let’s consider the simple effects of **mealcat** at each level of **yr_rnd**. Because **mealcat** has more than two levels, we can see what is required for doing tests of simple effects for variables with more than two levels.

First, let’s show how to get these simple effects using the **xi3** command using effect coding.

xi3 : regress api00 g.mealcat@g.yr_rnd

    g.mealcat         _Imealcat_1-3       (naturally coded; _Imealcat_3 omitted)
    g.yr_rnd          _Iyr_rnd_0-1        (naturally coded; _Iyr_rnd_0 omitted)

          Source |       SS       df       MS              Number of obs =     400
    -------------+------------------------------           F(  5,   394) =  261.61
           Model |  6204727.82     5  1240945.56           Prob > F      =  0.0000
        Residual |  1868944.18   394  4743.51314           R-squared     =  0.7685
    -------------+------------------------------           Adj R-squared =  0.7656
           Total |     8073672   399  20234.7669           Root MSE      =  68.873

    ------------------------------------------------------------------------------
           api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
      _Iyr_rnd_1 |  -53.16321   11.60095    -4.58   0.000    -75.97072    -30.3557
       _Ime1Wyr0 |   288.1929   10.44284    27.60   0.000     267.6623    308.7236
       _Ime1Wyr1 |   247.4286   27.30218     9.06   0.000     193.7524    301.1048
       _Ime2Wyr0 |    123.781   10.55185    11.73   0.000      103.036    144.5259
       _Ime2Wyr1 |   105.5333   19.59588     5.39   0.000     67.00776    144.0589
           _cons |   632.2356   5.800477   109.00   0.000     620.8318    643.6393
    ------------------------------------------------------------------------------

We can get the simple effect of **mealcat** at **yr_rnd** = 0 just as we did earlier in this chapter.

test _Ime1Wyr0 _Ime2Wyr0

     ( 1)  _Ime1Wyr0 = 0
     ( 2)  _Ime2Wyr0 = 0

           F(  2,   394) =  411.46
                Prob > F =    0.0000

And we likewise get the simple effect of **mealcat** at **yr_rnd** = 1 as shown below.

test _Ime1Wyr1 _Ime2Wyr1

     ( 1)  _Ime1Wyr1 = 0
     ( 2)  _Ime2Wyr1 = 0

           F(  2,   394) =   50.19
                Prob > F =    0.0000

We can now test the simple effects of **mealcat** at each level of **yr_rnd** via dummy coding.

xi : regress api00 i.mealcat*yr_rnd

    i.mealcat         _Imealcat_1-3       (naturally coded; _Imealcat_3 omitted)
    i.meal~t*yr_rnd   _ImeaXyr_rn_#       (coded as above)

          Source |       SS       df       MS              Number of obs =     400
    -------------+------------------------------           F(  5,   394) =  261.61
           Model |  6204727.82     5  1240945.56           Prob > F      =  0.0000
        Residual |  1868944.18   394  4743.51314           R-squared     =  0.7685
    -------------+------------------------------           Adj R-squared =  0.7656
           Total |  8073672.00   399  20234.7669           Root MSE      =  68.873

    ------------------------------------------------------------------------------
           api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
     _Imealcat_1 |   288.1929   10.44284    27.60   0.000     267.6623    308.7236
     _Imealcat_2 |    123.781   10.55185    11.73   0.000      103.036    144.5259
          yr_rnd |  -33.49254   11.77129    -2.85   0.005    -56.63492   -10.35015
    _ImeaXyr_r~1 |  -40.76438   29.23118    -1.39   0.164    -98.23297    16.70422
    _ImeaXyr_r~2 |  -18.24763   22.25624    -0.82   0.413    -62.00347     25.5082
           _cons |   521.4925   8.414197    61.98   0.000     504.9502    538.0349
    ------------------------------------------------------------------------------

The simple effect of **mealcat** when **yr_rnd** is 0 requires two **test** statements since it is a 2 degree of freedom test. We can do this by testing mean(mealcat1) = mean(mealcat2) and also testing mean(mealcat2) = mean(mealcat3). We can look at the table above and see that the test of mean(mealcat1) = mean(mealcat2) is **_Imealcat_1** - **_Imealcat_2** = 0 (after **_cons** drops out) and the test of mean(mealcat2) = mean(mealcat3) is **_Imealcat_2** = 0 (again after **_cons** drops out). So, we can perform this test using the two **test** commands below.

test _Imealcat_1- _Imealcat_2=0

     ( 1)  _Imealcat_1 - _Imealcat_2 = 0.0

           F(  1,   394) =  343.05
                Prob > F =    0.0000

test _Imealcat_2, accum

     ( 1)  _Imealcat_1 - _Imealcat_2 = 0.0
     ( 2)  _Imealcat_2 = 0.0

           F(  2,   394) =  411.46
                Prob > F =    0.0000

Note that the effects **_Imealcat_1** and **_Imealcat_2** do not correspond to overall effects of the variable **mealcat** but are the simple effects when **yr_rnd** is set to 0, the reference level. Again we see that the terms that we might be tempted to call main effects and think of as overall effects really are simple effects when dummy coding is used.

The second **test** command uses the **accum** option to accumulate the tests to get the 2 degree of freedom test that corresponds to the simple effect of **mealcat** when **yr_rnd** is 0.

Likewise, we can look at the table above to form the comparisons needed to obtain the simple effects of **mealcat** when **yr_rnd** is 1.

test _Imealcat_1+ _ImeaXyr_rn_1- _Imealcat_2- _ImeaXyr_rn_2=0

     ( 1)  _Imealcat_1 - _Imealcat_2 + _ImeaXyr_rn_1 - _ImeaXyr_rn_2 = 0.0

           F(  1,   394) =   20.26
                Prob > F =    0.0000

test _Imealcat_2+ _ImeaXyr_rn_2=0, accum

     ( 1)  _Imealcat_1 - _Imealcat_2 + _ImeaXyr_rn_1 - _ImeaXyr_rn_2 = 0.0
     ( 2)  _Imealcat_2 + _ImeaXyr_rn_2 = 0.0

           F(  2,   394) =   50.19
                Prob > F =    0.0000
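Since these linear combinations are easy to get wrong, it can help to evaluate them numerically and compare them with the effect coded results. The Python sketch below is a rough check along those lines, using coefficients copied from the two regression outputs in this example; the variable names are illustrative.

```python
# Dummy coded coefficients copied from the xi: regress output.
b = {
    "_Imealcat_1": 288.1929,
    "_Imealcat_2": 123.781,
    "_ImeaXyr_rn_1": -40.76438,
    "_ImeaXyr_rn_2": -18.24763,
}
# Effect coded simple effects at yr_rnd = 1, copied from the xi3 output.
ime1wyr1 = 247.4286   # mealcat 1 versus 3 at yr_rnd = 1
ime2wyr1 = 105.5333   # mealcat 2 versus 3 at yr_rnd = 1

# The two linear combinations used in the test commands at yr_rnd = 1.
m1_vs_m2_at_yr1 = (b["_Imealcat_1"] + b["_ImeaXyr_rn_1"]) - (
    b["_Imealcat_2"] + b["_ImeaXyr_rn_2"]
)
m2_vs_m3_at_yr1 = b["_Imealcat_2"] + b["_ImeaXyr_rn_2"]

print(round(m1_vs_m2_at_yr1, 4), round(ime1wyr1 - ime2wyr1, 4))
print(round(m2_vs_m3_at_yr1, 4), ime2wyr1)
```

The first combination matches the difference between the two **xi3** coefficients and the second matches **_Ime2Wyr1**, so the two joint tests address the same 2 degree of freedom hypothesis, which is consistent with both reporting F( 2, 394) = 50.19.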

With this example we hoped to illustrate that testing simple effects for a variable with more than two levels can be quite tricky and requires constructing multiple **test** commands, one **test** command for every degree of freedom in the simple effect. As you can see, constructing these terms can be tricky and error prone. Without a method for double checking results, it is easy to make a mistake when constructing terms and form the wrong comparison. By comparison, when using effect coding with **xi3**, forming comparisons can be much easier and the interpretation of the lower order effects is much more intuitive. The lower order effects do correspond to the overall effects of the variable; for example, the effect of **yr_rnd**, when using effect coding, corresponds to the difference between the unweighted mean for the year round schools and the unweighted mean for the non-year round schools.
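The last point can be checked with a little arithmetic. Because each unweighted mean is a simple average over the three **mealcat** cells, the overall effect of **yr_rnd** under effect coding should equal the simple average of its three simple effects. The Python sketch below illustrates this with coefficients copied from the outputs in Examples 1 and 2; the variable names are ours.

```python
# Simple effects of yr_rnd at mealcat 1, 2 and 3 (_Iyr1Wme1 to _Iyr1Wme3),
# copied from the xi3 output in Example 1.
simple_effects = {1: -74.25691, 2: -51.74017, 3: -33.49254}

# Overall (unweighted) effect of yr_rnd: the average of the simple effects.
overall = sum(simple_effects.values()) / len(simple_effects)

effect_coded_yr_rnd = -53.16321   # _Iyr_rnd_1 from the xi3 output in Example 2
dummy_coded_yr_rnd = -33.49254    # yr_rnd from the xi output (a simple effect)

print(round(overall, 4), effect_coded_yr_rnd, dummy_coded_yr_rnd)
```

The average agrees with **_Iyr_rnd_1**, whereas the dummy coded **yr_rnd** coefficient reproduces only the simple effect at the reference level of **mealcat**.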