You can use multiple contrast statements in a proc glm call to conduct tests of simple main effects. This is particularly useful when exploring the interaction of three categorical variables in ANOVA. If you are not familiar with three-way interactions in ANOVA, please see our general FAQ on understanding three-way interactions in ANOVA. In short, a three-way interaction means that there is a two-way interaction that varies across levels of a third variable. Say, for example, that a b*c interaction differs across various levels of factor a.
One way of analyzing the three-way interaction is through tests of simple main effects, i.e., tests of the effect of one variable (or set of variables) at each level of another variable.
We will use a small artificial dataset called threeway that has a statistically significant three-way interaction to illustrate the process. In our example data set, variables a, b and c are categorical. The techniques shown on this page can be generalized to situations in which one or more variables are continuous, but the more continuous variables that are involved in the interaction, the more complicated things get.
The results (shown below) indicate that the b*c interaction is statistically significant at a=1 but not at a=2. Because of this, the last two contrast statements are needed; they test the effect of c at a=1 separately at each level of b.
After we look at the results, we will look at the coding used.
```sas
proc glm data = threeway;
  class a b c;
  model y = a b c a*b a*c b*c a*b*c;
  /* Each row, ending in a comma or the final semicolon, is 1 DF of the test. */
  contrast 'b*c at a=1'
    b*c 1 0 -1 -1 0 1 a*b*c 1 0 -1 -1 0 1 0 0 0 0 0 0,
    b*c 0 1 -1 0 -1 1 a*b*c 0 1 -1 0 -1 1 0 0 0 0 0 0;
  contrast 'b*c at a=2'
    b*c 1 0 -1 -1 0 1 a*b*c 0 0 0 0 0 0 1 0 -1 -1 0 1,
    b*c 0 1 -1 0 -1 1 a*b*c 0 0 0 0 0 0 0 1 -1 0 -1 1;
  /* Follow-up tests: the effect of c at a=1, within each level of b. */
  contrast 'c at a=1 & b=1'
    c 1 0 -1 a*c 1 0 -1 0 0 0 b*c 1 0 -1 0 0 0 a*b*c 1 0 -1 0 0 0 0 0 0 0 0 0,
    c 0 1 -1 a*c 0 1 -1 0 0 0 b*c 0 1 -1 0 0 0 a*b*c 0 1 -1 0 0 0 0 0 0 0 0 0;
  contrast 'c at a=1 & b=2'
    c 1 0 -1 a*c 1 0 -1 0 0 0 b*c 0 0 0 1 0 -1 a*b*c 0 0 0 1 0 -1 0 0 0 0 0 0,
    c 0 1 -1 a*c 0 1 -1 0 0 0 b*c 0 0 0 0 1 -1 a*b*c 0 0 0 0 1 -1 0 0 0 0 0 0;
run;
quit;
```
```
The GLM Procedure

Class Level Information

Class    Levels    Values
A             2    1 2
B             2    1 2
C             3    1 2 3

Number of Observations Read    24
Number of Observations Used    24

Dependent Variable: Y

                              Sum of
Source             DF        Squares    Mean Square    F Value    Pr > F
Model              11    497.8333333     45.2575758      33.94    <.0001
Error              12     16.0000000      1.3333333
Corrected Total    23    513.8333333

R-Square    Coeff Var    Root MSE      Y Mean
0.968861     7.655473    1.154701    15.08333

Source    DF      Type I SS    Mean Square    F Value    Pr > F
A          1    150.0000000    150.0000000     112.50    <.0001
B          1      0.6666667      0.6666667       0.50    0.4930
C          2    127.5833333     63.7916667      47.84    <.0001
A*B        1    160.1666667    160.1666667     120.12    <.0001
A*C        2     18.2500000      9.1250000       6.84    0.0104
B*C        2     22.5833333     11.2916667       8.47    0.0051
A*B*C      2     18.5833333      9.2916667       6.97    0.0098

Source    DF    Type III SS    Mean Square    F Value    Pr > F
A          1    150.0000000    150.0000000     112.50    <.0001
B          1      0.6666667      0.6666667       0.50    0.4930
C          2    127.5833333     63.7916667      47.84    <.0001
A*B        1    160.1666667    160.1666667     120.12    <.0001
A*C        2     18.2500000      9.1250000       6.84    0.0104
B*C        2     22.5833333     11.2916667       8.47    0.0051
A*B*C      2     18.5833333      9.2916667       6.97    0.0098

Contrast          DF    Contrast SS    Mean Square    F Value    Pr > F
b*c at a=1         2    40.66666667    20.33333333      15.25    0.0005
b*c at a=2         2     0.50000000     0.25000000       0.19    0.8314
c at a=1 & b=1     2    64.00000000    32.00000000      24.00    <.0001
c at a=1 & b=2     2     1.33333333     0.66666667       0.50    0.6186
```
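As a check on how the Contrast rows are computed, the short Python sketch below recomputes each F value as (Contrast SS / 2) / MSE, with MSE = 16/12 taken from the Error line, and then gets a p-value from the F(2, 12) distribution. It relies on the fact that with exactly 2 numerator degrees of freedom the upper tail of the F distribution has a closed form, so it applies only to 2-DF contrasts like these; the function name `f_upper_tail_df2` is our own, not part of SAS.

```python
# Values copied from the PROC GLM output above.
MSE = 16.0 / 12.0      # Error SS / Error DF
DF_ERROR = 12

contrast_ss = {
    "b*c at a=1": 40.66666667,
    "b*c at a=2": 0.50000000,
    "c at a=1 & b=1": 64.00000000,
    "c at a=1 & b=2": 1.33333333,
}

def f_upper_tail_df2(f, df2):
    """P(F > f) for an F(2, df2) variable.

    With 2 numerator degrees of freedom the upper tail is
    (1 + 2f/df2) ** (-df2/2), so no special-function library is needed.
    """
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)

results = {}
for label, ss in contrast_ss.items():
    mean_square = ss / 2          # every contrast here has 2 DF
    f_value = mean_square / MSE
    p_value = f_upper_tail_df2(f_value, DF_ERROR)
    results[label] = (f_value, p_value)
    print(f"{label:16s} F = {f_value:6.2f}  p = {p_value:.4f}")
```

The recomputed F values and p-values should agree with the Contrast rows of the output to the precision SAS prints.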
In the first contrast statement, we are interested in the b*c interaction at a=1. The b*c interaction has 2 degrees of freedom ((2-1)*(3-1) = 2), so the contrast needs two rows of coefficients; in a SAS contrast statement, the rows are separated by a comma, and the semicolon ends the whole statement. Because we are testing the b*c interaction within one level of a, the a*b*c coefficients for that level must be included along with the b*c coefficients. In the second contrast statement, we look at the b*c interaction at a=2. Realistically, we wouldn’t know to include the third and fourth contrast statements until we had run the first two and seen the results. To save space, we have included them here; they test the effect of c at a=1, separately for b=1 and b=2.
Let’s look a little closer at the coding of the variables on the contrast statements. First, remember that the variable a has two levels, b has two levels, and c has three levels. The coefficients are given for each cell produced by crossing the categorical predictor variables, listed with the rightmost variable in the effect changing fastest (for b*c: b1c1, b1c2, b1c3, b2c1, b2c2, b2c3). This is perhaps best understood as the "differences of differences" approach. (For more information, please see Multiple Regression: Testing and Interpreting Interactions by Leona S. Aiken and Stephen G. West.)
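One way to see why several effects appear in a single contrast is to start from weights on the 12 cell means m_abc and work out the coefficient each effect needs. The Python sketch below assumes the overparameterized model PROC GLM fits for CLASS variables (each cell mean is the intercept plus one level of every effect) and levels sorted 1, 2, 3; `effect_coefficients` is a hypothetical helper written for illustration. Each effect level receives the total weight of the cells that contain it.

```python
from itertools import product

# Levels in CLASS-statement order: a has 2 levels, b has 2, c has 3.
LEVELS = {"a": (1, 2), "b": (1, 2), "c": (1, 2, 3)}
EFFECTS = ["a", "b", "c", "a*b", "a*c", "b*c", "a*b*c"]

def effect_coefficients(cell_weights):
    """Turn weights on cell means m[(a, b, c)] into CONTRAST coefficients.

    In the overparameterized model each cell mean is the intercept plus one
    level of every effect, so the coefficient on an effect level is the total
    weight of the cells that contain that level. Level combinations are listed
    with the rightmost factor varying fastest, as in the CONTRAST statements.
    """
    coefs = {"intercept": [sum(cell_weights.values())]}
    for effect in EFFECTS:
        factors = effect.split("*")
        row = []
        for combo in product(*(LEVELS[f] for f in factors)):
            target = dict(zip(factors, combo))
            row.append(sum(
                w for cell, w in cell_weights.items()
                if all(dict(zip("abc", cell))[f] == target[f] for f in factors)
            ))
        coefs[effect] = row
    return coefs

# First row of 'b*c at a=1': (m111 - m113) - (m121 - m123); other cells get 0.
bc_at_a1 = {(1, 1, 1): 1, (1, 1, 3): -1, (1, 2, 1): -1, (1, 2, 3): 1}
for effect, row in effect_coefficients(bc_at_a1).items():
    print(f"{effect:9s}", *row)
```

For this row, every effect except b*c and a*b*c comes out as all zeros, which is why only those two effects are listed in the first contrast statement.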
The first contrast statement
Let’s take the first line of the first contrast statement as an example. We have the b*c interaction at a=1, and we are comparing c1 to c3. In other words, c3 is our reference group. Picking c3 as the reference group is somewhat arbitrary; we could have used c1 or c2. The "differences of differences" approach means that we take the difference between c1 and c3 at b=1, the difference between c1 and c3 at b=2, and then the difference of those two differences. The table below has six cells (2 levels of b times 3 levels of c). We have labeled each cell m with two subscripts, the level of b followed by the level of c, so that we can do some symbolic math.
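a=1

|     | c1  | c2  | c3  |
|-----|-----|-----|-----|
| b=1 | m11 | m12 | m13 |
| b=2 | m21 | m22 | m23 |

(m11 - m13) - (m21 - m23) = m11 + 0·m12 - m13 - m21 + 0·m22 + m23

Listing the coefficients in the order m11 m12 m13 m21 m22 m23 gives 1 0 -1 -1 0 1: the b=1 row contributes 1 0 -1, and the b=2 row contributes the same comparison with its signs reversed, -1 0 1.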
Notice that 1 0 -1 -1 0 1 are the first six entries in the first line of the first contrast statement.
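The differences-of-differences coefficients can also be generated mechanically: they are the Kronecker product of a contrast on b (b=1 minus b=2) and a contrast on c (c1 or c2 minus the reference level c3). A minimal Python sketch, with a hand-written `kron` helper (our own name, not a library call):

```python
def kron(u, v):
    """Kronecker product of two coefficient lists; u varies slowest."""
    return [x * y for x in u for y in v]

b_diff = [1, -1]          # b=1 minus b=2
c1_vs_c3 = [1, 0, -1]     # c1 minus the reference level c3
c2_vs_c3 = [0, 1, -1]     # c2 minus the reference level c3

# Cells are ordered m11 m12 m13 m21 m22 m23 (b first, then c).
row1 = kron(b_diff, c1_vs_c3)   # (m11 - m13) - (m21 - m23)
row2 = kron(b_diff, c2_vs_c3)   # (m12 - m13) - (m22 - m23)
print("row 1:", *row1)
print("row 2:", *row2)
```

The second row reproduces the coefficients 0 1 -1 0 -1 1 used for the c2 versus c3 comparison in the same contrast statement.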
Now let’s look at the second part, the a*b*c interaction. The first six numbers are for a=1, and the second six are for a=2. Because we are only looking at a=1 in this analysis, all of the values for a=2 are 0. The values for a=1 are the same as those for the b*c interaction.
Here is another way of thinking about the first line of the first contrast statement, broken into its pieces:

```sas
contrast 'b*c at a=1' b*c 1 0 -1 -1 0 1 a*b*c 1 0 -1 -1 0 1 0 0 0 0 0 0;
```

- b*c, entries 1-3 (1 0 -1): b=1, comparing c1 with c3
- b*c, entries 4-6 (-1 0 1): b=2, comparing c1 with c3
- a*b*c, entries 1-3 (1 0 -1): a=1 and b=1, comparing c1 with c3
- a*b*c, entries 4-6 (-1 0 1): a=1 and b=2, comparing c1 with c3
- a*b*c, entries 7-9 (0 0 0): a=2 and b=1; these are all 0s because we are looking only at a=1
- a*b*c, entries 10-12 (0 0 0): a=2 and b=2; these are all 0s because we are looking only at a=1
The second line of the first contrast statement is very similar to the first, except that it is for c2 versus c3. So, we have
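(m12 - m13) - (m22 - m23) = 0·m11 + m12 - m13 + 0·m21 - m22 + m23

so the coefficients, in the order m11 m12 m13 m21 m22 m23, are 0 1 -1 0 -1 1. The complete first contrast statement, with its two rows separated by a comma, is:

```sas
contrast 'b*c at a=1'
  b*c 1 0 -1 -1 0 1 a*b*c 1 0 -1 -1 0 1 0 0 0 0 0 0,
  b*c 0 1 -1 0 -1 1 a*b*c 0 1 -1 0 -1 1 0 0 0 0 0 0;
```

The pieces of the second row are:

- b*c, entries 1-3 (0 1 -1): b=1, comparing c2 with c3
- b*c, entries 4-6 (0 -1 1): b=2, comparing c2 with c3
- a*b*c, entries 1-3 (0 1 -1): a=1 and b=1, comparing c2 with c3
- a*b*c, entries 4-6 (0 -1 1): a=1 and b=2, comparing c2 with c3
- a*b*c, entries 7-9 (0 0 0): a=2 and b=1; these are all 0s because we are looking only at a=1
- a*b*c, entries 10-12 (0 0 0): a=2 and b=2; these are all 0s because we are looking only at a=1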
The second contrast statement
The second contrast statement looks at the b*c interaction at a=2. It is the same as the first, except in the part for the a*b*c interaction. Here, the first six 0s are for a=1, which we are not considering in this contrast statement. The same coding used in the first contrast statement is simply shifted to the a=2 part of the code.
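The shift described above amounts to multiplying the six b*c coefficients by an indicator for the level of a being examined. A Python sketch under the same level-ordering assumption (`abc_row` is a hypothetical helper):

```python
def kron(u, v):
    """Kronecker product of two coefficient lists; u varies slowest."""
    return [x * y for x in u for y in v]

# b*c coefficients, identical in the 'b*c at a=1' and 'b*c at a=2' statements.
BC_ROWS = [
    [1, 0, -1, -1, 0, 1],   # c1 vs c3
    [0, 1, -1, 0, -1, 1],   # c2 vs c3
]

def abc_row(a_indicator, bc_row):
    """Place a b*c row into the a=1 block ([1, 0]) or the a=2 block ([0, 1])."""
    return kron(a_indicator, bc_row)

for a_level, a_indicator in ((1, [1, 0]), (2, [0, 1])):
    for bc in BC_ROWS:
        print(f"a={a_level}: b*c", *bc, "a*b*c", *abc_row(a_indicator, bc))
```

The printed a=2 rows match the a*b*c coefficients of the second contrast statement: six 0s for a=1 followed by the b*c pattern.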
The third contrast statement
```sas
contrast 'c at a=1 & b=1'
  c 1 0 -1 a*c 1 0 -1 0 0 0 b*c 1 0 -1 0 0 0 a*b*c 1 0 -1 0 0 0 0 0 0 0 0 0,
  c 0 1 -1 a*c 0 1 -1 0 0 0 b*c 0 1 -1 0 0 0 a*b*c 0 1 -1 0 0 0 0 0 0 0 0 0;
```
By now, the coding for c, the first part of the contrast statement, should be familiar. In this first line, we are comparing c1 with c3.
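a=1

|     | c1  | c2  | c3  |
|-----|-----|-----|-----|
| b=1 | m11 | m12 | m13 |
| b=2 | m21 | m22 | m23 |

Because this is a simple effect rather than an interaction, there is no difference of differences; within the b=1 row the comparison is simply

m11 - m13

so the c coefficients are 1 0 -1. The same 1 0 -1 is placed in every block that corresponds to a=1 or b=1, and 0s fill the remaining positions:

- c (1 0 -1): comparing c1 with c3
- a*c, entries 1-3 (1 0 -1): a=1, comparing c1 with c3
- a*c, entries 4-6 (0 0 0): a=2; these are all 0 because we are looking at a=1
- b*c, entries 1-3 (1 0 -1): b=1, comparing c1 with c3
- b*c, entries 4-6 (0 0 0): b=2; these are all 0 because we are looking at b=1
- a*b*c, entries 1-3 (1 0 -1): a=1, b=1, comparing c1 with c3
- a*b*c, entries 4-6 (0 0 0): a=1, b=2; these are all 0 because we are looking at b=1
- a*b*c, entries 7-9 (0 0 0): a=2, b=1; these are all 0 because we are looking at a=1
- a*b*c, entries 10-12 (0 0 0): a=2, b=2; these are all 0 because we are looking at a=1 and b=1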
The second line of the third contrast statement is very similar to the first line, except that it compares c2 to c3.
```sas
contrast 'c at a=1 & b=1'
  c 1 0 -1 a*c 1 0 -1 0 0 0 b*c 1 0 -1 0 0 0 a*b*c 1 0 -1 0 0 0 0 0 0 0 0 0,
  c 0 1 -1 a*c 0 1 -1 0 0 0 b*c 0 1 -1 0 0 0 a*b*c 0 1 -1 0 0 0 0 0 0 0 0 0;
```

- c (0 1 -1): comparing c2 with c3
- a*c, entries 1-3 (0 1 -1): a=1, comparing c2 with c3
- a*c, entries 4-6 (0 0 0): a=2; these are all 0 because we are looking at a=1
- b*c, entries 1-3 (0 1 -1): b=1, comparing c2 with c3
- b*c, entries 4-6 (0 0 0): b=2; these are all 0 because we are looking at b=1
- a*b*c, entries 1-3 (0 1 -1): a=1, b=1, comparing c2 with c3
- a*b*c, entries 4-6 (0 0 0): a=1, b=2; these are all 0 because we are looking at b=1
- a*b*c, entries 7-9 (0 0 0): a=2, b=1; these are all 0 because we are looking at a=1
- a*b*c, entries 10-12 (0 0 0): a=2, b=2; these are all 0 because we are looking at a=1 and b=1
The fourth contrast statement
The fourth contrast statement is the same as the third, except we are now looking at b=2. Hence, we have 0s for the b=1 part of the code and the comparisons of the different levels of c in the b=2 part of the code.
```sas
contrast 'c at a=1 & b=2'
  c 1 0 -1 a*c 1 0 -1 0 0 0 b*c 0 0 0 1 0 -1 a*b*c 0 0 0 1 0 -1 0 0 0 0 0 0,
  c 0 1 -1 a*c 0 1 -1 0 0 0 b*c 0 0 0 0 1 -1 a*b*c 0 0 0 0 1 -1 0 0 0 0 0 0;
```
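The third and fourth contrast statements follow one pattern: the c contrast is copied into the block of each interaction that matches the chosen levels of a and b, and 0s go everywhere else. The Python sketch below builds those rows from indicator vectors and prints a contrast statement in the same layout as the SAS code above. The helpers `indicator`, `simple_c_effect` and `contrast_row` are hypothetical names introduced for illustration, and the output is only text to paste into a PROC GLM step, not something SAS itself generates.

```python
def kron(u, v):
    """Kronecker product of two coefficient lists; u varies slowest."""
    return [x * y for x in u for y in v]

def indicator(level, n_levels):
    """Vector selecting one level, e.g. level 2 of 2 -> [0, 1]."""
    return [1 if i == level else 0 for i in range(1, n_levels + 1)]

def simple_c_effect(a_level, b_level, c_contrast):
    """Coefficients for one row of the effect of c at a given a and b."""
    ea, eb = indicator(a_level, 2), indicator(b_level, 2)
    return {
        "c": list(c_contrast),
        "a*c": kron(ea, c_contrast),
        "b*c": kron(eb, c_contrast),
        "a*b*c": kron(ea, kron(eb, c_contrast)),
    }

def contrast_row(coefs):
    """Format one row of a CONTRAST statement as 'effect coef coef ...'."""
    return " ".join(f"{effect} " + " ".join(str(x) for x in row)
                    for effect, row in coefs.items())

rows = [contrast_row(simple_c_effect(1, 2, c)) for c in ([1, 0, -1], [0, 1, -1])]
print("contrast 'c at a=1 & b=2'\n  " + ",\n  ".join(rows) + ";")
```

Calling `simple_c_effect(1, 1, ...)` instead gives the rows of the third contrast statement.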
Correcting for multiple tests
We should note that although a p-value is given for each of the four F-tests, these p-values are not corrected for multiple testing. There are at least four different methods of determining the critical value for tests of simple main effects: a method related to Dunn’s multiple comparisons, a method attributed to Marascuilo and Levin, the simultaneous test procedure (very conservative and related to the Scheffé post-hoc test), and a per family error rate method. We will demonstrate the per family error rate method, but you should look up the other methods in a good ANOVA book, such as Kirk (1995), to decide which approach is best for your situation.
Let’s take the first two tests, the b*c interaction at a=1 and at a=2, as an example. The F values were 15.25 and 0.19, respectively. We divide our alpha level, 0.05, by 2 because we are doing two tests of simple main effects, so our new alpha is .025. The finv function returns a quantile of the F distribution, so we give it 1 - alpha = 1 - .025 = .975, along with 2 numerator degrees of freedom (the contrast DF) and 12 denominator degrees of freedom (the error DF).
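```sas
data _null_;
  /* .975 quantile of F with 2 (contrast) and 12 (error) degrees of freedom */
  x = finv(.975, 2, 12);
  put "The critical value per family error rate is " x;
run;
```

The put statement writes the result to the SAS log:

```
The critical value per family error rate is 5.0958671658
```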
As you can see, the critical value is approximately 5.1. Because 15.25 exceeds 5.1 and 0.19 does not, the b*c interaction is statistically significant at a=1 but not at a=2.
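The alpha-splitting step and the finv call can be checked outside SAS. Because both tests have 2 numerator degrees of freedom, the upper-tail F quantile has a closed form, x = (df2/2)·(alpha^(-2/df2) - 1). The Python sketch below uses it; the closed form is valid only for 2 numerator degrees of freedom, and the helper name `f_crit_df2` is our own.

```python
ALPHA = 0.05
N_TESTS = 2      # b*c at a=1 and b*c at a=2
DF_ERROR = 12    # denominator DF (error DF from the ANOVA table)

def f_crit_df2(alpha, df_den):
    """Upper-tail critical value of F(2, df_den) for a given alpha.

    Solves (1 + 2x/df_den) ** (-df_den/2) = alpha for x. This closed form
    holds only when the numerator degrees of freedom equal 2.
    """
    return (df_den / 2.0) * (alpha ** (-2.0 / df_den) - 1.0)

alpha_per_test = ALPHA / N_TESTS   # .025, hence finv(.975, ...) in SAS
crit = f_crit_df2(alpha_per_test, DF_ERROR)
print(f"critical value: {crit:.10f}")

for label, f_value in (("b*c at a=1", 15.25), ("b*c at a=2", 0.1875)):
    verdict = "significant" if f_value > crit else "not significant"
    print(f"{label}: F = {f_value} -> {verdict}")
```

The computed value agrees with the SAS log line above, and the same decisions follow: only the test at a=1 exceeds the critical value.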
References
Aiken, Leona S., and West, Stephen G. (1996). Multiple Regression: Testing and Interpreting Interactions. Thousand Oaks, California: Sage Publications.
Kirk, Roger E. (1995). Experimental Design: Procedures for the Behavioral Sciences, Third Edition. Monterey, California: Brooks/Cole Publishing.