We will use the data set elemapi2 (https://stats.idre.ucla.edu/wp-content/uploads/2016/02/elemapi2.sas7bdat) to demonstrate. This page is based on Chapter 6 of our Regression with SAS Web Book. The variables mealcat and collcat are categorical, each with three levels. The dependent variable, api00, is the school's API index. We want to make a simple comparison: group 1 of collcat versus groups 2 and above (the average of groups 2 and 3) when mealcat = 1. One way of doing this is to use proc glm with an estimate statement.
/* in is assumed to be a libref pointing to the folder that holds elemapi2.sas7bdat */
proc glm data = in.elemapi2;
  class collcat mealcat;
  model api00 = collcat mealcat collcat*mealcat / ss3;
  estimate 'collcat 1 vs 2+ within mealcat = 1'
    collcat 1 -.5 -.5
    collcat*mealcat 1 0 0 -.5 0 0 -.5 0 0;
run;
quit;
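The coefficients in this estimate statement can be derived by writing each cell mean in terms of the model parameters. The sketch below uses notation that does not appear on the original page: \mu for the intercept, \alpha_i for collcat, \beta_j for mealcat, and \gamma_{ij} for the interaction, all of which are assumed symbols introduced only for illustration.

```latex
% Assumed notation: i indexes collcat, j indexes mealcat.
\begin{align*}
\mu_{ij} &= \mu + \alpha_i + \beta_j + \gamma_{ij} \\
\psi &= \mu_{11} - \tfrac{1}{2}\mu_{21} - \tfrac{1}{2}\mu_{31} \\
     &= \left(1 - \tfrac{1}{2} - \tfrac{1}{2}\right)\mu
      + \left(\alpha_1 - \tfrac{1}{2}\alpha_2 - \tfrac{1}{2}\alpha_3\right)
      + \left(1 - \tfrac{1}{2} - \tfrac{1}{2}\right)\beta_1
      + \left(\gamma_{11} - \tfrac{1}{2}\gamma_{21} - \tfrac{1}{2}\gamma_{31}\right) \\
     &= \left(\alpha_1 - \tfrac{1}{2}\alpha_2 - \tfrac{1}{2}\alpha_3\right)
      + \left(\gamma_{11} - \tfrac{1}{2}\gamma_{21} - \tfrac{1}{2}\gamma_{31}\right)
\end{align*}
```

The intercept and mealcat terms cancel, which is why only collcat and collcat*mealcat appear in the estimate statement. With collcat listed first in the class statement, the interaction parameters are ordered \gamma_{11}, \gamma_{12}, \gamma_{13}, \gamma_{21}, ..., \gamma_{33}, giving the coefficients 1 0 0 -.5 0 0 -.5 0 0.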
Another way of accomplishing the same thing, and possibly an easier one, is to use a cell means model. A cell means model estimates exactly one parameter for each cell and sets the intercept to 0. In general, the cell means model is not used to produce an overall test of model fit, but it is often used to write simpler estimate or contrast statements. So in practice, we need to write the proc glm code twice: once for the model fit and a second time for the estimates or contrasts. In the code shown below, the first proc glm is for model fit, and the second one, with the estimate statement, is used to estimate the simple comparison. We use the noint option on the model statement in the second proc glm to specify that we are not going to estimate the intercept; the model therefore estimates one parameter per cell.
proc glm data = in.elemapi2;
  class collcat mealcat;
  model api00 = collcat mealcat collcat*mealcat / ss3;
run;
quit;

proc glm data = in.elemapi2;
  class collcat mealcat;
  model api00 = collcat*mealcat / noint ss3;
  estimate 'collcat 1 vs 2+ within mealcat = 1'
    collcat*mealcat 2 0 0 -1 0 0 -1 0 0 / divisor = 2;
run;
quit;
Notice that the order of the categorical variables in the class statement determines which variable is the row variable and which is the column variable. For example, in the code above, collcat is the row variable and mealcat is the column variable. Therefore, the simple comparison we are interested in can be laid out as the following table.

collcat / mealcat | mealcat = 1 | mealcat = 2 | mealcat = 3 |
collcat = 1 | 1 | 0 | 0 |
collcat = 2 | -.5 | 0 | 0 |
collcat = 3 | -.5 | 0 | 0 |

Writing the numbers in the table one row at a time, we can write our estimate statement as

estimate 'simple comparison' collcat*mealcat 1 0 0 -.5 0 0 -.5 0 0;

or equivalently, we can use the divisor = option to rewrite the statement in terms of whole numbers, as shown above.
If we switch the order of the variables in the class statement, we have to rewrite our estimate statement accordingly. For example, we can rewrite the proc glm step above as follows. It produces exactly the same result from the estimate statement, because the corresponding table is simply transposed.
mealcat / collcat | collcat = 1 | collcat = 2 | collcat = 3 |
mealcat = 1 | 1 | -.5 | -.5 |
mealcat = 2 | 0 | 0 | 0 |
mealcat = 3 | 0 | 0 | 0 |
proc glm data = in.elemapi2;
  class mealcat collcat;
  model api00 = mealcat*collcat / noint ss3;
  /* coefficients are read row by row from the transposed table */
  estimate 'collcat 1 vs 2+ within mealcat = 1'
    mealcat*collcat 2 -1 -1 0 0 0 0 0 0 / divisor = 2 e;
run;
quit;
In the following examples, we will use a different data set and we will use the contrast statement. This made-up data set has males and females rating three different flavors of ice cream.
data icecream;
  input id flavor female rating;
cards;
1 1 0 8
2 1 1 3
3 1 0 7
4 1 1 2
5 1 1 3
6 1 0 7
7 1 1 2
8 1 1 9
9 1 0 2
10 1 0 2
11 2 1 9
12 2 1 8
13 2 0 3
14 2 1 9
15 2 0 4
16 2 0 3
17 2 0 3
18 2 1 9
19 2 1 7
20 2 0 9
21 3 0 2
22 3 1 8
23 3 0 2
24 3 1 8
25 3 0 3
26 3 1 9
27 3 1 8
28 3 1 8
29 3 0 1
30 3 1 9
;
run;
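Because a cell means model estimates one mean per flavor by female cell, it can help to look at those cell means and cell sizes directly before writing any contrasts. The step below is a minimal sketch that uses proc means, which is not part of the original page; it assumes the icecream data set created above.

```sas
/* One row per flavor-by-female cell: cell size (N) and mean rating. */
proc means data = icecream n mean;
  class flavor female;
  var rating;
run;
```

The cells are not all the same size (flavor 3 has four males and six females), so the design is unbalanced. This is why the Type I and Type III sums of squares differ in a model that includes the main effects.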
We will first run proc glm with the full model, as we normally would, to see if gender (female), flavor, or their interaction predicts rating.
proc glm data = icecream;
  class flavor female;
  model rating = flavor female flavor*female;
run;
quit;
The GLM Procedure

Class Level Information

Class     Levels    Values
flavor         3    1 2 3
female         2    0 1

Number of observations    30

The GLM Procedure

Dependent Variable: rating

                                Sum of
Source             DF          Squares     Mean Square    F Value    Pr > F
Model               5      160.0333333      32.0066667       7.43    0.0002
Error              24      103.3333333       4.3055556
Corrected Total    29      263.3666667

R-Square    Coeff Var    Root MSE    rating Mean
0.607645     37.27515    2.074983       5.566667

Source           DF       Type I SS     Mean Square    F Value    Pr > F
flavor            2     18.86666667      9.43333333       2.19    0.1337
female            1     63.63378378     63.63378378      14.78    0.0008
flavor*female     2     77.53288288     38.76644144       9.00    0.0012

Source           DF     Type III SS     Mean Square    F Value    Pr > F
flavor            2     18.57072072      9.28536036       2.16    0.1376
female            1     65.59269406     65.59269406      15.23    0.0007
flavor*female     2     77.53288288     38.76644144       9.00    0.0012
From the Type III results in the output above, we can see that we have a statistically significant effect for the variable female (F = 15.23, p = 0.0007) and for the interaction of flavor by female (F = 9.00, p = 0.0012). Now let's use the cell means model to contrast males and females, collapsing across the three flavors of ice cream. As before, we use the noint option on the model statement so that the model estimates one mean per cell and no intercept. The e option on the contrast statement displays the contrast coefficients that SAS used, which is helpful in confirming that we did what we wanted to do. Be sure to note how the model statement in this proc glm differs from the model statement in the previous proc glm. When running a cell means model, only the interaction is included on the model statement. This is why only the output from the contrast statement is of interest: the main effects that make up the interaction are not in this model.
proc glm data = icecream;
  class flavor female;
  model rating = flavor*female / noint;
  contrast 'contrasting males and females' flavor*female 1 -1 1 -1 1 -1 / e;
run;
quit;
The GLM Procedure

Coefficients for Contrast contrasting males and females

                       Row 1
flavor*female 1 0          1
flavor*female 1 1         -1
flavor*female 2 0          1
flavor*female 2 1         -1
flavor*female 3 0          1
flavor*female 3 1         -1

The GLM Procedure

Dependent Variable: rating

                                  Sum of
Source               DF          Squares     Mean Square    F Value    Pr > F
Model                 6      1089.666667      181.611111      42.18    <.0001
Error                24       103.333333        4.305556
Uncorrected Total    30      1193.000000

R-Square    Coeff Var    Root MSE    rating Mean
0.607645     37.27515    2.074983       5.566667

Source           DF       Type I SS     Mean Square    F Value    Pr > F
flavor*female     6     1089.666667      181.611111      42.18    <.0001

Source           DF     Type III SS     Mean Square    F Value    Pr > F
flavor*female     6     1089.666667      181.611111      42.18    <.0001

Contrast                         DF     Contrast SS     Mean Square    F Value    Pr > F
contrasting males and females     1     65.59269406     65.59269406      15.23    0.0007
The output above indicates that the contrast was statistically significant (F = 15.23, p = 0.0007).
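As a hand check, the contrast sum of squares can be reproduced from the cell means. The sketch below assumes the standard formula for a one degree of freedom contrast among cell means, SS = L^2 / \sum_k (c_k^2 / n_k). The symbols L, c_k, and n_k are introduced here for illustration, and the cell means and cell sizes were computed directly from the 30 observations in the icecream data step.

```latex
% Cells in the order flavor*female = (1,0), (1,1), (2,0), (2,1), (3,0), (3,1)
% Cell means: 5.2, 3.8, 4.4, 8.4, 2.0, 8.333;  cell sizes: 5, 5, 5, 5, 4, 6
\begin{align*}
\hat{L} &= (5.2 + 4.4 + 2.0) - (3.8 + 8.4 + 8.333) \approx -8.933 \\
\sum_k \frac{c_k^2}{n_k} &= \frac{1}{5} + \frac{1}{5} + \frac{1}{5} + \frac{1}{5}
   + \frac{1}{4} + \frac{1}{6} = \frac{73}{60} \approx 1.2167 \\
SS_{\text{contrast}} &= \frac{\hat{L}^2}{\sum_k c_k^2 / n_k}
   \approx \frac{79.804}{1.2167} \approx 65.59 \\
F &= \frac{SS_{\text{contrast}}}{MS_{\text{error}}}
   \approx \frac{65.59}{4.3056} \approx 15.23
\end{align*}
```

These values agree with the Contrast SS and F Value in the output, and with the Type III test for female in the full model.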
Now let’s contrast flavors 1 and 2, collapsing over gender.
proc glm data = icecream;
  class flavor female;
  model rating = flavor*female / noint;
  contrast 'contrasting flavors 1 and 2' flavor*female -1 -1 1 1 0 0 / e;
run;
quit;
The GLM Procedure

Coefficients for Contrast contrasting flavors 1 and 2

                       Row 1
flavor*female 1 0         -1
flavor*female 1 1         -1
flavor*female 2 0          1
flavor*female 2 1          1
flavor*female 3 0          0
flavor*female 3 1          0

The GLM Procedure

Dependent Variable: rating

                                  Sum of
Source               DF          Squares     Mean Square    F Value    Pr > F
Model                 6      1089.666667      181.611111      42.18    <.0001
Error                24       103.333333        4.305556
Uncorrected Total    30      1193.000000

R-Square    Coeff Var    Root MSE    rating Mean
0.607645     37.27515    2.074983       5.566667

Source           DF       Type I SS     Mean Square    F Value    Pr > F
flavor*female     6     1089.666667      181.611111      42.18    <.0001

Source           DF     Type III SS     Mean Square    F Value    Pr > F
flavor*female     6     1089.666667      181.611111      42.18    <.0001

Contrast                       DF     Contrast SS     Mean Square    F Value    Pr > F
contrasting flavors 1 and 2     1     18.05000000     18.05000000       4.19    0.0517
This contrast falls just short of statistical significance at the 0.05 level (F = 4.19, p = 0.0517).
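The contrast statement reports only a test, not the size of the difference. If the magnitude is also of interest, the same coefficients can be given to an estimate statement, as was done with the elemapi2 data earlier on this page. The step below is a sketch under that assumption; the label text is illustrative, and divisor = 2 is used so that the result is a difference between the two flavor means (each averaged over the male and female cells) rather than a difference between sums.

```sas
proc glm data = icecream;
  class flavor female;
  model rating = flavor*female / noint;
  /* (flavor 2 male + flavor 2 female)/2 minus (flavor 1 male + flavor 1 female)/2 */
  estimate 'flavor 2 minus flavor 1' flavor*female -1 -1 1 1 0 0 / divisor = 2 e;
run;
quit;
```

From the cell means in the data, the estimate should be about 1.9. Because this is a single degree of freedom comparison, the square of the reported t value should match the F value of 4.19 from the contrast statement, with the same p-value.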
Finally, we will contrast males and females within flavor 2 only.
proc glm data = icecream;
  class flavor female;
  model rating = flavor*female / noint;
  contrast 'contrasting flavor 2 for males and females' flavor*female 0 0 1 -1 0 0 / e;
run;
quit;
The GLM Procedure

Coefficients for Contrast contrasting flavor 2 for males and females

                       Row 1
flavor*female 1 0          0
flavor*female 1 1          0
flavor*female 2 0          1
flavor*female 2 1         -1
flavor*female 3 0          0
flavor*female 3 1          0

The GLM Procedure

Dependent Variable: rating

                                  Sum of
Source               DF          Squares     Mean Square    F Value    Pr > F
Model                 6      1089.666667      181.611111      42.18    <.0001
Error                24       103.333333        4.305556
Uncorrected Total    30      1193.000000

R-Square    Coeff Var    Root MSE    rating Mean
0.607645     37.27515    2.074983       5.566667

Source           DF       Type I SS     Mean Square    F Value    Pr > F
flavor*female     6     1089.666667      181.611111      42.18    <.0001

Source           DF     Type III SS     Mean Square    F Value    Pr > F
flavor*female     6     1089.666667      181.611111      42.18    <.0001

Contrast                                      DF     Contrast SS     Mean Square    F Value    Pr > F
contrasting flavor 2 for males and females     1     40.00000000     40.00000000       9.29    0.0055
This contrast is statistically significant (F = 9.29, p = 0.0055).
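A comparison of this kind, one factor tested within a single level of the other, can also be requested without writing coefficients by hand. The sketch below uses the slice option of the lsmeans statement in the full model. This is a different technique from the cell means approach shown on this page, offered only as a cross-check.

```sas
proc glm data = icecream;
  class flavor female;
  model rating = flavor female flavor*female;
  /* slice = flavor tests the effect of female separately within each flavor */
  lsmeans flavor*female / slice = flavor;
run;
quit;
```

Since female has only two levels, each slice is a single degree of freedom test that uses the same error term as the contrast above, so the row for flavor 2 should agree with F = 9.29.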