We will use a data set called elemapi2 (https://stats.idre.ucla.edu/wp-content/uploads/2016/02/elemapi2.sas7bdat) to demonstrate. This page is based on Chapter 6 of our Regression with SAS Web Book.

The variables mealcat and collcat are two categorical variables, both with three levels. The dependent variable is the school’s API index, api00. We want to look at a simple comparison of group 1 versus groups 2 and above of collcat when mealcat = 1. One way of doing this is to use proc glm with an estimate statement.

proc glm data = elemapi2;
  class collcat mealcat;
  model api00 = collcat mealcat collcat*mealcat / ss3;
  estimate 'collcat 1 vs 2+ within mealcat = 1'
           collcat 1 -.5 -.5
           collcat*mealcat 1 0 0 -.5 0 0 -.5 0 0;
run;
quit;
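To see where these coefficients come from, it may help to write each cell mean in terms of the parameters of the full model. The notation below ($\mu$ for the intercept, $\alpha_i$ for collcat, $\beta_j$ for mealcat, and $(\alpha\beta)_{ij}$ for the interaction) is introduced here only for illustration and does not appear in the SAS output.

```latex
\begin{align*}
\mu_{ij} &= \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} \\
\mu_{11} - \tfrac{1}{2}\,(\mu_{21} + \mu_{31})
  &= \alpha_1 - \tfrac{1}{2}\alpha_2 - \tfrac{1}{2}\alpha_3
   + (\alpha\beta)_{11} - \tfrac{1}{2}(\alpha\beta)_{21} - \tfrac{1}{2}(\alpha\beta)_{31}
\end{align*}
```

The intercept and the mealcat effect cancel, which is why the estimate statement lists coefficients only for collcat (1 -.5 -.5) and for collcat*mealcat (1 0 0 -.5 0 0 -.5 0 0).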

Another, possibly easier, way of accomplishing the same thing is to use a cell means model. A cell means model estimates exactly one parameter for each cell and sets the intercept to 0. In general, the cell means model is not used to produce an overall test of model fit, but it is often used to write simpler estimate or contrast statements. So in practice, we need to write the proc glm code twice: once for the model fit and a second time for the estimates or contrasts. In the code shown below, the first proc glm is for model fit, and the second one, with the estimate statement, is used to estimate the simple comparison. We use the noint option on the model statement in the second proc glm to specify that the intercept is not estimated; the model therefore estimates one parameter per cell.

proc glm data = in.elemapi2;
  class collcat mealcat;
  model api00 = collcat mealcat collcat*mealcat / ss3;
run;
quit;

proc glm data = in.elemapi2;
  class collcat mealcat;
  model api00 = collcat*mealcat / noint ss3;
  estimate 'collcat 1 vs 2+ within mealcat = 1'
           collcat*mealcat 2 0 0 -1 0 0 -1 0 0 / divisor=2;
quit;
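Because each parameter of the cell means model is simply a cell mean, the estimate can be cross-checked by hand. The step below is a minimal sketch, assuming the same in libref as the code above. It tabulates the nine cell means, so that the mean for collcat = 1 minus the average of the means for collcat = 2 and collcat = 3, all at mealcat = 1, can be compared with the value reported by the estimate statement.

```sas
/* Sketch: mean of api00 in each collcat-by-mealcat cell.            */
/* Assumes the libref "in" points to the folder holding elemapi2.    */
proc means data = in.elemapi2 n mean;
  class collcat mealcat;
  var api00;
  ways 2;   /* report only the two-way (cell) combinations */
run;
```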

Notice that the order of the categorical variables in the class statement decides which variable is the row variable and which is the column variable. For example, in the code above, collcat will be the row variable and mealcat will be the column variable. Therefore, the simple comparison we are interested in can be laid out as the following table.

| collcat / mealcat | mealcat = 1 | mealcat = 2 | mealcat = 3 |
|---|---|---|---|
| collcat = 1 | 1 | 0 | 0 |
| collcat = 2 | -.5 | 0 | 0 |
| collcat = 3 | -.5 | 0 | 0 |

Writing the numbers in the table one row at a time, we can write our estimate statement as

estimate 'simple comparison' collcat*mealcat 1 0 0 -.5 0 0 -.5 0 0;

or, equivalently, we can use the divisor= option to rewrite the statement in terms of whole numbers, as shown above.

If we switch the order of the variables in the class statement, we have to rewrite our estimate statement accordingly. For example, listing mealcat first makes mealcat the row variable, so the corresponding table is simply transposed. The proc glm step below the table uses the transposed coefficients and produces exactly the same result from the estimate statement.

| mealcat / collcat | collcat = 1 | collcat = 2 | collcat = 3 |
|---|---|---|---|
| mealcat = 1 | 1 | -.5 | -.5 |
| mealcat = 2 | 0 | 0 | 0 |
| mealcat = 3 | 0 | 0 | 0 |

proc glm data = in.elemapi2;
  class mealcat collcat;
  model api00 = mealcat*collcat / noint ss3;
  estimate 'collcat 1 vs 2+ within mealcat = 1'
           collcat*mealcat 2 -1 -1 0 0 0 0 0 0 / divisor=2 e;
quit;
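The row-by-row rule can be summarized compactly. In the notation below, which is introduced here only for illustration, $C$ is the coefficient table with collcat as the row variable, $C^{\top}$ is the same table with mealcat as the row variable, and each coefficient list is read off one row at a time.

```latex
\begin{align*}
C &= \begin{pmatrix} 1 & 0 & 0 \\ -\tfrac12 & 0 & 0 \\ -\tfrac12 & 0 & 0 \end{pmatrix}
  &&\Rightarrow\quad (1,\ 0,\ 0,\ -\tfrac12,\ 0,\ 0,\ -\tfrac12,\ 0,\ 0) \\[4pt]
C^{\top} &= \begin{pmatrix} 1 & -\tfrac12 & -\tfrac12 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
  &&\Rightarrow\quad (1,\ -\tfrac12,\ -\tfrac12,\ 0,\ 0,\ 0,\ 0,\ 0,\ 0)
\end{align*}
```

Multiplying the second list by 2 gives the coefficients 2 -1 -1 0 0 0 0 0 0 that are used together with divisor=2.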

In the following examples, we will use a different data set and the contrast statement. This made-up data set has males and females rating three different flavors of ice cream.

data icecream;
  input id flavor female rating;
cards;
1 1 0 8
2 1 1 3
3 1 0 7
4 1 1 2
5 1 1 3
6 1 0 7
7 1 1 2
8 1 1 9
9 1 0 2
10 1 0 2
11 2 1 9
12 2 1 8
13 2 0 3
14 2 1 9
15 2 0 4
16 2 0 3
17 2 0 3
18 2 1 9
19 2 1 7
20 2 0 9
21 3 0 2
22 3 1 8
23 3 0 2
24 3 1 8
25 3 0 3
26 3 1 9
27 3 1 8
28 3 1 8
29 3 0 1
30 3 1 9
;
run;

We will first run proc glm with the full model, as we normally would, to see whether gender, flavor, or their interaction predicts rating.

proc glm data = icecream;
  class flavor female;
  model rating = flavor female flavor*female;
run;
quit;

The GLM Procedure

Class Level Information

Class      Levels    Values
flavor          3    1 2 3
female          2    0 1

Number of observations    30

The GLM Procedure

Dependent Variable: rating

                                    Sum of
Source              DF             Squares     Mean Square    F Value    Pr > F
Model                5         160.0333333      32.0066667       7.43    0.0002
Error               24         103.3333333       4.3055556
Corrected Total     29         263.3666667

R-Square     Coeff Var      Root MSE    rating Mean
0.607645      37.27515      2.074983       5.566667

Source              DF           Type I SS     Mean Square    F Value    Pr > F
flavor               2         18.86666667      9.43333333       2.19    0.1337
female               1         63.63378378     63.63378378      14.78    0.0008
flavor*female        2         77.53288288     38.76644144       9.00    0.0012

Source              DF         Type III SS     Mean Square    F Value    Pr > F
flavor               2         18.57072072      9.28536036       2.16    0.1376
female               1         65.59269406     65.59269406      15.23    0.0007
flavor*female        2         77.53288288     38.76644144       9.00    0.0012
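The Type I and Type III sums of squares for flavor and female differ slightly because the cells do not all contain the same number of observations. A quick way to see this is to tabulate the cell counts and means. The step below is a sketch of one way to do so; the values in the comments were worked out by hand from the data listed above rather than copied from SAS output.

```sas
/* Sketch: cell sizes and cell means for the icecream data. */
proc means data = icecream n mean;
  class flavor female;
  var rating;
  ways 2;   /* report only the flavor-by-female cells */
run;

/* Cell means computed by hand from the data step:        */
/*   flavor 1: males 5.2 (n = 5), females 3.8   (n = 5)   */
/*   flavor 2: males 4.4 (n = 5), females 8.4   (n = 5)   */
/*   flavor 3: males 2.0 (n = 4), females 8.333 (n = 6)   */
```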

From the output above, we can see that we have a statistically significant effect for the variable female (F = 15.23, p = 0.0007) and for the interaction of flavor by female (F = 9.00, p = 0.0012).

Now let’s use the cell means model to contrast males and females, collapsing across the three flavors of ice cream. As before, we use the noint option on the model statement so that the model estimates one mean per cell. The e option on the contrast statement gives us the contrast coefficients that SAS used, which is helpful in confirming that we did what we wanted to do. Be sure to note how the model statement in this proc glm differs from the model statement in the previous proc glm. When running a cell means model, only the interaction is included on the model statement. This is why only the output from the contrast statement is of interest: the main effects that make up the interaction are not in this model.

proc glm data = icecream;
  class flavor female;
  model rating = flavor*female / noint;
  contrast 'contrasting males and females'
           flavor*female 1 -1 1 -1 1 -1 / e;
run;
quit;

The GLM Procedure

Coefficients for Contrast contrasting males and females

                       Row 1
flavor*female 1 0          1
flavor*female 1 1         -1
flavor*female 2 0          1
flavor*female 2 1         -1
flavor*female 3 0          1
flavor*female 3 1         -1

The GLM Procedure

Dependent Variable: rating

                                      Sum of
Source                DF             Squares     Mean Square    F Value    Pr > F
Model                  6         1089.666667      181.611111      42.18    <.0001
Error                 24          103.333333        4.305556
Uncorrected Total     30         1193.000000

R-Square     Coeff Var      Root MSE    rating Mean
0.607645      37.27515      2.074983       5.566667

Source                DF           Type I SS     Mean Square    F Value    Pr > F
flavor*female          6         1089.666667      181.611111      42.18    <.0001

Source                DF         Type III SS     Mean Square    F Value    Pr > F
flavor*female          6         1089.666667      181.611111      42.18    <.0001

Contrast                          DF     Contrast SS     Mean Square    F Value    Pr > F
contrasting males and females      1     65.59269406     65.59269406      15.23    0.0007

The output above indicates that the contrast was statistically significant (F = 15.23, p = 0.0007).
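This contrast sum of squares can be reproduced by hand from the cell means. The sketch below uses notation introduced here only for illustration: $c_{ij}$ are the contrast coefficients, $\bar{y}_{ij}$ the cell means, and $n_{ij}$ the cell sizes. The cell means (5.2, 3.8, 4.4, 8.4, 2.0, 8.333) and cell sizes (5, 5, 5, 5, 4, 6) were computed directly from the data, and the mean square error is taken from the output.

```latex
\begin{align*}
\hat{L} &= \sum_{i,j} c_{ij}\,\bar{y}_{ij}
         = 5.2 - 3.8 + 4.4 - 8.4 + 2.0 - 8.333 = -8.933 \\
SS_{\text{contrast}} &= \frac{\hat{L}^{2}}{\sum_{i,j} c_{ij}^{2}/n_{ij}}
         = \frac{(-8.933)^{2}}{\tfrac{4}{5} + \tfrac{1}{4} + \tfrac{1}{6}}
         = \frac{79.80}{1.2167} \approx 65.59 \\
F &= \frac{SS_{\text{contrast}}}{MSE} = \frac{65.59}{4.3056} \approx 15.23
\end{align*}
```

The result agrees with the Type III test for female in the full model (F = 15.23, p = 0.0007), since both compare the unweighted averages of the male and female cell means.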

Now let’s contrast flavors 1 and 2, collapsing over gender.

proc glm data = icecream;
  class flavor female;
  model rating = flavor*female / noint;
  contrast 'contrasting flavors 1 and 2'
           flavor*female -1 -1 1 1 0 0 / e;
run;
quit;

The GLM Procedure

Coefficients for Contrast contrasting flavors 1 and 2

                       Row 1
flavor*female 1 0         -1
flavor*female 1 1         -1
flavor*female 2 0          1
flavor*female 2 1          1
flavor*female 3 0          0
flavor*female 3 1          0

The GLM Procedure

Dependent Variable: rating

                                      Sum of
Source                DF             Squares     Mean Square    F Value    Pr > F
Model                  6         1089.666667      181.611111      42.18    <.0001
Error                 24          103.333333        4.305556
Uncorrected Total     30         1193.000000

R-Square     Coeff Var      Root MSE    rating Mean
0.607645      37.27515      2.074983       5.566667

Source                DF           Type I SS     Mean Square    F Value    Pr > F
flavor*female          6         1089.666667      181.611111      42.18    <.0001

Source                DF         Type III SS     Mean Square    F Value    Pr > F
flavor*female          6         1089.666667      181.611111      42.18    <.0001

Contrast                        DF     Contrast SS     Mean Square    F Value    Pr > F
contrasting flavors 1 and 2      1     18.05000000     18.05000000       4.19    0.0517

This contrast falls just short of statistical significance at the 0.05 level (F = 4.19, p = 0.0517).
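"Collapsing over gender" here means that the male and female cell means within each flavor are averaged with equal weight. As a sketch, using cell means computed by hand from the data (flavor 1: 5.2 and 3.8; flavor 2: 4.4 and 8.4; five observations in each of these cells) and notation introduced only for illustration:

```latex
\begin{align*}
\hat{L} &= -5.2 - 3.8 + 4.4 + 8.4 = 3.8
  \quad\text{(twice the difference in flavor means, } 6.4 - 4.5 = 1.9\text{)} \\
SS_{\text{contrast}} &= \frac{\hat{L}^{2}}{\sum_{i,j} c_{ij}^{2}/n_{ij}}
  = \frac{3.8^{2}}{4 \times \tfrac{1}{5}} = \frac{14.44}{0.8} = 18.05
\end{align*}
```

Rescaling the coefficients (for example, using -.5 -.5 .5 .5 0 0) changes the estimated difference but not the contrast sum of squares or the F test.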

Finally, we will contrast flavor 2 for males and females.

proc glm data = icecream;
  class flavor female;
  model rating = flavor*female / noint;
  contrast 'contrasting flavor 2 for males and females'
           flavor*female 0 0 1 -1 0 0 / e;
run;
quit;

The GLM Procedure

Coefficients for Contrast contrasting flavor 2 for males and females

                       Row 1
flavor*female 1 0          0
flavor*female 1 1          0
flavor*female 2 0          1
flavor*female 2 1         -1
flavor*female 3 0          0
flavor*female 3 1          0

The GLM Procedure

Dependent Variable: rating

                                      Sum of
Source                DF             Squares     Mean Square    F Value    Pr > F
Model                  6         1089.666667      181.611111      42.18    <.0001
Error                 24          103.333333        4.305556
Uncorrected Total     30         1193.000000

R-Square     Coeff Var      Root MSE    rating Mean
0.607645      37.27515      2.074983       5.566667

Source                DF           Type I SS     Mean Square    F Value    Pr > F
flavor*female          6         1089.666667      181.611111      42.18    <.0001

Source                DF         Type III SS     Mean Square    F Value    Pr > F
flavor*female          6         1089.666667      181.611111      42.18    <.0001

Contrast                                       DF     Contrast SS     Mean Square    F Value    Pr > F
contrasting flavor 2 for males and females      1     40.00000000     40.00000000       9.29    0.0055

This contrast is statistically significant (F = 9.29, p = 0.0055).
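Since all three contrasts use the same cell means model, they could also be requested in a single proc glm step. The sketch below does this and adds an estimate statement, which is not part of the examples above, to show the size and direction of the last comparison. Based on the cell means computed by hand from the data (4.4 for males and 8.4 for females on flavor 2), the estimate should be -4.

```sas
proc glm data = icecream;
  class flavor female;
  model rating = flavor*female / noint;
  /* Cell order: (flavor, female) = (1,0) (1,1) (2,0) (2,1) (3,0) (3,1) */
  contrast 'contrasting males and females'
           flavor*female 1 -1 1 -1 1 -1;
  contrast 'contrasting flavors 1 and 2'
           flavor*female -1 -1 1 1 0 0;
  contrast 'contrasting flavor 2 for males and females'
           flavor*female 0 0 1 -1 0 0;
  /* Added for illustration: male mean minus female mean for flavor 2 */
  estimate 'flavor 2: males minus females'
           flavor*female 0 0 1 -1 0 0;
run;
quit;
```

The contrast statement reports only an F test, so pairing it with an estimate statement that uses the same coefficients is one way to also see the estimated difference and its standard error.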