How can I define a cell means model using proc glm?

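We will use the data set elemapi2 (https://stats.idre.ucla.edu/wp-content/uploads/2016/02/elemapi2.sas7bdat) to demonstrate. The code on this page that reads in.elemapi2 assumes that a libref named in has already been assigned to the folder holding the data set. This page is based on Chapter 6 of our Regression with SAS Web Book. Variables mealcat and collcat are two categorical variables, both with three levels. The dependent variable is the school’s API index, api00. We want to make a simple comparison: group 1 of collcat versus the average of groups 2 and 3, within mealcat = 1. One way of doing this is to use proc glm with an estimate statement. In this parameterization, the estimate statement needs coefficients for both the collcat main effect and the collcat*mealcat interaction.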

proc glm data = in.elemapi2;
  class collcat mealcat;
  model api00 = collcat mealcat collcat*mealcat/ss3;
  estimate 'collcat 1 vs 2+ within mealcat = 1'
           collcat         1 -.5 -.5
           collcat*mealcat 1   0   0
                         -.5   0   0
                         -.5   0   0;
run;
quit;

Another way of accomplishing the same thing, and possibly an easier one, is to use a cell means model. A cell means model estimates exactly one parameter for each cell and has no intercept. In general, the cell means model is not used to produce an overall test of model fit, because without an intercept the overall F test asks whether all of the cell means are zero rather than whether they differ from one another. It is, however, often used to write simpler estimate or contrast statements. So in practice, we write the proc glm code twice: once for the model fit and once for the estimates or contrasts. In the code shown below, the first proc glm is for model fit, and the second one, with the estimate statement, estimates the simple comparison. We use the noint option on the model statement in the second proc glm to suppress the intercept; with only the interaction in the model, proc glm then estimates one parameter per cell.

proc glm data = in.elemapi2;
  class collcat mealcat;
  model api00 = collcat mealcat collcat*mealcat/ss3;
run;
quit;

proc glm data = in.elemapi2;
  class collcat mealcat;
  model api00 = collcat*mealcat/noint ss3;
  estimate 'collcat 1 vs 2+ within mealcat = 1'
              collcat*mealcat 2 0 0 -1 0 0 -1 0 0 /divisor=2;
run;
quit;
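
As a quick check on what the cell means model estimates, the sketch below prints the observed mean of api00 in each cell with proc means and then asks proc glm for the parameter estimates with the solution option. Assuming all nine cells contain observations, each estimate from the noint model should equal the corresponding cell mean. The libref in is the same assumption as in the code above.

```sas
/* Observed n and mean of api00 in each collcat-by-mealcat cell */
proc means data = in.elemapi2 nway n mean;
  class collcat mealcat;
  var api00;
run;

/* Cell means model: solution prints one parameter estimate per cell */
proc glm data = in.elemapi2;
  class collcat mealcat;
  model api00 = collcat*mealcat / noint solution;
run;
quit;
```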

Notice that the order of the categorical variables in the class statement determines which variable is the row variable and which is the column variable. For example, in the code above, collcat is the row variable and mealcat is the column variable. In other words, the levels of the variable listed last in the class statement change fastest in the coefficient list. Therefore, the simple comparison we are interested in can be laid out as the following table.

collcat / mealcat    mealcat = 1    mealcat = 2    mealcat = 3
collcat = 1                    1              0              0
collcat = 2                  -.5              0              0
collcat = 3                  -.5              0              0

Writing the numbers in the table one row at a time, we can write our estimate statement as

  estimate 'simple comparison'
              collcat*mealcat 1 0 0 -.5 0 0 -.5 0 0 ;

or equivalently, we can use the divisor = option to rewrite the statement in terms of whole numbers, as shown above.
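
If we are unsure whether the coefficients line up with the cells we intended, one option is to add e to the estimate statement so that proc glm lists each coefficient next to its collcat*mealcat level combination. This is only a sketch of that check; it reuses the model from above and again assumes the libref in.

```sas
proc glm data = in.elemapi2;
  class collcat mealcat;
  model api00 = collcat*mealcat / noint ss3;
  /* e prints the coefficient applied to each collcat*mealcat cell */
  estimate 'collcat 1 vs 2+ within mealcat = 1'
           collcat*mealcat 2 0 0 -1 0 0 -1 0 0 / divisor = 2 e;
run;
quit;
```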

If we switch the order of the variables in the class statement, we have to rewrite our estimate statement accordingly. For example, we can rewrite the proc glm step above as follows. It produces exactly the same result from the estimate statement, because the corresponding table is simply transposed.

mealcat / collcat    collcat = 1    collcat = 2    collcat = 3
mealcat = 1                    1            -.5            -.5
mealcat = 2                    0              0              0
mealcat = 3                    0              0              0

proc glm data = in.elemapi2;
  class  mealcat collcat;
  model api00 = mealcat*collcat/noint ss3;
  estimate 'collcat 1 vs 2+ within mealcat = 1'
              mealcat*collcat 2 -1 -1  0 0 0 0 0 0 /divisor=2 e;
run;
quit;

In the following examples, we will use a different data set and we will use the contrast statement. This made-up data set has males and females rating three different flavors of ice cream.

data icecream;
input id flavor female rating;
cards;
1 1 0 8
2 1 1 3
3 1 0 7
4 1 1 2
5 1 1 3
6 1 0 7
7 1 1 2
8 1 1 9
9 1 0 2
10 1 0 2
11 2 1 9
12 2 1 8
13 2 0 3
14 2 1 9
15 2 0 4
16 2 0 3
17 2 0 3
18 2 1 9
19 2 1 7
20 2 0 9
21 3 0 2
22 3 1 8
23 3 0 2
24 3 1 8
25 3 0 3
26 3 1 9
27 3 1 8
28 3 1 8
29 3 0 1
30 3 1 9
;
run;
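
Before fitting any model, it can help to look at the six cells directly. The short step below is a sketch that prints the number of ratings and the mean rating for each flavor-by-female cell. Counting rows in the data step above, flavor 3 has 4 males and 6 females while every other cell has 5 ratings, so the design is slightly unbalanced.

```sas
/* n and mean of rating in each flavor-by-female cell */
proc means data = icecream nway n mean;
  class flavor female;
  var rating;
run;
```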

We will first run proc glm with the full model, as we normally would, to see whether gender, flavor or their interaction predicts rating.

proc glm data = icecream;
class flavor female;
model rating = flavor female flavor*female;
run;
quit;

The GLM Procedure

   Class Level Information

Class         Levels    Values

flavor             3    1 2 3

female             2    0 1

Number of observations    30

The GLM Procedure

Dependent Variable: rating

                                        Sum of
Source                      DF         Squares     Mean Square    F Value    Pr > F

Model                        5     160.0333333      32.0066667       7.43    0.0002

Error                       24     103.3333333       4.3055556

Corrected Total             29     263.3666667

R-Square     Coeff Var      Root MSE    rating Mean

0.607645      37.27515      2.074983       5.566667

Source                      DF       Type I SS     Mean Square    F Value    Pr > F

flavor                       2     18.86666667      9.43333333       2.19    0.1337
female                       1     63.63378378     63.63378378      14.78    0.0008
flavor*female                2     77.53288288     38.76644144       9.00    0.0012

Source                      DF     Type III SS     Mean Square    F Value    Pr > F

flavor                       2     18.57072072      9.28536036       2.16    0.1376
female                       1     65.59269406     65.59269406      15.23    0.0007
flavor*female                2     77.53288288     38.76644144       9.00    0.0012

From the output above, we can see that we have statistically significant effects for the variable female (F = 15.23, p = 0.0007) and for the interaction of flavor by female (F = 9.00, p = 0.0012). Now let’s use the cell means model to contrast males and females, collapsing across the three flavors of ice cream. As before, we use the noint option on the model statement so that proc glm estimates one mean for each cell. The e option on the contrast statement prints the contrast coefficients that SAS used, which is helpful in confirming that we did what we wanted to do. Be sure to note how the model statement in this proc glm differs from the one in the previous proc glm: when running a cell means model, only the interaction is included on the model statement. This is why only the output from the contrast statement is of interest: the main effects that make up the interaction are not in this model.

proc glm data = icecream;
class flavor female;
model rating = flavor*female / noint;
contrast 'contrasting males and females' flavor*female 1 -1 
                                                       1 -1
                                                       1 -1 / e;
run;
quit;

The GLM Procedure

Coefficients for Contrast contrasting males and females

                            Row 1

flavor*female 1 0               1
flavor*female 1 1              -1
flavor*female 2 0               1
flavor*female 2 1              -1
flavor*female 3 0               1
flavor*female 3 1              -1

The GLM Procedure

Dependent Variable: rating

                                        Sum of
Source                      DF         Squares     Mean Square    F Value    Pr > F

Model                        6     1089.666667      181.611111      42.18    <.0001

Error                       24      103.333333        4.305556

Uncorrected Total           30     1193.000000

R-Square     Coeff Var      Root MSE    rating Mean

0.607645      37.27515      2.074983       5.566667

Source                      DF       Type I SS     Mean Square    F Value    Pr > F

flavor*female                6     1089.666667      181.611111      42.18    <.0001


Source                      DF     Type III SS     Mean Square    F Value    Pr > F

flavor*female                6     1089.666667      181.611111      42.18    <.0001

Contrast                             DF     Contrast SS     Mean Square    F Value    Pr > F

contrasting males and females         1     65.59269406     65.59269406      15.23    0.0007

The output above indicates that the contrast was statistically significant (F = 15.23, p = 0.0007).
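
The contrast statement reports only a test. If we also want the size of the difference, a possible companion is an estimate statement with the same coefficients, as sketched below. The label is ours, and divisor = 3 is our choice so that the estimate is the male mean minus the female mean, averaged over the three flavors. Because this is a single degree of freedom comparison, the square of the t value from estimate should equal the F value from contrast.

```sas
proc glm data = icecream;
class flavor female;
model rating = flavor*female / noint;
/* same coefficients as the contrast, scaled to an average difference */
estimate 'males minus females, averaged over flavors'
         flavor*female 1 -1 1 -1 1 -1 / divisor = 3;
run;
quit;
```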

Now let’s contrast flavors 1 and 2, collapsing over gender.

proc glm data = icecream;
class flavor female;
model rating = flavor*female / noint;
contrast 'contrasting flavors 1 and 2' flavor*female -1 -1  
                                                      1  1
                                                      0  0 / e ;
run;
quit;

The GLM Procedure

Coefficients for Contrast contrasting flavors 1 and 2

                            Row 1

flavor*female 1 0              -1
flavor*female 1 1              -1
flavor*female 2 0               1
flavor*female 2 1               1
flavor*female 3 0               0
flavor*female 3 1               0

The GLM Procedure

Dependent Variable: rating

                                        Sum of
Source                      DF         Squares     Mean Square    F Value    Pr > F

Model                        6     1089.666667      181.611111      42.18    <.0001

Error                       24      103.333333        4.305556

Uncorrected Total           30     1193.000000

R-Square     Coeff Var      Root MSE    rating Mean

0.607645      37.27515      2.074983       5.566667

Source                      DF       Type I SS     Mean Square    F Value    Pr > F

flavor*female                6     1089.666667      181.611111      42.18    <.0001

Source                      DF     Type III SS     Mean Square    F Value    Pr > F

flavor*female                6     1089.666667      181.611111      42.18    <.0001

Contrast                           DF     Contrast SS     Mean Square    F Value    Pr > F

contrasting flavors 1 and 2         1     18.05000000     18.05000000       4.19    0.0517

This contrast falls just short of statistical significance at the 0.05 level (F = 4.19, p = 0.0517).
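
To see where the contrast sum of squares comes from, we can reproduce it by hand. For a single contrast on cell means, the sum of squares is the squared contrast value divided by the sum of each squared coefficient over its cell size. The data step below is a sketch of that arithmetic; the four cell means are ones we worked out by hand from the data step above (5 ratings per cell), and the error mean square is copied from the output. It should reproduce, up to rounding, the contrast sum of squares of 18.05 and the F value of 4.19.

```sas
data _null_;
  /* cell means computed by hand from the icecream data, n = 5 in each cell */
  m10 = 5.2;   /* flavor 1, female = 0 */
  m11 = 3.8;   /* flavor 1, female = 1 */
  m20 = 4.4;   /* flavor 2, female = 0 */
  m21 = 8.4;   /* flavor 2, female = 1 */
  L = -m10 - m11 + m20 + m21;   /* value of the contrast */
  denom = 4 * (1/5);            /* four coefficients of +/-1, each squared, over n = 5 */
  ss = L**2 / denom;            /* contrast sum of squares */
  f = ss / 4.3055556;           /* divide by the error mean square from the output */
  put L= ss= f=;
run;
```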

Finally, we will contrast males and females within flavor 2.

proc glm data = icecream;
class flavor female;
model rating = flavor*female / noint;
contrast 'contrasting flavor 2 for males and females' flavor*female  0  0  
                                                                      1 -1
                                                                      0  0 / e ;
run;
quit;

The GLM Procedure

Coefficients for Contrast contrasting flavor 2 for males and females

                            Row 1

flavor*female 1 0               0
flavor*female 1 1               0
flavor*female 2 0               1
flavor*female 2 1              -1
flavor*female 3 0               0
flavor*female 3 1               0

The GLM Procedure

Dependent Variable: rating

                                        Sum of
Source                      DF         Squares     Mean Square    F Value    Pr > F

Model                        6     1089.666667      181.611111      42.18    <.0001

Error                       24      103.333333        4.305556

Uncorrected Total           30     1193.000000

R-Square     Coeff Var      Root MSE    rating Mean

0.607645      37.27515      2.074983       5.566667

Source                      DF       Type I SS     Mean Square    F Value    Pr > F

flavor*female                6     1089.666667      181.611111      42.18    <.0001

Source                      DF     Type III SS     Mean Square    F Value    Pr > F

flavor*female                6     1089.666667      181.611111      42.18    <.0001

Contrast                                         DF   Contrast SS   Mean Square  F Value  Pr > F

contrasting flavor 2 for males and females       1   40.00000000   40.00000000     9.29  0.0055

This contrast is statistically significant (F = 9.29, p = 0.0055).
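
The three contrasts above were run in separate proc glm steps, which repeats the same model output each time. Since proc glm accepts more than one contrast statement, they could also be requested in a single step, as in the sketch below. The shortened labels are ours. Note that the p-values from contrast statements are not adjusted for multiple comparisons.

```sas
proc glm data = icecream;
class flavor female;
model rating = flavor*female / noint;
/* coefficient order: (flavor 1: male, female) (flavor 2: male, female) (flavor 3: male, female) */
contrast 'males vs females'           flavor*female  1 -1   1 -1   1 -1;
contrast 'flavor 1 vs flavor 2'       flavor*female -1 -1   1  1   0  0;
contrast 'males vs females, flavor 2' flavor*female  0  0   1 -1   0  0;
run;
quit;
```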