How can I do estimation using a cell-means model?

We will use the elemapi2 data set (https://stats.idre.ucla.edu/wp-content/uploads/2016/02/elemapi2.sas7bdat) to demonstrate. The variables mealcat and collcat are two categorical variables, both with three levels. The dependent variable is the school's API index, api00. First, we want to compare the average of levels 1 and 2 of collcat with level 3 of collcat when mealcat is at level 1. Additionally, we want to see whether the difference between collcat levels 1 and 3 at mealcat equal to 1 differs from the same difference at mealcat equal to 3. This second comparison is a bit trickier, and we will discuss the values for its contrast a bit later on this page.

We can do this in proc glm using the estimate statement. Please note the e option in the estimate statements that displays the contrast we have defined.

proc glm data = 'D:/data/elemapi2.sas7bdat';
  class collcat mealcat;
  model api00 = collcat mealcat collcat*mealcat/ ss3;
  estimate 'collcat 1&2 vs 3 at mealcat = 1'
                     collcat -.5 -.5  1
             collcat*mealcat -.5   0  0
                             -.5   0  0
                               1   0  0 / e;
  estimate 'collcat 1v3 at mealcat=1 vs 1v3 at mealcat=3'
                      collcat 0 0  0
              collcat*mealcat 1 0 -1 
                              0 0  0 
                             -1 0  1 / e;
run;
quit;

Here is the output for the above command.

                                        The GLM Procedure

                                    Class Level Information
                                 Class         Levels    Values
                                 collcat            3    1 2 3
                                 mealcat            3    1 2 3


                             Number of Observations Read         400
                             Number of Observations Used         400

                    Coefficients for Estimate collcat 1&2 vs 3 at mealcat = 1

                                                             Row 1

                               Intercept                         0

                               collcat         1              -0.5
                               collcat         2              -0.5
                               collcat         3                 1

                               mealcat         1                 0
                               mealcat         2                 0
                               mealcat         3                 0

                               collcat*mealcat 1 1            -0.5
                               collcat*mealcat 1 2               0
                               collcat*mealcat 1 3               0
                               collcat*mealcat 2 1            -0.5
                               collcat*mealcat 2 2               0
                               collcat*mealcat 2 3               0
                               collcat*mealcat 3 1               1
                               collcat*mealcat 3 2               0
                               collcat*mealcat 3 3               0

 Coefficients for Estimate collcat 1v3 at mealcat=1 vs 1v3 at mealcat=3

                                                             Row 1

                               Intercept                         0

                               collcat         1                 0
                               collcat         2                 0
                               collcat         3                 0

                               mealcat         1                 0
                               mealcat         2                 0
                               mealcat         3                 0

                               collcat*mealcat 1 1               1
                               collcat*mealcat 1 2               0
                               collcat*mealcat 1 3              -1
                               collcat*mealcat 2 1               0
                               collcat*mealcat 2 2               0
                               collcat*mealcat 2 3               0
                               collcat*mealcat 3 1              -1
                               collcat*mealcat 3 2               0
                               collcat*mealcat 3 3               1

Dependent Variable: api00   api 2000

                                               Sum of
       Source                      DF         Squares     Mean Square    F Value    Pr > F
       Model                        8     6243714.810      780464.351     166.76    <.0001
       Error                      391     1829957.187        4680.197
       Corrected Total            399     8073671.998


                       R-Square     Coeff Var      Root MSE    api00 Mean
                       0.773343      10.56356      68.41197      647.6225


       Source                      DF     Type III SS     Mean Square    F Value    Pr > F

       collcat                      2       42140.566       21070.283       4.50    0.0117
       mealcat                      2     4764843.563     2382421.781     509.04    <.0001
       collcat*mealcat              4      124167.809       31041.952       6.63    <.0001


                                                            Standard
     Parameter                                          Estimate         Error     t Value   Pr > |t|
     collcat 1&2 vs 3 at mealcat = 1                 -39.1317809     12.2043453     -3.21     0.0015
     collcat 1v3 at mealcat=1 vs 1v3 at mealcat=3     82.5777567     24.4394069      3.38     0.0008

The t-value for the first contrast was -3.21 with a p-value of .0015, while for the second contrast the t-value was 3.38 with a p-value of .0008. This all worked very nicely but seems a bit complicated. It might be easier if we adopt the cell-means approach. Here is the code and the output for a cell-means model without the estimate statements so that we can see how we would construct the contrasts of interest.

Please note the noint option on the model statement. This is needed so that we get all nine of the cell means. We wish to emphasize that the cell-means model is only used to obtain the cell means and should not be used to assess model fit or statistical significance. The F-ratio shown below of 4,131.11 is misleading: it jointly tests that all nine cell means equal zero, a hypothesis that is neither informative nor interesting.

proc glm data = 'D:/data/elemapi2.sas7bdat';
  class collcat mealcat;
  model api00 = collcat*mealcat/noint ss3 solution;
run;
quit;

                                        The GLM Procedure

                                    Class Level Information

                                 Class         Levels    Values
                                 collcat            3    1 2 3
                                 mealcat            3    1 2 3


                             Number of Observations Read         400
                             Number of Observations Used         400

Dependent Variable: api00   api 2000

                                               Sum of
       Source                      DF         Squares     Mean Square    F Value    Pr > F
       Model                        9     174009675.8      19334408.4    4131.11    <.0001
       Error                      391       1829957.2          4680.2
       Uncorrected Total          400     175839633.0


                       R-Square     Coeff Var      Root MSE    api00 Mean
                       0.773343      10.56356      68.41197      647.6225


       Source                      DF     Type III SS     Mean Square    F Value    Pr > F
       collcat*mealcat              9     174009675.8      19334408.4    4131.11    <.0001


                                                       Standard
           Parameter                   Estimate           Error    t Value    Pr > |t|
           collcat*mealcat 1 1      816.9142857     11.56373322      70.64      <.0001
           collcat*mealcat 1 2      589.3500000     15.29738116      38.53      <.0001
           collcat*mealcat 1 3      493.9189189      7.95272978      62.11      <.0001
           collcat*mealcat 2 1      825.6511628     10.43272736      79.14      <.0001
           collcat*mealcat 2 2      636.6046512     10.43272736      61.02      <.0001
           collcat*mealcat 2 3      508.8333333      9.87441708      51.53      <.0001
           collcat*mealcat 3 1      782.1509434      9.39710655      83.23      <.0001
           collcat*mealcat 3 2      655.6376812      8.23583317      79.61      <.0001
           collcat*mealcat 3 3      541.7333333     17.66389427      30.67      <.0001

You can see the nine cells going from collcat*mealcat 1 1 to collcat*mealcat 3 3. Here is the same information put into a 3x3 table.

collcat / mealcat    mealcat = 1    mealcat = 2    mealcat = 3
collcat = 1          816.9142857    589.3500000    493.9189189
collcat = 2          825.6511628    636.6046512    508.8333333
collcat = 3          782.1509434    655.6376812    541.7333333
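
Because the cell-means model has one parameter per cell, its estimates should equal the ordinary sample mean of api00 in each cell. A minimal sketch of a cross-check with proc means follows. The data set path is our assumption and is not part of the original example, so point it at your own copy of elemapi2.

```sas
/* Cross-check: raw cell means of api00 for each collcat-by-mealcat cell. */
/* The data set path below is an assumption; adjust it to your own copy.  */
proc means data = 'D:/data/elemapi2.sas7bdat' n mean;
  class collcat mealcat;
  var api00;
run;
```

The means in this listing should agree with the nine parameter estimates from the noint model.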

Here are the values of the contrast needed for the two estimate statements in the same tabular form as the table above.

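First contrast (collcat 1&2 vs 3 at mealcat = 1):

collcat / mealcat    mealcat = 1    mealcat = 2    mealcat = 3
collcat = 1          -.5            0              0
collcat = 2          -.5            0              0
collcat = 3          1              0              0

Second contrast (collcat 1v3 at mealcat=1 vs 1v3 at mealcat=3):

collcat / mealcat    mealcat = 1    mealcat = 2    mealcat = 3
collcat = 1          1              0              -1
collcat = 2          0              0              0
collcat = 3          -1             0              1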

The values for the second contrast were derived from the following, where the two digits after collcat*mealcat give the collcat level and then the mealcat level,

(collcat*mealcat11-collcat*mealcat31)-(collcat*mealcat13-collcat*mealcat33).

Which can be simplified to

collcat*mealcat11-collcat*mealcat31-collcat*mealcat13+collcat*mealcat33.

The coefficients for these terms look like this, 1 -1 -1 1. When you include all of the cell means that have zero coefficients, ordered from cell 1 1 to cell 3 3, the values look like this, 1 0 -1 0 0 0 -1 0 1.
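
Both contrasts are just weighted sums of cell means, so they can be checked by hand. The data step below is a minimal sketch that plugs in the nine cell means reported in the output above. The variable names m11 through m33 are our own illustrative names (collcat level first, mealcat level second) and do not come from the data set.

```sas
/* Hand check of both contrasts from the nine cell means reported above. */
/* Names m11 ... m33 are illustrative: m<collcat level><mealcat level>.  */
data _null_;
  m11 = 816.9142857; m12 = 589.3500000; m13 = 493.9189189;
  m21 = 825.6511628; m22 = 636.6046512; m23 = 508.8333333;
  m31 = 782.1509434; m32 = 655.6376812; m33 = 541.7333333;

  /* collcat 3 minus the average of collcat 1 and 2, at mealcat = 1 */
  est1 = m31 - (m11 + m21) / 2;

  /* (collcat 1 vs 3 at mealcat = 1) minus (collcat 1 vs 3 at mealcat = 3) */
  est2 = (m11 - m31) - (m13 - m33);

  /* write both values to the log */
  put est1= est2=;
run;
```

Up to rounding, the two values written to the log should match the estimates of about -39.13 and 82.58 reported by the estimate statements in the first proc glm run.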

Now it should be easy to code the estimate statement. Below is the SAS code and the nonduplicate output from the above model. Note that we make use of the divisor = 2 option so that we don't have to write decimal fractions.

proc glm data = 'D:/data/elemapi2.sas7bdat';
  class collcat mealcat;
  model api00 = collcat*mealcat/noint ss3 solution;
  estimate 'collcat 1&2 vs 3 at mealcat = 1'
              collcat*mealcat -1 0 0 -1 0 0 2 0 0 /divisor=2 e;
  estimate 'collcat 1v3 at mealcat=1 vs 1v3 at mealcat=3'
              collcat*mealcat 1 0 -1 0 0 0 -1 0 1 / e;
run;
quit;

                                        The GLM Procedure

[ duplicate output omitted ]

                    Coefficients for Estimate collcat 1&2 vs 3 at mealcat = 1

                                                             Row 1
                               collcat*mealcat 1 1            -0.5
                               collcat*mealcat 1 2               0
                               collcat*mealcat 1 3               0
                               collcat*mealcat 2 1            -0.5
                               collcat*mealcat 2 2               0
                               collcat*mealcat 2 3               0
                               collcat*mealcat 3 1               1
                               collcat*mealcat 3 2               0
                               collcat*mealcat 3 3               0

Coefficients for Estimate collcat 1v3 at mealcat=1 vs 1v3 at mealcat=3

                                                             Row 1
                               collcat*mealcat 1 1               1
                               collcat*mealcat 1 2               0
                               collcat*mealcat 1 3              -1
                               collcat*mealcat 2 1               0
                               collcat*mealcat 2 2               0
                               collcat*mealcat 2 3               0
                               collcat*mealcat 3 1              -1
                               collcat*mealcat 3 2               0
                               collcat*mealcat 3 3               1

[ duplicate output omitted ]

                                                                    Standard
   Parameter                                          Estimate        Error    t Value     Pr > |t|
   collcat 1&2 vs 3 at mealcat = 1                 -39.1317809     12.2043453     -3.21     0.0015
   collcat 1v3 at mealcat=1 vs 1v3 at mealcat=3     82.5777567     24.4394069      3.38     0.0008

[ duplicate output omitted ]

Pretty slick, huh? Of course, contrasts like these only make sense if the collcat*mealcat interaction is statistically significant. Just to be clear, we are not using the cell-means model to assess the fit of the model nor to test each of the effects. We are using the cell-means model to make the coding of contrasts easier.

The order of the categorical variables in the class statement decides which variable is the row variable and which is the column variable. If we switch the order of the variables in the class statement, we will have to rewrite our estimate statements accordingly. Switching the order of mealcat and collcat in the class statement has the effect of transposing our table of contrast values. For the first contrast, the transposed table looks like this.

mealcat / collcat    collcat = 1    collcat = 2    collcat = 3
mealcat = 1          -.5            -.5            1
mealcat = 2          0              0              0
mealcat = 3          0              0              0

Interestingly enough, the table for the second contrast is symmetric, so transposing it changes nothing and we can use exactly the same coefficients for the second estimate statement as in the previous examples.
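
To see where the reordered coefficients for the first contrast come from, the transposition can be done mechanically. The data step below is only a sketch with names of our own choosing (c, line, i, j). It stores the coefficients in collcat-major order, as used with class collcat mealcat, and reads them back in mealcat-major order, as needed with class mealcat collcat.

```sas
/* Sketch: reorder a 3x3 coefficient table when the class order is switched. */
/* Array c and variable line are illustrative names, not part of the FAQ.    */
data _null_;
  /* rows = collcat, columns = mealcat; values fill row by row */
  array c[3,3] _temporary_ (-1 0 0  -1 0 0  2 0 0);
  length line $40;

  /* walk down each column so that mealcat varies slowest */
  do j = 1 to 3;
    do i = 1 to 3;
      line = catx(' ', line, c[i,j]);
    end;
  end;

  /* write the reordered coefficient list to the log */
  put line;
run;
```

The list written to the log should be -1 -1 2 0 0 0 0 0 0, which is the coefficient list for the first estimate statement once mealcat comes first in the class statement.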

Now we can write our SAS code and view the output of the command.

proc glm data = 'D:/data/elemapi2.sas7bdat';
  class  mealcat collcat;
  model api00 = mealcat*collcat/noint ss3 solution;
  estimate 'collcat 1&2 vs 3 at mealcat = 1'
              collcat*mealcat -1 -1 2  0 0 0 0 0 0 /divisor=2 e;
  estimate 'collcat 1v3 at mealcat=1 vs 1v3 at mealcat=3'
              collcat*mealcat 1 0 -1 0 0 0 -1 0 1 / e;
run;
quit;

                                        The GLM Procedure

                                    Class Level Information
                                 Class         Levels    Values
                                 mealcat            3    1 2 3
                                 collcat            3    1 2 3


                             Number of Observations Read         400
                             Number of Observations Used         400

                    Coefficients for Estimate collcat 1&2 vs 3 at mealcat = 1

                                                             Row 1
                               mealcat*collcat 1 1            -0.5
                               mealcat*collcat 1 2            -0.5
                               mealcat*collcat 1 3               1
                               mealcat*collcat 2 1               0
                               mealcat*collcat 2 2               0
                               mealcat*collcat 2 3               0
                               mealcat*collcat 3 1               0
                               mealcat*collcat 3 2               0
                               mealcat*collcat 3 3               0

 Coefficients for Estimate collcat 1v3 at mealcat=1 vs 1v3 at mealcat=3

                                                             Row 1
                               mealcat*collcat 1 1               1
                               mealcat*collcat 1 2               0
                               mealcat*collcat 1 3              -1
                               mealcat*collcat 2 1               0
                               mealcat*collcat 2 2               0
                               mealcat*collcat 2 3               0
                               mealcat*collcat 3 1              -1
                               mealcat*collcat 3 2               0
                               mealcat*collcat 3 3               1

Dependent Variable: api00   api 2000

                                               Sum of
       Source                      DF         Squares     Mean Square    F Value    Pr > F
       Model                        9     174009675.8      19334408.4    4131.11    <.0001
       Error                      391       1829957.2          4680.2
       Uncorrected Total          400     175839633.0


                       R-Square     Coeff Var      Root MSE    api00 Mean
                       0.773343      10.56356      68.41197      647.6225


       Source                      DF     Type III SS     Mean Square    F Value    Pr > F
       mealcat*collcat              9     174009675.8      19334408.4    4131.11    <.0001


                                                                    Standard
   Parameter                                          Estimate        Error     t Value    Pr > |t|
   collcat 1&2 vs 3 at mealcat = 1                 -39.1317809     12.2043453     -3.21     0.0015
   collcat 1v3 at mealcat=1 vs 1v3 at mealcat=3     82.5777567     24.4394069      3.38     0.0008

                                                       Standard
           Parameter                   Estimate           Error    t Value    Pr > |t|
           mealcat*collcat 1 1      816.9142857     11.56373322      70.64      <.0001
           mealcat*collcat 1 2      825.6511628     10.43272736      79.14      <.0001
           mealcat*collcat 1 3      782.1509434      9.39710655      83.23      <.0001
           mealcat*collcat 2 1      589.3500000     15.29738116      38.53      <.0001
           mealcat*collcat 2 2      636.6046512     10.43272736      61.02      <.0001
           mealcat*collcat 2 3      655.6376812      8.23583317      79.61      <.0001
           mealcat*collcat 3 1      493.9189189      7.95272978      62.11      <.0001
           mealcat*collcat 3 2      508.8333333      9.87441708      51.53      <.0001
           mealcat*collcat 3 3      541.7333333     17.66389427      30.67      <.0001

This exact same approach will work with mixed models. In fact, the estimate statements are exactly the same for proc mixed as they were for proc glm, as our last example demonstrates.

proc mixed data = 'D:/data/elemapi2.sas7bdat';
  class  mealcat collcat;
  model api00 = mealcat*collcat/noint solution ddfm=kr;
  estimate 'collcat 1&2 vs 3 at mealcat = 1'
              collcat*mealcat -1 -1 2  0 0 0 0 0 0 /divisor=2 e;
  estimate 'collcat 1v3 at mealcat=1 vs 1v3 at mealcat=3'
              collcat*mealcat 1 0 -1 0 0 0 -1 0 1 / e;
run;

                                       The Mixed Procedure

                                       Model Information
                     Data Set                     MYLIB.ELEMAPI2
                     Dependent Variable           api00
                     Covariance Structure         Diagonal
                     Estimation Method            REML
                     Residual Variance Method     Profile
                     Fixed Effects SE Method      Model-Based
                     Degrees of Freedom Method    Residual


                                     Class Level Information

                       Class      Levels    Values

                       mealcat         3    1 2 3
                       collcat         3    1 2 3


                                           Dimensions

                               Covariance Parameters             1
                               Columns in X                      9
                               Columns in Z                      0
                               Subjects                          1
                               Max Obs Per Subject             400


                                     Number of Observations

                           Number of Observations Read             400
                           Number of Observations Used             400
                           Number of Observations Not Used           0


                                      Covariance Parameter
                                            Estimates

                                      Cov Parm     Estimate
                                      Residual      4680.20


                                         Fit Statistics

                              -2 Res Log Likelihood          4447.1
                              AIC (smaller is better)        4449.1
                              AICC (smaller is better)       4449.2
                              BIC (smaller is better)        4453.1

                                   Solution for Fixed Effects

                    Percentage
                    free meals
                    in 3                                 Standard
 Effect             categories    collcat    Estimate       Error      DF    t Value    Pr > |t|
 mealcat*collcat    1             1            816.91     11.5637     391      70.64      <.0001
 mealcat*collcat    1             2            825.65     10.4327     391      79.14      <.0001
 mealcat*collcat    1             3            782.15      9.3971     391      83.23      <.0001
 mealcat*collcat    2             1            589.35     15.2974     391      38.53      <.0001
 mealcat*collcat    2             2            636.60     10.4327     391      61.02      <.0001
 mealcat*collcat    2             3            655.64      8.2358     391      79.61      <.0001
 mealcat*collcat    3             1            493.92      7.9527     391      62.11      <.0001
 mealcat*collcat    3             2            508.83      9.8744     391      51.53      <.0001
 mealcat*collcat    3             3            541.73     17.6639     391      30.67      <.0001


                                 Type 3 Tests of Fixed Effects

                                          Num     Den
                      Effect               DF      DF    F Value    Pr > F
                      mealcat*collcat       9     391    4131.11    <.0001


                        Coefficients for collcat 1&2 vs 3 at mealcat = 1

                                          Percentage
                                          free meals
                                          in 3
                       Effect             categories    collcat      Row1
                       mealcat*collcat    1             1            -0.5
                       mealcat*collcat    1             2            -0.5
                       mealcat*collcat    1             3               1
                       mealcat*collcat    2             1
                       mealcat*collcat    2             2
                       mealcat*collcat    2             3
                       mealcat*collcat    3             1
                       mealcat*collcat    3             2
                       mealcat*collcat    3             3

                                Coefficients for collcat 1v3 at
                                 mealcat=1 vs 1v3 at mealcat=3

                                          Percentage
                                          free meals
                                          in 3
                       Effect             categories    collcat      Row1

                       mealcat*collcat    1             1               1
                       mealcat*collcat    1             2
                       mealcat*collcat    1             3              -1
                       mealcat*collcat    2             1
                       mealcat*collcat    2             2
                       mealcat*collcat    2             3
                       mealcat*collcat    3             1              -1
                       mealcat*collcat    3             2
                       mealcat*collcat    3             3               1

                                           Estimates

                                                              Standard
   Label                                          Estimate      Error     DF   t Value    Pr > |t|
   collcat 1&2 vs 3 at mealcat = 1                -39.1318    12.2043    391     -3.21     0.0015
   collcat 1v3 at mealcat=1 vs 1v3 at mealcat=3    82.5778    24.4394    391      3.38     0.0008

As you can see, the estimates of the contrasts and their t- and p-values are exactly the same as with proc glm.