Suppose we have an ANOVA model, and we would like to compare the means of one group
and another group. This is commonly done with the **estimate** statement in SAS.
Let’s look at a simple example using a data set called elemapi2 (https://stats.idre.ucla.edu/wp-content/uploads/2016/02/elemapi2.sas7bdat).
The dependent variable, **api00**, is the school’s API index (a continuous variable). The variables **mealcat** and **collcat**
are categorical variables, both with three levels. These will be used as
the predictor variables. This model is shown below. We have included
the **lsmeans** statement to get the expected means for each group.
This can be helpful if you want to calculate the contrast estimate by hand.

    proc glm data = elemapi2;
      class collcat mealcat;
      model api00 = collcat mealcat collcat*mealcat/ss3;
      lsmeans collcat*mealcat;
    run;
    quit;

    Least Squares Means

    collcat    mealcat    api00 LSMEAN
    1          1            816.914286
    1          2            589.350000
    1          3            493.918919
    2          1            825.651163
    2          2            636.604651
    2          3            508.833333
    3          1            782.150943
    3          2            655.637681
    3          3            541.733333

Let’s say that we want to look at a simple comparison of group 1
versus the average of groups 2 and 3 of **collcat** when **mealcat** = 1. One way of doing this
is to use **proc glm** with an **estimate** statement. We use the **e**
option on the **estimate** statement to have SAS print out the contrast
coefficients that are applied to each group.

    proc glm data = elemapi2;
      class collcat mealcat;
      model api00 = collcat mealcat collcat*mealcat/ss3;
      estimate 'collcat 1 vs 2+ within mealcat = 1'
               collcat 1 -.5 -.5
               collcat*mealcat 1 0 0 -.5 0 0 -.5 0 0 / e;
    run;
    quit;
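To see where these coefficients come from, it can help to write the comparison in terms of the model parameters. The sketch below uses notation that does not appear in the SAS output and is assumed here only for illustration: $\mu_{ij}$ is the expected mean for collcat = $i$ and mealcat = $j$, with $\mu_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}$.

```latex
\begin{aligned}
\mu_{11} - \tfrac{1}{2}(\mu_{21} + \mu_{31})
  &= \bigl(1 - \tfrac{1}{2} - \tfrac{1}{2}\bigr)\mu
   + \bigl(\alpha_1 - \tfrac{1}{2}\alpha_2 - \tfrac{1}{2}\alpha_3\bigr)
   + \bigl(1 - \tfrac{1}{2} - \tfrac{1}{2}\bigr)\beta_1 \\
  &\quad + (\alpha\beta)_{11} - \tfrac{1}{2}(\alpha\beta)_{21}
   - \tfrac{1}{2}(\alpha\beta)_{31} \\
  &= \alpha_1 - \tfrac{1}{2}\alpha_2 - \tfrac{1}{2}\alpha_3
   + (\alpha\beta)_{11} - \tfrac{1}{2}(\alpha\beta)_{21}
   - \tfrac{1}{2}(\alpha\beta)_{31}
\end{aligned}
```

The intercept and the mealcat terms cancel, which is why only **collcat** (1 -.5 -.5) and **collcat*mealcat** (1 0 0 -.5 0 0 -.5 0 0) carry nonzero coefficients on the **estimate** statement.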

While this **estimate** statement will run the analysis we want, it is a
little difficult to write. Another way of accomplishing the same thing is to use a cell means model. A cell means model estimates only
one parameter for each cell and sets the intercept to 0. In general, the cell means model is
not used to produce an overall test of model fit, but it is often used
to write simpler **estimate** or **contrast** statements. So, in practice, we need to
write the **proc glm** code twice: once for the model fit and a second time for the
estimates or contrasts. In the code shown below, the first **proc glm** is
for model fit, and the second one, with the **estimate** statement, is used to estimate
the simple comparison. We use the **ss3** option to limit the output to
only the Type III sums of squares.

In the second call to **proc glm**, which is a cell means model, the main effects
are omitted; only the interaction is included in the model. We use the **noint** option on the
**model** statement to specify that we are not going to estimate the intercept; therefore, we will
estimate one parameter per cell. We use the **e** option to show us the
contrast codes that were used. This is a useful way to be sure that the
contrast codings were assigned as you intended.

    * estimating the overall model;
    proc glm data = elemapi2;
      class collcat mealcat;
      model api00 = collcat mealcat collcat*mealcat/ss3;
    run;
    quit;

    * estimating the cell means model to get the desired estimate;
    proc glm data = elemapi2;
      class collcat mealcat;
      model api00 = collcat*mealcat/noint;
      estimate 'collcat 1 vs 2+ within mealcat = 1'
               collcat*mealcat 1 0 0 -.5 0 0 -.5 0 0 / e;
    run;
    quit;
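One way to see what the cell means model is estimating is to print its parameter estimates. The sketch below assumes the same elemapi2 data set and adds the **solution** option to the **model** statement; with **noint** and only the interaction in the model, each of the nine estimates should match the corresponding LS mean shown earlier (for example, 816.914286 for collcat = 1, mealcat = 1).

```sas
* cell means model: print the nine cell-mean parameter estimates;
* (illustrative sketch; the solution option is added here and is not in the original code);
proc glm data = elemapi2;
  class collcat mealcat;
  model api00 = collcat*mealcat / noint solution;
run;
quit;
```

Because each parameter is a cell mean, the coefficients on an **estimate** statement can be read directly as weights on the cells.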

Notice that the order of the categorical variables on the **class** statement determines which variable is the row variable and which is the
column variable. For example, in the code above, **collcat** will be the row
variable and **mealcat** will be the column variable. Therefore, the simple
comparison we are interested in can be formulated as shown in the following table.

| collcat / mealcat | mealcat = 1 | mealcat = 2 | mealcat = 3 |
| --- | --- | --- | --- |
| collcat = 1 | 1 | 0 | 0 |
| collcat = 2 | -.5 | 0 | 0 |
| collcat = 3 | -.5 | 0 | 0 |

Writing the numbers in the table one row at a time, we can write our **estimate** statement
as:

    estimate 'simple comparison' collcat*mealcat 1 0 0 -.5 0 0 -.5 0 0 / e;

Equivalently, we can make use of the option **divisor =** to rewrite the statement in terms of whole numbers as shown
below.

    estimate 'simple comparison' collcat*mealcat 2 0 0 -1 0 0 -1 0 0 / divisor=2 e;
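The two forms describe the same linear combination of cell means. Writing $\mu_{ij}$ for the mean of the cell with collcat = $i$ and mealcat = $j$ (notation assumed here for illustration), the whole-number coefficients divided by 2 reduce to the original fractions:

```latex
\frac{2\mu_{11} - \mu_{21} - \mu_{31}}{2}
  = \mu_{11} - \tfrac{1}{2}\mu_{21} - \tfrac{1}{2}\mu_{31}
```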

If we switch the order of the variables on the **class**
statement, we will have to rewrite our **estimate** statement accordingly. For example,
with **mealcat** listed first, the coefficients are read one row at a time from the transposed
table below (1 -.5 -.5 0 0 0 0 0 0). The rewritten **estimate** statement produces exactly the
same result, since the corresponding table is simply transposed.

| mealcat / collcat | collcat = 1 | collcat = 2 | collcat = 3 |
| --- | --- | --- | --- |
| mealcat = 1 | 1 | -.5 | -.5 |
| mealcat = 2 | 0 | 0 | 0 |
| mealcat = 3 | 0 | 0 | 0 |

Let’s run the analysis model. Notice that both **collcat** and the
**collcat*mealcat** interaction need to be specified on the **estimate**
statement.

    proc glm data = elemapi2;
      class collcat mealcat;
      model api00 = collcat mealcat collcat*mealcat/ss3;
      estimate 'collcat 1 vs 2+ within mealcat = 1'
               collcat 1 -.5 -.5
               collcat*mealcat 1 0 0 -.5 0 0 -.5 0 0 / e;
    run;
    quit;

    The GLM Procedure

    Class Level Information

    Class      Levels    Values
    mealcat         3    1 2 3
    collcat         3    1 2 3

    Number of Observations Read    400
    Number of Observations Used    400

    Coefficients for Estimate collcat 1 vs 2+ within mealcat = 1

                               Row 1
    Intercept                      0
    collcat 1                      1
    collcat 2                   -0.5
    collcat 3                   -0.5
    mealcat 1                      0
    mealcat 2                      0
    mealcat 3                      0
    collcat*mealcat 1 1            1
    collcat*mealcat 1 2            0
    collcat*mealcat 1 3            0
    collcat*mealcat 2 1         -0.5
    collcat*mealcat 2 2            0
    collcat*mealcat 2 3            0
    collcat*mealcat 3 1         -0.5
    collcat*mealcat 3 2            0
    collcat*mealcat 3 3            0

    Dependent Variable: api00

                                   Sum of
    Source              DF        Squares    Mean Square    F Value    Pr > F
    Model                8    6243714.810     780464.351     166.76    <.0001
    Error              391    1829957.187       4680.197
    Corrected Total    399    8073671.998

    R-Square    Coeff Var    Root MSE    api00 Mean
    0.773343     10.56356    68.41197      647.6225

    Source              DF    Type III SS    Mean Square    F Value    Pr > F
    collcat              2      42140.566      21070.283       4.50    0.0117
    mealcat              2    4764843.563    2382421.781     509.04    <.0001
    collcat*mealcat      4     124167.809      31041.952       6.63    <.0001

                                                        Standard
    Parameter                               Estimate         Error    t Value    Pr > |t|
    collcat 1 vs 2+ within mealcat = 1    13.0132326    13.5279998       0.96      0.3367

This simple comparison is not statistically significant (t = 0.96, p = 0.3367).
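As a check, the estimate can be reproduced by hand from the LS means listed at the top of this page. The calculation below is a worked sketch that uses only those printed values; the symbol $\hat\mu_{ij}$ (the LS mean for collcat = $i$, mealcat = $j$) is notation assumed here for illustration.

```latex
\begin{aligned}
\hat\mu_{11} - \tfrac{1}{2}(\hat\mu_{21} + \hat\mu_{31})
  &= 816.914286 - \tfrac{1}{2}(825.651163 + 782.150943) \\
  &= 816.914286 - 803.901053 \\
  &= 13.013233
\end{aligned}
```

This agrees with the estimate of 13.0132326 in the output, and dividing it by the standard error (13.5279998) gives the reported t value of about 0.96.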

Now let’s run the cell means model. Notice that only the interaction term
is used on the **model** and the **estimate** statements.

    proc glm data = elemapi2;
      class mealcat collcat;
      model api00 = mealcat*collcat/noint ss3;
      estimate 'collcat 1 vs 2+ within mealcat = 1'
               collcat*mealcat 1 -.5 -.5 0 0 0 0 0 0 /e;
    run;
    quit;

    <some output omitted>

    Coefficients for Estimate collcat 1 vs 2+ within mealcat = 1

                               Row 1
    collcat*mealcat 1 1            1
    collcat*mealcat 1 2            0
    collcat*mealcat 1 3            0
    collcat*mealcat 2 1         -0.5
    collcat*mealcat 2 2            0
    collcat*mealcat 2 3            0
    collcat*mealcat 3 1         -0.5
    collcat*mealcat 3 2            0
    collcat*mealcat 3 3            0

    <some output omitted>

                                                        Standard
    Parameter                               Estimate         Error    t Value    Pr > |t|
    collcat 1 vs 2+ within mealcat = 1    13.0132326    13.5279998       0.96      0.3367

## Brief summary

- A cell means model is used only for the purpose of making the contrast on an **estimate** statement easier to write. The only part of the output that is considered is the part related to the contrast estimate (this is usually found at the bottom of the output).
- Writing the **estimate** statement with a cell means model is easier because it includes only one vector (for the highest-order interaction). Using the analysis model, the **estimate** statement for the same contrast may contain multiple vectors and/or matrices and is therefore more difficult to specify correctly.
- The cell means model approach can be used with models that include three-way or higher interactions. Only the highest-order interaction is included on the **model** statement, and only this term is used on the **estimate** statement.
- A cell means model can be used with other procedures, such as **proc mixed**.
- The technique can also be used with the **contrast** statement.
- You may notice a note in the SAS log file that reads:

Due to the presence of CLASS variables, an intercept is implicitly fitted. R-Square has been corrected for the mean.

This note may at first seem confusing, because we specified the **noint**
option on the **model** statement. However, instead of estimating an
intercept, we are estimating the mean for one of the groups in our model; hence,
the same number of parameters are being estimated. (In the full model,
there are eight parameters plus the intercept, for a total of nine parameters;
in the cell means model, nine parameters are estimated.) The R-square is
the same for the two models (R-square = .773343), which we would expect.
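The points above about the **contrast** statement and **proc mixed** can be sketched as follows. This is an illustrative sketch rather than code from the original analysis: it assumes the same elemapi2 data set and reuses the coefficients from the first example. For a single-row contrast, the F statistic on the **contrast** statement is the square of the t statistic from **estimate**, so the p-value should agree; in **proc mixed** with no random effects, the estimate should also match the **proc glm** result.

```sas
* same simple comparison as an F test with the contrast statement;
proc glm data = elemapi2;
  class collcat mealcat;
  model api00 = collcat*mealcat / noint;
  contrast 'collcat 1 vs 2+ within mealcat = 1'
           collcat*mealcat 1 0 0 -.5 0 0 -.5 0 0 / e;
run;
quit;

* same cell means model and estimate statement in proc mixed;
proc mixed data = elemapi2;
  class collcat mealcat;
  model api00 = collcat*mealcat / noint;
  estimate 'collcat 1 vs 2+ within mealcat = 1'
           collcat*mealcat 1 0 0 -.5 0 0 -.5 0 0 / e;
run;
```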

## Example 2

In our second example, we will compare the means between two cells in the design. Specifically, we will compare collcat=2 at mealcat=1 to collcat=3 at mealcat=2.

| collcat / mealcat | mealcat = 1 | mealcat = 2 | mealcat = 3 |
| --- | --- | --- | --- |
| collcat = 1 | 0 | 0 | 0 |
| collcat = 2 | 1 | 0 | 0 |
| collcat = 3 | 0 | -1 | 0 |

The **estimate** statement for this comparison is shown below.

    proc glm data = elemapi2;
      class collcat mealcat;
      model api00 = collcat mealcat collcat*mealcat/ss3;
      estimate 'collcat 2/mealcat 1 vs. collcat 3/mealcat 2'
               collcat 0 1 -1
               mealcat 1 -1 0
               collcat*mealcat 0 0 0 1 0 0 0 -1 0 /e;
    run;
    quit;

    <some output omitted>

    Coefficients for Estimate collcat 2/mealcat 1 vs. collcat 3/mealcat 2

                               Row 1
    Intercept                      0
    collcat 1                      0
    collcat 2                      1
    collcat 3                     -1
    mealcat 1                      1
    mealcat 2                     -1
    mealcat 3                      0
    collcat*mealcat 1 1            0
    collcat*mealcat 1 2            0
    collcat*mealcat 1 3            0
    collcat*mealcat 2 1            1
    collcat*mealcat 2 2            0
    collcat*mealcat 2 3            0
    collcat*mealcat 3 1            0
    collcat*mealcat 3 2           -1
    collcat*mealcat 3 3            0

    <some output omitted>

                                                                 Standard
    Parameter                                        Estimate         Error    t Value    Pr > |t|
    collcat 2/mealcat 1 vs. collcat 3/mealcat 2    170.013482    13.2917549      12.79      <.0001

This comparison is statistically significant (t = 12.79, p < .0001).
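The coefficients on this **estimate** statement, and the estimate itself, can be checked by hand. The sketch below assumes notation that is not part of the SAS output: $\mu_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}$ is the expected mean for collcat = $i$ and mealcat = $j$, and $\hat\mu_{ij}$ is the corresponding LS mean from the first **proc glm** run on this page.

```latex
\begin{aligned}
\mu_{21} - \mu_{32}
  &= (\alpha_2 - \alpha_3) + (\beta_1 - \beta_2)
   + \bigl((\alpha\beta)_{21} - (\alpha\beta)_{32}\bigr) \\[4pt]
\hat\mu_{21} - \hat\mu_{32}
  &= 825.651163 - 655.637681 = 170.013482
\end{aligned}
```

The first line shows why the analysis model needs the **collcat** vector (0 1 -1) and the **mealcat** vector (1 -1 0) in addition to the interaction vector; the second line reproduces the estimate of 170.013482.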

Using the cell means model, the **estimate** statement is constructed as
shown below.

    proc glm data = elemapi2;
      class collcat mealcat;
      model api00 = collcat*mealcat/noint;
      estimate 'collcat 2/mealcat 1 vs. collcat 3/mealcat 2'
               collcat*mealcat 0 0 0 1 0 0 0 -1 0 /e;
    run;
    quit;

    <some output omitted>

    Coefficients for Estimate collcat 2/mealcat 1 vs. collcat 3/mealcat 2

                               Row 1
    collcat*mealcat 1 1            0
    collcat*mealcat 1 2            0
    collcat*mealcat 1 3            0
    collcat*mealcat 2 1            1
    collcat*mealcat 2 2            0
    collcat*mealcat 2 3            0
    collcat*mealcat 3 1            0
    collcat*mealcat 3 2           -1
    collcat*mealcat 3 3            0

    <some output omitted>

                                                                 Standard
    Parameter                                        Estimate         Error    t Value    Pr > |t|
    collcat 2/mealcat 1 vs. collcat 3/mealcat 2    170.013482    13.2917549      12.79      <.0001

## Example 3

In our last example, we will look at a difference in differences. We will take the difference between two differences: ([collcat=1 and mealcat=1] minus [collcat=1 and mealcat=3]) and ([collcat=3 and mealcat=1] minus [collcat=3 and mealcat=3]).

Remember that a little bit of math needs to be done to get the correct signs of the contrast coefficients: (collcat=1/mealcat=1 – collcat=1/mealcat=3) – (collcat=3/mealcat=1 – collcat=3/mealcat=3) = collcat=1/mealcat=1 – collcat=1/mealcat=3 – collcat=3/mealcat=1 + collcat=3/mealcat=3.

| collcat / mealcat | mealcat = 1 | mealcat = 2 | mealcat = 3 |
| --- | --- | --- | --- |
| collcat = 1 | 1 | 0 | -1 |
| collcat = 2 | 0 | 0 | 0 |
| collcat = 3 | -1 | 0 | 1 |

    proc glm data = elemapi2;
      class collcat mealcat;
      model api00 = collcat mealcat collcat*mealcat/ss3;
      estimate 'differences in differences'
               collcat*mealcat 1 0 -1 0 0 0 -1 0 1 /e;
    run;
    quit;

    <some output omitted>

    Coefficients for Estimate differences in differences

                               Row 1
    Intercept                      0
    collcat 1                      0
    collcat 2                      0
    collcat 3                      0
    mealcat 1                      0
    mealcat 2                      0
    mealcat 3                      0
    collcat*mealcat 1 1            1
    collcat*mealcat 1 2            0
    collcat*mealcat 1 3           -1
    collcat*mealcat 2 1            0
    collcat*mealcat 2 2            0
    collcat*mealcat 2 3            0
    collcat*mealcat 3 1           -1
    collcat*mealcat 3 2            0
    collcat*mealcat 3 3            1

    <some output omitted>

                                                Standard
    Parameter                       Estimate         Error    t Value    Pr > |t|
    differences in differences    82.5777567    24.4394069       3.38      0.0008

The comparison is statistically significant (t = 3.38, p = .0008).
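Note that this **estimate** statement needs only the interaction vector, even in the analysis model. A short derivation shows why, followed by a hand check of the estimate. The notation is assumed here for illustration and does not come from the SAS output: $\mu_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}$ is the expected mean for collcat = $i$ and mealcat = $j$, and $\hat\mu_{ij}$ is the corresponding LS mean from the first **proc glm** run on this page.

```latex
\begin{aligned}
(\mu_{11} - \mu_{13}) - (\mu_{31} - \mu_{33})
  &= (\alpha_1 - \alpha_1 - \alpha_3 + \alpha_3)
   + (\beta_1 - \beta_3 - \beta_1 + \beta_3) \\
  &\quad + (\alpha\beta)_{11} - (\alpha\beta)_{13}
   - (\alpha\beta)_{31} + (\alpha\beta)_{33} \\
  &= (\alpha\beta)_{11} - (\alpha\beta)_{13}
   - (\alpha\beta)_{31} + (\alpha\beta)_{33} \\[4pt]
(\hat\mu_{11} - \hat\mu_{13}) - (\hat\mu_{31} - \hat\mu_{33})
  &= (816.914286 - 493.918919) - (782.150943 - 541.733333) \\
  &= 322.995367 - 240.417610 \\
  &= 82.577757
\end{aligned}
```

All main-effect terms cancel, so the coefficients for **collcat** and **mealcat** are zero, and the hand calculation agrees with the estimate of 82.5777567.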

Here is the **estimate** statement using the cell means model.

    proc glm data = elemapi2;
      class collcat mealcat;
      model api00 = collcat*mealcat/noint;
      estimate 'differences in differences'
               collcat*mealcat 1 0 -1 0 0 0 -1 0 1 /e;
    run;
    quit;

    <some output omitted>

    Coefficients for Estimate differences in differences

                               Row 1
    collcat*mealcat 1 1            1
    collcat*mealcat 1 2            0
    collcat*mealcat 1 3           -1
    collcat*mealcat 2 1            0
    collcat*mealcat 2 2            0
    collcat*mealcat 2 3            0
    collcat*mealcat 3 1           -1
    collcat*mealcat 3 2            0
    collcat*mealcat 3 3            1

    <some output omitted>

                                                Standard
    Parameter                       Estimate         Error    t Value    Pr > |t|
    differences in differences    82.5777567    24.4394069       3.38      0.0008