We will use a data set called elemapi2 (https://stats.idre.ucla.edu/wp-content/uploads/2016/02/elemapi2.sas7bdat) to demonstrate. The variables mealcat and collcat are both categorical, each with three levels. The dependent variable is the school's API index, api00. First, we want a simple comparison: the average of levels 1 and 2 of collcat versus level 3, when mealcat is at level 1. Second, we want to see whether the difference between collcat levels 1 and 3 at mealcat equal to 1 differs from the same difference at mealcat equal to 3. This second comparison is a bit trickier, and we will discuss the values for the contrast a bit later on this page.
We can do this in proc glm using the estimate statement. Please note the e option in the estimate statements that displays the contrast we have defined.
proc glm data = 'D:/data/elemapi2.sas7bdat';
  class collcat mealcat;
  model api00 = collcat mealcat collcat*mealcat / ss3;
  /* average of collcat 1 and 2 versus collcat 3, within mealcat = 1 */
  estimate 'collcat 1&2 vs 3 at mealcat = 1'
           collcat -.5 -.5 1
           collcat*mealcat -.5 0 0  -.5 0 0  1 0 0 / e;
  /* (collcat 1 - collcat 3) at mealcat = 1 versus the same difference at mealcat = 3 */
  estimate 'collcat 1v3 at mealcat=1 vs 1v3 at mealcat=3'
           collcat 0 0 0
           collcat*mealcat 1 0 -1  0 0 0  -1 0 1 / e;
run;
quit;
Here is the output for the above command.
The GLM Procedure

Class Level Information

Class      Levels   Values
collcat         3   1 2 3
mealcat         3   1 2 3

Number of Observations Read   400
Number of Observations Used   400

Coefficients for Estimate collcat 1&2 vs 3 at mealcat = 1

                          Row 1
Intercept                     0
collcat          1         -0.5
collcat          2         -0.5
collcat          3            1
mealcat          1            0
mealcat          2            0
mealcat          3            0
collcat*mealcat  1 1       -0.5
collcat*mealcat  1 2          0
collcat*mealcat  1 3          0
collcat*mealcat  2 1       -0.5
collcat*mealcat  2 2          0
collcat*mealcat  2 3          0
collcat*mealcat  3 1          1
collcat*mealcat  3 2          0
collcat*mealcat  3 3          0

Coefficients for Estimate collcat 1v3 at mealcat=1 vs 1v3 at mealcat=3

                          Row 1
Intercept                     0
collcat          1            0
collcat          2            0
collcat          3            0
mealcat          1            0
mealcat          2            0
mealcat          3            0
collcat*mealcat  1 1          1
collcat*mealcat  1 2          0
collcat*mealcat  1 3         -1
collcat*mealcat  2 1          0
collcat*mealcat  2 2          0
collcat*mealcat  2 3          0
collcat*mealcat  3 1         -1
collcat*mealcat  3 2          0
collcat*mealcat  3 3          1

Dependent Variable: api00   api 2000

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               8      6243714.810    780464.351    166.76   <.0001
Error             391      1829957.187      4680.197
Corrected Total   399      8073671.998

R-Square   Coeff Var   Root MSE   api00 Mean
0.773343    10.56356   68.41197     647.6225

Source             DF      Type III SS   Mean Square   F Value   Pr > F
collcat             2        42140.566     21070.283      4.50   0.0117
mealcat             2      4764843.563   2382421.781    509.04   <.0001
collcat*mealcat     4       124167.809     31041.952      6.63   <.0001

Parameter                                         Estimate   Standard Error   t Value   Pr > |t|
collcat 1&2 vs 3 at mealcat = 1                -39.1317809       12.2043453     -3.21     0.0015
collcat 1v3 at mealcat=1 vs 1v3 at mealcat=3    82.5777567       24.4394069      3.38     0.0008
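The coefficients in the two estimate statements can be traced back to the cell means. As a sketch, assume the usual overparameterized two-way notation (introduced here for illustration; it does not appear in the SAS output): \mu_{ij} is the expected api00 for collcat = i and mealcat = j, with \mu_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}, and \psi_1 and \psi_2 denote the first and second contrast.

```latex
\begin{align*}
\psi_1 &= \mu_{31} - \tfrac{1}{2}\left(\mu_{11} + \mu_{21}\right) \\
       &= \left(\alpha_3 - \tfrac{1}{2}\alpha_1 - \tfrac{1}{2}\alpha_2\right)
        + \left((\alpha\beta)_{31} - \tfrac{1}{2}(\alpha\beta)_{11} - \tfrac{1}{2}(\alpha\beta)_{21}\right) \\[4pt]
\psi_2 &= \left(\mu_{11} - \mu_{31}\right) - \left(\mu_{13} - \mu_{33}\right) \\
       &= (\alpha\beta)_{11} - (\alpha\beta)_{31} - (\alpha\beta)_{13} + (\alpha\beta)_{33}
\end{align*}
```

The intercept and the mealcat terms cancel in both contrasts. The collcat terms also cancel in the second contrast, which is why its collcat coefficients are 0 0 0.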
The t-value for the first contrast was -3.21 with a p-value of .0015, while for the second contrast the t-value was 3.38 with a p-value of .0008. This all worked very nicely but seems a bit complicated. It might be easier if we adopt the cell-means approach. Here is the code and the output for a cell-means model without the estimate statement so that we can see how we would construct the contrast of interest. Please note the noint option on the model statement. This is needed so that we get all nine of the cell means. We wish to emphasize that the cell-means model is only used to obtain the cell means and should not be used to assess model fit or statistical significance. The F-ratio shown below of 4,131.11 is somewhat misleading. It jointly tests that all of the cell means equal zero. This hypothesis is neither informative nor interesting.
proc glm data = 'D:/data/elemapi2.sas7bdat';
  class collcat mealcat;
  /* noint with only the interaction term gives one parameter per cell */
  model api00 = collcat*mealcat / noint ss3 solution;
run;
quit;

The GLM Procedure

Class Level Information

Class      Levels   Values
collcat         3   1 2 3
mealcat         3   1 2 3

Number of Observations Read   400
Number of Observations Used   400

Dependent Variable: api00   api 2000

Source               DF   Sum of Squares   Mean Square   F Value   Pr > F
Model                 9      174009675.8    19334408.4   4131.11   <.0001
Error               391        1829957.2        4680.2
Uncorrected Total   400      175839633.0

R-Square   Coeff Var   Root MSE   api00 Mean
0.773343    10.56356   68.41197     647.6225

Source               DF      Type III SS   Mean Square   F Value   Pr > F
collcat*mealcat       9      174009675.8    19334408.4   4131.11   <.0001

Parameter                  Estimate   Standard Error   t Value   Pr > |t|
collcat*mealcat 1 1     816.9142857      11.56373322     70.64     <.0001
collcat*mealcat 1 2     589.3500000      15.29738116     38.53     <.0001
collcat*mealcat 1 3     493.9189189       7.95272978     62.11     <.0001
collcat*mealcat 2 1     825.6511628      10.43272736     79.14     <.0001
collcat*mealcat 2 2     636.6046512      10.43272736     61.02     <.0001
collcat*mealcat 2 3     508.8333333       9.87441708     51.53     <.0001
collcat*mealcat 3 1     782.1509434       9.39710655     83.23     <.0001
collcat*mealcat 3 2     655.6376812       8.23583317     79.61     <.0001
collcat*mealcat 3 3     541.7333333      17.66389427     30.67     <.0001
You can see the nine cells going from collcat*mealcat 1 1 to collcat*mealcat 3 3. Here is the same information put into a 3x3 table.
collcat / mealcat   mealcat = 1    mealcat = 2    mealcat = 3
collcat = 1         816.9142857    589.3500000    493.9189189
collcat = 2         825.6511628    636.6046512    508.8333333
collcat = 3         782.1509434    655.6376812    541.7333333
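As a cross-check, the same nine cell means can be obtained directly as raw group means. The following is a minimal sketch using proc means. The data set path is an assumption and should be replaced with wherever elemapi2 is stored on your machine.

```sas
/* Sketch: raw cell means and cell sizes of api00 for each        */
/* collcat-by-mealcat combination. The path below is hypothetical. */
proc means data = 'D:/data/elemapi2.sas7bdat' n mean;
  class collcat mealcat;
  var api00;
run;
```

The means printed for the nine collcat by mealcat combinations should match the parameter estimates of the cell-means model, and the cell sizes show how many schools each mean is based on.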
Here are the values of the contrast needed for the two estimate statements in the same tabular form as the table above.
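First contrast (collcat 1&2 vs 3 at mealcat = 1):

collcat / mealcat   mealcat = 1    mealcat = 2    mealcat = 3
collcat = 1             -.5             0              0
collcat = 2             -.5             0              0
collcat = 3              1              0              0

Second contrast (collcat 1v3 at mealcat=1 vs 1v3 at mealcat=3):

collcat / mealcat   mealcat = 1    mealcat = 2    mealcat = 3
collcat = 1              1              0             -1
collcat = 2              0              0              0
collcat = 3             -1              0              1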
The values for the second contrast were derived from the following:
(collcat*mealcat11 - collcat*mealcat31) - (collcat*mealcat13 - collcat*mealcat33),
which can be simplified to
collcat*mealcat11 - collcat*mealcat31 - collcat*mealcat13 + collcat*mealcat33.
The coefficients for these four terms are 1, -1, -1 and 1. When you include all of the cell means that have zero coefficients, in the order collcat*mealcat 1 1 through collcat*mealcat 3 3, the values look like this: 1 0 -1 0 0 0 -1 0 1.
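As a quick arithmetic check, both contrasts can be computed by hand from the nine cell means estimated by the cell-means model. The symbols \hat\psi_1 and \hat\psi_2 (the estimates of the first and second contrast) are notation introduced here, not part of the SAS output.

```latex
\begin{align*}
\hat\psi_1 &= -\tfrac{1}{2}(816.9142857) - \tfrac{1}{2}(825.6511628) + 782.1509434
            \approx -39.1318 \\
\hat\psi_2 &= 816.9142857 - 493.9189189 - 782.1509434 + 541.7333333
            \approx 82.5778
\end{align*}
```

These agree with the estimates of -39.1317809 and 82.5777567 reported by the first proc glm run.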
Now it should be easy to code the estimate statement. Below is the SAS code and the nonduplicate output from the above model. Note that we make use of the divisor = 2 option so that we don't have to write decimal fractions.
proc glm data = 'D:/data/elemapi2.sas7bdat';
  class collcat mealcat;
  model api00 = collcat*mealcat / noint ss3 solution;
  /* coefficients follow the cell order 1 1, 1 2, 1 3, 2 1, ..., 3 3 */
  estimate 'collcat 1&2 vs 3 at mealcat = 1'
           collcat*mealcat -1 0 0  -1 0 0  2 0 0 / divisor=2 e;
  estimate 'collcat 1v3 at mealcat=1 vs 1v3 at mealcat=3'
           collcat*mealcat 1 0 -1  0 0 0  -1 0 1 / e;
run;
quit;

The GLM Procedure

[ duplicate output omitted ]

Coefficients for Estimate collcat 1&2 vs 3 at mealcat = 1

                          Row 1
collcat*mealcat  1 1       -0.5
collcat*mealcat  1 2          0
collcat*mealcat  1 3          0
collcat*mealcat  2 1       -0.5
collcat*mealcat  2 2          0
collcat*mealcat  2 3          0
collcat*mealcat  3 1          1
collcat*mealcat  3 2          0
collcat*mealcat  3 3          0

Coefficients for Estimate collcat 1v3 at mealcat=1 vs 1v3 at mealcat=3

                          Row 1
collcat*mealcat  1 1          1
collcat*mealcat  1 2          0
collcat*mealcat  1 3         -1
collcat*mealcat  2 1          0
collcat*mealcat  2 2          0
collcat*mealcat  2 3          0
collcat*mealcat  3 1         -1
collcat*mealcat  3 2          0
collcat*mealcat  3 3          1

[ duplicate output omitted ]

Parameter                                         Estimate   Standard Error   t Value   Pr > |t|
collcat 1&2 vs 3 at mealcat = 1                -39.1317809       12.2043453     -3.21     0.0015
collcat 1v3 at mealcat=1 vs 1v3 at mealcat=3    82.5777567       24.4394069      3.38     0.0008

[ duplicate output omitted ]
Pretty slick huh? Of course, this contrast only makes sense if the collcat*mealcat interaction is statistically significant. Just to be clear, we are not using the cell-means model to assess the fit of the model nor to test each of the effects. We are using the cell-means model to make the coding of contrasts easier.
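The standard errors and t-values can be reproduced by hand as well. In a cell-means model each cell mean is estimated from its own observations only, so the nine estimates are uncorrelated and the variance of a contrast is the sum of the squared coefficients times the squared standard errors of the cell means. In the sketch below, c_{ij} (the contrast coefficient for collcat = i, mealcat = j), \bar{y}_{ij} (the estimated cell mean) and \hat\psi_1, \hat\psi_2 (the two contrast estimates) are notation introduced here. The standard errors are taken from the solution output, rounded to four decimals.

```latex
\begin{align*}
\mathrm{SE}(\hat\psi) &= \sqrt{\sum_{i,j} c_{ij}^{2}\,\mathrm{SE}(\bar{y}_{ij})^{2}} \\[4pt]
\mathrm{SE}(\hat\psi_1) &= \sqrt{0.25\,(11.5637)^2 + 0.25\,(10.4327)^2 + (9.3971)^2} \approx 12.204,
  & t &= \frac{-39.1318}{12.204} \approx -3.21 \\
\mathrm{SE}(\hat\psi_2) &= \sqrt{(11.5637)^2 + (7.9527)^2 + (9.3971)^2 + (17.6639)^2} \approx 24.439,
  & t &= \frac{82.5778}{24.439} \approx 3.38
\end{align*}
```

Both results match the standard errors and t-values in the estimate output.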
The order of the categorical variables in the class statement decides which variable is the row variable and which is the column variable. If we switch the order of the variables in the class statement, we will have to rewrite our estimate statement accordingly. Switching the order of mealcat and collcat in the class statement has the effect of transposing our table of the values of the contrast. For the first contrast the table becomes:
mealcat / collcat   collcat = 1    collcat = 2    collcat = 3
mealcat = 1             -.5            -.5             1
mealcat = 2              0              0              0
mealcat = 3              0              0              0
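Interestingly enough, for the second contrast we can use exactly the same estimate statement as in the previous examples, because its table of values is unchanged by transposition. Only the first contrast has to be rewritten.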
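One way to see why is to write the two tables of contrast values as matrices, with rows and columns indexing the levels of the two class variables. The labels C_1 and C_2 are introduced here for illustration only.

```latex
\[
C_1 = \begin{pmatrix} -\tfrac{1}{2} & 0 & 0 \\ -\tfrac{1}{2} & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix},
\quad
C_1^{\mathsf{T}} = \begin{pmatrix} -\tfrac{1}{2} & -\tfrac{1}{2} & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},
\qquad
C_2 = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 0 & 0 \\ -1 & 0 & 1 \end{pmatrix} = C_2^{\mathsf{T}}
\]
```

Read row by row, the transposed first table gives -.5 -.5 1 0 0 0 0 0 0, while the second table is symmetric and still gives 1 0 -1 0 0 0 -1 0 1.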
Now we can write our SAS code and view the output of the command.
proc glm data = 'D:/data/elemapi2.sas7bdat';
  class mealcat collcat;
  model api00 = mealcat*collcat / noint ss3 solution;
  /* with mealcat first in the class statement, cells run 1 1, 1 2, 1 3, ... as mealcat collcat */
  estimate 'collcat 1&2 vs 3 at mealcat = 1'
           collcat*mealcat -1 -1 2  0 0 0  0 0 0 / divisor=2 e;
  estimate 'collcat 1v3 at mealcat=1 vs 1v3 at mealcat=3'
           collcat*mealcat 1 0 -1  0 0 0  -1 0 1 / e;
run;
quit;

The GLM Procedure

Class Level Information

Class      Levels   Values
mealcat         3   1 2 3
collcat         3   1 2 3

Number of Observations Read   400
Number of Observations Used   400

Coefficients for Estimate collcat 1&2 vs 3 at mealcat = 1

                          Row 1
mealcat*collcat  1 1       -0.5
mealcat*collcat  1 2       -0.5
mealcat*collcat  1 3          1
mealcat*collcat  2 1          0
mealcat*collcat  2 2          0
mealcat*collcat  2 3          0
mealcat*collcat  3 1          0
mealcat*collcat  3 2          0
mealcat*collcat  3 3          0

Coefficients for Estimate collcat 1v3 at mealcat=1 vs 1v3 at mealcat=3

                          Row 1
mealcat*collcat  1 1          1
mealcat*collcat  1 2          0
mealcat*collcat  1 3         -1
mealcat*collcat  2 1          0
mealcat*collcat  2 2          0
mealcat*collcat  2 3          0
mealcat*collcat  3 1         -1
mealcat*collcat  3 2          0
mealcat*collcat  3 3          1

Dependent Variable: api00   api 2000

Source               DF   Sum of Squares   Mean Square   F Value   Pr > F
Model                 9      174009675.8    19334408.4   4131.11   <.0001
Error               391        1829957.2        4680.2
Uncorrected Total   400      175839633.0

R-Square   Coeff Var   Root MSE   api00 Mean
0.773343    10.56356   68.41197     647.6225

Source               DF      Type III SS   Mean Square   F Value   Pr > F
mealcat*collcat       9      174009675.8    19334408.4   4131.11   <.0001

Parameter                                         Estimate   Standard Error   t Value   Pr > |t|
collcat 1&2 vs 3 at mealcat = 1                -39.1317809       12.2043453     -3.21     0.0015
collcat 1v3 at mealcat=1 vs 1v3 at mealcat=3    82.5777567       24.4394069      3.38     0.0008

Parameter                  Estimate   Standard Error   t Value   Pr > |t|
mealcat*collcat 1 1     816.9142857      11.56373322     70.64     <.0001
mealcat*collcat 1 2     825.6511628      10.43272736     79.14     <.0001
mealcat*collcat 1 3     782.1509434       9.39710655     83.23     <.0001
mealcat*collcat 2 1     589.3500000      15.29738116     38.53     <.0001
mealcat*collcat 2 2     636.6046512      10.43272736     61.02     <.0001
mealcat*collcat 2 3     655.6376812       8.23583317     79.61     <.0001
mealcat*collcat 3 1     493.9189189       7.95272978     62.11     <.0001
mealcat*collcat 3 2     508.8333333       9.87441708     51.53     <.0001
mealcat*collcat 3 3     541.7333333      17.66389427     30.67     <.0001
This exact same approach will work with mixed models. In fact, the code for the estimate statement is exactly the same for proc mixed as it was for proc glm, as our last example demonstrates.
proc mixed data = 'D:/data/elemapi2.sas7bdat';
  class mealcat collcat;
  model api00 = mealcat*collcat / noint solution ddfm=kr;
  estimate 'collcat 1&2 vs 3 at mealcat = 1'
           collcat*mealcat -1 -1 2  0 0 0  0 0 0 / divisor=2 e;
  estimate 'collcat 1v3 at mealcat=1 vs 1v3 at mealcat=3'
           collcat*mealcat 1 0 -1  0 0 0  -1 0 1 / e;
run;

The Mixed Procedure

Model Information

Data Set                     MYLIB.ELEMAPI2
Dependent Variable           api00
Covariance Structure         Diagonal
Estimation Method            REML
Residual Variance Method     Profile
Fixed Effects SE Method      Model-Based
Degrees of Freedom Method    Residual

Class Level Information

Class      Levels   Values
mealcat         3   1 2 3
collcat         3   1 2 3

Dimensions

Covariance Parameters      1
Columns in X               9
Columns in Z               0
Subjects                   1
Max Obs Per Subject      400

Number of Observations

Number of Observations Read        400
Number of Observations Used        400
Number of Observations Not Used      0

Covariance Parameter Estimates

Cov Parm    Estimate
Residual     4680.20

Fit Statistics

-2 Res Log Likelihood       4447.1
AIC (smaller is better)     4449.1
AICC (smaller is better)    4449.2
BIC (smaller is better)     4453.1

Solution for Fixed Effects

                   Percentage
                   free meals
                   in 3                               Standard
Effect             categories   collcat   Estimate       Error    DF   t Value   Pr > |t|
mealcat*collcat    1            1           816.91     11.5637   391     70.64     <.0001
mealcat*collcat    1            2           825.65     10.4327   391     79.14     <.0001
mealcat*collcat    1            3           782.15      9.3971   391     83.23     <.0001
mealcat*collcat    2            1           589.35     15.2974   391     38.53     <.0001
mealcat*collcat    2            2           636.60     10.4327   391     61.02     <.0001
mealcat*collcat    2            3           655.64      8.2358   391     79.61     <.0001
mealcat*collcat    3            1           493.92      7.9527   391     62.11     <.0001
mealcat*collcat    3            2           508.83      9.8744   391     51.53     <.0001
mealcat*collcat    3            3           541.73     17.6639   391     30.67     <.0001

Type 3 Tests of Fixed Effects

Effect             Num DF   Den DF   F Value   Pr > F
mealcat*collcat         9      391   4131.11   <.0001

Coefficients for collcat 1&2 vs 3 at mealcat = 1

                   Percentage
                   free meals
                   in 3
Effect             categories   collcat   Row1
mealcat*collcat    1            1         -0.5
mealcat*collcat    1            2         -0.5
mealcat*collcat    1            3            1
mealcat*collcat    2            1
mealcat*collcat    2            2
mealcat*collcat    2            3
mealcat*collcat    3            1
mealcat*collcat    3            2
mealcat*collcat    3            3

Coefficients for collcat 1v3 at mealcat=1 vs 1v3 at mealcat=3

                   Percentage
                   free meals
                   in 3
Effect             categories   collcat   Row1
mealcat*collcat    1            1            1
mealcat*collcat    1            2
mealcat*collcat    1            3           -1
mealcat*collcat    2            1
mealcat*collcat    2            2
mealcat*collcat    2            3
mealcat*collcat    3            1           -1
mealcat*collcat    3            2
mealcat*collcat    3            3            1

Estimates

                                                           Standard
Label                                          Estimate       Error    DF   t Value   Pr > |t|
collcat 1&2 vs 3 at mealcat = 1                -39.1318     12.2043   391     -3.21     0.0015
collcat 1v3 at mealcat=1 vs 1v3 at mealcat=3    82.5778     24.4394   391      3.38     0.0008
As you can see, the estimates of the contrasts and their t- and p-values are exactly the same as with proc glm.