We will use the hsb2 dataset and start with a logistic regression model predicting the binary outcome variable hiread (created in the data step below from read) from the variables write and ses. The variable write is continuous, and the variable ses is categorical with three categories (1 = low, 2 = middle, 3 = high). In the code below, the class statement tells SAS to treat ses as a categorical variable.
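data hsb2m;
  set "D:\data\hsb2";
  /* hiread = 1 if read >= 52.23, and 0 otherwise */
  hiread = (read >= 52.23);
run;

/* descending: model the probability that hiread = 1 */
proc logistic data = hsb2m descending;
  class ses;               /* default (effect) coding for ses */
  model hiread = write ses;
run;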
The "Class Level Information" section of the SAS output shows the coding SAS used in estimating the model. This coding scheme is known as effect coding. (For more information see our FAQ page 'What is effect coding?')
Class Level Information
                Design
Class   Value   Variables
SES     1        1    0
        2        0    1
        3       -1   -1
Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept    1    -8.1220           1.3216           37.7697       <.0001
WRITE        1     0.1438           0.0236           37.0981       <.0001
SES 1        1    -0.4856           0.2823            2.9594       0.0854
SES 2        1     0.0508           0.2290            0.0493       0.8243
Further down in the output, we find the table containing the estimates of the coefficients. For the variable ses there are two coefficients, one for each of the effect-coded variables in the model (ses 1 and ses 2): -0.4856 and 0.0508. If we exponentiate these coefficients we get exp(-0.4856) = 0.61533 and exp(0.0508) = 1.0521 for ses 1 and ses 2, respectively, but the odds ratios listed in the table headed "Odds Ratio Estimates" are 0.398 and 0.681. Why aren't the odds ratios consistent with the coefficients? The answer is that SAS uses effect coding for the coefficients but dummy variable coding when calculating the odds ratios. Under effect coding, each coefficient compares one category with the unweighted average of the log odds across all three categories, while the odds ratios compare categories 1 and 2 with the reference category 3. Because the two tables make different comparisons, a coefficient in the table of estimates can be non-significant while the confidence interval around the corresponding odds ratio does not include one (or vice versa). (For more information see our FAQ 'What is dummy coding?')
Odds Ratio Estimates
                    Point       95% Wald
Effect           Estimate   Confidence Limits
WRITE               1.155   1.102       1.209
SES 1 vs 3          0.398   0.153       1.040
SES 2 vs 3          0.681   0.313       1.485
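The two tables can be reconciled by hand. Writing the effect-coded estimates for ses 1 and ses 2 as beta_1 and beta_2 (our notation), effect coding implies that the effect for category 3 is minus their sum, and the log odds ratio for a category versus category 3 is the difference between the two category effects. Using the rounded estimates printed above:

```latex
\begin{aligned}
\hat\beta_{3} &= -(\hat\beta_{1} + \hat\beta_{2}) = -(-0.4856 + 0.0508) = 0.4348 \\
\log \widehat{\mathrm{OR}}_{1\ \mathrm{vs}\ 3} &= \hat\beta_{1} - \hat\beta_{3} = 2\hat\beta_{1} + \hat\beta_{2} = -0.9204,
  \qquad e^{-0.9204} \approx 0.398 \\
\log \widehat{\mathrm{OR}}_{2\ \mathrm{vs}\ 3} &= \hat\beta_{2} - \hat\beta_{3} = \hat\beta_{1} + 2\hat\beta_{2} = -0.3840,
  \qquad e^{-0.3840} \approx 0.681
\end{aligned}
```

These reproduce the 0.398 and 0.681 in the "Odds Ratio Estimates" table, so the odds ratios are a different function of the same fitted coefficients, not a different model.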
If we run the same analysis but use dummy variable coding for both the parameter estimates and the odds ratios, we get coefficients that are consistent with the odds ratios. There are several ways to estimate a model using dummy coding for nominal variables. In the first example below we add (ref='3') / param = ref to the class statement. This tells SAS that the reference category for ses is 3 (we could also use category 1 or 2 as the reference category) and that we want the reference coding scheme for the parameter estimates.
/* Reference (dummy) coding for ses, with category 3 as the reference */
proc logistic data = hsb2m descending;
  class ses (ref='3') / param = ref;
  model hiread = write ses;
run;
Looking at the output (below), the coding shown in the "Class Level Information" section is now two dummy variables, one for category 1 versus 3 and one for category 2 versus 3. Note two other things in the output below. First, the coefficients in this model are consistent with the odds ratios: exp(-0.9204) = 0.398 and exp(-0.3839) = 0.681. Second, the odds ratios from this model are the same as the odds ratios above. This is expected: SAS always computes the odds ratios for a class variable as comparisons with its reference category (the comparisons that dummy coding makes), so all that has changed is how ses is parameterized in the table of parameter estimates.
Class Level Information
                Design
Class   Value   Variables
SES     1        1    0
        2        0    1
        3        0    0
Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept    1    -7.6872           1.3697           31.4984       <.0001
WRITE        1     0.1438           0.0236           37.0981       <.0001
SES 1        1    -0.9204           0.4897            3.5328       0.0602
SES 2        1    -0.3839           0.3975            0.9330       0.3341
Odds Ratio Estimates
                    Point       95% Wald
Effect           Estimate   Confidence Limits
WRITE               1.155   1.102       1.209
SES 1 vs 3          0.398   0.153       1.040
SES 2 vs 3          0.681   0.313       1.485
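As a further check, worked by hand from the rounded estimates in the effect-coded and reference-coded tables, both parameterizations describe the same fitted model: the log odds for each ses category agree to rounding. The superscripts (E) for effect coding and (R) for reference coding are our notation, and the common term 0.1438 times write is omitted because it is identical in both models.

```latex
\begin{aligned}
\text{ses}=3:&\quad \hat\beta_0^{(E)} - \hat\beta_1^{(E)} - \hat\beta_2^{(E)} = -8.1220 + 0.4856 - 0.0508 = -7.6872 = \hat\beta_0^{(R)} \\
\text{ses}=1:&\quad \hat\beta_0^{(E)} + \hat\beta_1^{(E)} = -8.1220 - 0.4856 = -8.6076,
  \qquad \hat\beta_0^{(R)} + \hat\beta_1^{(R)} = -7.6872 - 0.9204 = -8.6076 \\
\text{ses}=2:&\quad \hat\beta_0^{(E)} + \hat\beta_2^{(E)} = -8.1220 + 0.0508 = -8.0712,
  \qquad \hat\beta_0^{(R)} + \hat\beta_2^{(R)} = -7.6872 - 0.3839 = -8.0711
\end{aligned}
```

In particular, the reference-coded intercept is the log odds for category 3, which is why it differs from the effect-coded intercept.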
Another way to use dummy coding is to create the dummy variables manually and use them on the model statement, bypassing the class statement entirely. The code below first creates two dummy variables, ses_d1 and ses_d2, which code category 1 versus 3 and category 2 versus 3, respectively, and then includes them in the model statement; no class statement is needed. Because the class statement is not used, the output will not include the "Class Level Information" section, but the estimates and odds ratios will be the same as in the last model.
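/* Create dummy (reference) codes by hand; category 3 is the reference */
data hsb2ms;
  set hsb2m;
  if ses = 1 then ses_d1 = 1;   /* ses_d1: category 1 versus 3 */
  if ses = 2 then ses_d1 = 0;
  if ses = 3 then ses_d1 = 0;
  if ses = 1 then ses_d2 = 0;   /* ses_d2: category 2 versus 3 */
  if ses = 2 then ses_d2 = 1;
  if ses = 3 then ses_d2 = 0;
run;

/* No class statement: the dummies enter as ordinary numeric predictors */
proc logistic data = hsb2ms descending;
  model hiread = write ses_d1 ses_d2;
run;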
As a final exercise, we can run the model using hand-made effect codes and check that its coefficients match those from the first model, which confirms that SAS is in fact using effect coding in the first model. The first step is to create the effect-coded variables, here called ses_e1 and ses_e2. Their coefficients are the differences between the log odds for category 1 (or category 2) and the unweighted mean of the log odds across the three ses categories, holding write constant. Then we run the model with ses_e1 and ses_e2 on the model statement and omit the class statement entirely, since we have done the work it would normally do.
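/* Create effect codes by hand; category 3 is coded -1 on both variables */
data hsb2ms1;
  set hsb2ms;
  if ses = 1 then ses_e1 = 1;    /* ses_e1: category 1 versus the mean */
  if ses = 2 then ses_e1 = 0;
  if ses = 3 then ses_e1 = -1;
  if ses = 1 then ses_e2 = 0;    /* ses_e2: category 2 versus the mean */
  if ses = 2 then ses_e2 = 1;
  if ses = 3 then ses_e2 = -1;
run;

proc logistic data = hsb2ms1 descending;
  model hiread = write ses_e1 ses_e2;
run;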
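The interpretation of the ses_e1 and ses_e2 coefficients as deviations from a mean can be checked algebraically. Here eta_k is our notation for the log odds in ses category k at a fixed value of write, beta_w is the write coefficient, and beta_1 and beta_2 are the coefficients on ses_e1 and ses_e2:

```latex
\begin{aligned}
\eta_1 &= \beta_0 + \beta_w\,\mathrm{write} + \beta_1 \\
\eta_2 &= \beta_0 + \beta_w\,\mathrm{write} + \beta_2 \\
\eta_3 &= \beta_0 + \beta_w\,\mathrm{write} - \beta_1 - \beta_2 \\
\bar\eta &= \tfrac{1}{3}\left(\eta_1 + \eta_2 + \eta_3\right) = \beta_0 + \beta_w\,\mathrm{write} \\
\beta_1 &= \eta_1 - \bar\eta, \qquad \beta_2 = \eta_2 - \bar\eta
\end{aligned}
```

Because the -1 codes for category 3 cancel the other two categories in the average, the intercept (plus the write term) is the unweighted mean log odds, and each effect-coded coefficient is a category's departure from it.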
Comparing the table of coefficients below with the very first table of estimates, we see that the coefficients are in fact the same. This confirms that the first model was estimated using effect coding, the default. Note that the odds ratios below do not match the odds ratios in the first model: when we use the class statement, SAS uses dummy coding to generate the odds ratios, while here each odds ratio is computed directly from its estimated coefficient, for example exp(-0.4856) = 0.615.
Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept    1    -8.1220           1.3216           37.7697       <.0001
WRITE        1     0.1438           0.0236           37.0981       <.0001
ses_e1       1    -0.4856           0.2823            2.9594       0.0854
ses_e2       1     0.0508           0.2290            0.0493       0.8243
Odds Ratio Estimates
                    Point       95% Wald
Effect           Estimate   Confidence Limits
WRITE               1.155   1.102       1.209
ses_e1              0.615   0.354       1.070
ses_e2              1.052   0.672       1.648
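The odds ratios and 95% Wald limits in these tables follow directly from each coefficient and its standard error: exponentiate the estimate and the estimate plus or minus 1.96 standard errors. Worked by hand from the rounded values (so the last digit can differ slightly from SAS, which uses a more precise critical value), for ses_e1 in this model and for SES 1 in the reference-coded model:

```latex
\begin{aligned}
\text{ses\_e1:}&\quad e^{-0.4856} \approx 0.615, \qquad
  e^{-0.4856 \pm 1.96 \times 0.2823} = \left(e^{-1.0389},\ e^{0.0677}\right) \approx (0.354,\ 1.070) \\
\text{SES 1 vs 3:}&\quad e^{-0.9204} \approx 0.398, \qquad
  e^{-0.9204 \pm 1.96 \times 0.4897} = \left(e^{-1.8802},\ e^{0.0394}\right) \approx (0.153,\ 1.040)
\end{aligned}
```

The second line shows why the SES 1 vs 3 interval is the same whether it is reported from the effect-coded or the reference-coded run: in both cases it is built from the 1 versus 3 log odds ratio and its standard error.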