In PROC LOGISTIC why aren’t the coefficients consistent with the odds ratios?

We will use the hsb2 dataset and start with a logistic regression model predicting the binary outcome variable hiread with the variables write and ses. The variable write is continuous, and the variable ses is categorical with three categories (1 = low, 2 = middle, 3 = high). In the code below, the class statement is used to specify that ses is a categorical variable and should be treated as such.
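In equation form, the model being fit is sketched below. The symbols are our own notation, not SAS output labels. x₁ and x₂ stand for the two design variables SAS builds for ses, and their values depend on how ses is parameterized.

```latex
\log\frac{\Pr(\text{hiread}=1)}{1-\Pr(\text{hiread}=1)}
  = \beta_0 + \beta_w\,\text{write} + \beta_1 x_1 + \beta_2 x_2
```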

data hsb2m;
set "D:\data\hsb2";
/* hiread = 1 if read >= 52.23, otherwise 0 */
hiread = (read >= 52.23);
run;

/* descending: model the probability that hiread = 1 */
proc logistic data = hsb2m descending;
class ses;
model hiread = write ses ;
run ;

The "Class Level Information" section of the SAS output shows the coding SAS used in estimating the model. This coding scheme is known as effect coding. (For more information see our FAQ page What is effect coding?)

   Class Level Information

                      Design
Class     Value     Variables

SES       1          1      0
          2          0      1
          3         -1     -1

         Analysis of Maximum Likelihood Estimates

                                              Standard          Wald
Parameter      DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept       1     -8.1220      1.3216       37.7697        <.0001
WRITE           1      0.1438      0.0236       37.0981        <.0001
SES       1     1     -0.4856      0.2823        2.9594        0.0854
SES       2     1      0.0508      0.2290        0.0493        0.8243

Further down in the output is the table of parameter estimates. For the variable ses there are two coefficients, one for each of the effect-coded variables in the model (ses 1 and ses 2): -0.4856 and 0.0508. Exponentiating them gives exp(-0.4856) = 0.6153 and exp(0.0508) = 1.0521 for ses 1 and ses 2, respectively, but the odds ratios listed in the "Odds Ratio Estimates" table are 0.398 (ses 1 vs 3) and 0.681 (ses 2 vs 3). Why aren't the odds ratios consistent with the coefficients? The answer is that SAS uses effect coding for the parameter estimates, but the odds ratios it reports compare each category with the reference category 3, which are the comparisons that dummy variable coding makes. Because the two tables do not make the same comparisons, a coefficient in the table of estimates can be non-significant while the confidence interval for the corresponding odds ratio excludes one (or vice versa). (For more information see our FAQ What is dummy coding?)

              Odds Ratio Estimates

                   Point          95% Wald
   Effect          Estimate      Confidence Limits

WRITE              1.155       1.102       1.209
SES   1 vs 3       0.398       0.153       1.040
SES   2 vs 3       0.681       0.313       1.485
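The arithmetic linking the two tables can be written out. Let β₁ and β₂ denote the ses 1 and ses 2 estimates (our notation). Under the effect coding shown in the "Class Level Information" table, category 3 is coded (-1, -1), so its implied effect is -(β₁ + β₂). Holding write fixed, the log odds ratio for one category versus another is the difference of their ses effects:

```latex
\begin{aligned}
\log \mathrm{OR}_{1\ \text{vs}\ 3} &= \beta_1 - \bigl(-(\beta_1+\beta_2)\bigr) = 2\beta_1 + \beta_2
  = 2(-0.4856) + 0.0508 = -0.9204, & e^{-0.9204} &\approx 0.398\\
\log \mathrm{OR}_{2\ \text{vs}\ 3} &= \beta_2 - \bigl(-(\beta_1+\beta_2)\bigr) = \beta_1 + 2\beta_2
  = -0.4856 + 2(0.0508) = -0.3840, & e^{-0.3840} &\approx 0.681
\end{aligned}
```

So exp(β₁) and exp(β₂) on their own do not compare a category with category 3. Only these combinations reproduce the reported odds ratios, up to rounding in the printed estimates.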

If we run the same analysis but use dummy variable coding for both the parameter estimates and the odds ratios, the coefficients will be consistent with the odds ratios. There are several ways to estimate a model using dummy coding for nominal variables. In the first example below we add (ref='3') / param = ref to the class statement. This tells SAS that the reference category for ses is 3 (we could also use category 1 or 2 as the reference category) and that we want the reference coding scheme for the parameter estimates.

/* ref='3': category 3 is the reference; param = ref: reference (dummy) coding */
proc logistic data = hsb2m descending;
class ses (ref='3') / param = ref ;
model hiread = write ses ;
run ;

Looking at the output below, the "Class Level Information" section now shows two dummy variables, one for category 1 versus 3 and one for category 2 versus 3. Note two other things in the output. First, the coefficients in this model are consistent with the odds ratios: exp(-0.9204) = 0.398 and exp(-0.3839) = 0.681. Second, the odds ratios from this model are the same as the odds ratios from the first model. This is expected: in both cases SAS reports odds ratios comparing each category with the reference category, and all that has changed is how ses is parameterized in the table of parameter estimates.

   Class Level Information

                      Design
Class     Value     Variables

SES       1          1      0
          2          0      1
          3          0      0

              Analysis of Maximum Likelihood Estimates

                                 Standard          Wald
Parameter      DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept       1     -7.6872      1.3697       31.4984        <.0001
WRITE           1      0.1438      0.0236       37.0981        <.0001
SES       1     1     -0.9204      0.4897        3.5328        0.0602
SES       2     1     -0.3839      0.3975        0.9330        0.3341

              Odds Ratio Estimates

                   Point          95% Wald
Effect          Estimate      Confidence Limits

WRITE              1.155       1.102       1.209
SES   1 vs 3       0.398       0.153       1.040
SES   2 vs 3       0.681       0.313       1.485
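As a check that the two parameterizations describe the same fitted model, consider the intercept. Here it (-7.6872) is the log odds for ses = 3 when write = 0. Under the effect coding of the first model, that same quantity is the effect-coded intercept plus the category-3 effect -(β₁ + β₂), where β₁ and β₂ are the effect-coded ses estimates (our notation):

```latex
-8.1220 - (-0.4856 + 0.0508) = -8.1220 + 0.4348 = -7.6872
```

Likewise, the reference-coded ses coefficients equal 2β₁ + β₂ = -0.9204 and β₁ + 2β₂ = -0.3840 (printed as -0.3839 because of rounding). The write coefficient (0.1438) is unchanged.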

Another way to use dummy coding is to create the dummy variables manually and use them on the model statement, bypassing the class statement entirely. The code below first creates two dummy variables, ses_d1 and ses_d2, which code category 1 versus 3 and category 2 versus 3, respectively, and then includes them in the model statement. Because the class statement is not used, the output does not include the "Class Level Information" section; otherwise the estimates and odds ratios are identical to those of the previous model.

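data hsb2ms;
set hsb2m;
/* ses_d1: 1 for category 1, 0 for categories 2 and 3 (1 vs 3) */
if ses = 1 then ses_d1 = 1;
if ses = 2 then ses_d1 = 0;
if ses = 3 then ses_d1 = 0;

/* ses_d2: 1 for category 2, 0 for categories 1 and 3 (2 vs 3) */
if ses = 1 then ses_d2 = 0;
if ses = 2 then ses_d2 = 1;
if ses = 3 then ses_d2 = 0;
run;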

proc logistic data = hsb2ms descending;
model hiread = write ses_d1 ses_d2 ;
run ;

As a final exercise, we can run the model using effect coding that we create ourselves and check that the coefficients match those from the first model. This confirms that SAS is in fact using effect coding in the first model. The first step is to create the effect-coded variables, which we call ses_e1 and ses_e2. Their coefficients are the difference in log odds between category 1 and the grand mean, and between category 2 and the grand mean, respectively, where the grand mean is the unweighted mean of the log odds across the three ses categories (when all other covariates equal zero). We then run the model with ses_e1 and ses_e2 on the model statement and omit the class statement entirely, since we have done the work it would normally do.
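To make the grand mean concrete, let η₁, η₂ and η₃ denote the log odds for the three ses categories at write = 0 (our notation). Coding category 3 as (-1, -1) forces the three ses effects to sum to zero. As a result, the intercept is the unweighted mean of the three category log odds and each coefficient is a deviation from that mean. Using the estimates from the first model:

```latex
\begin{aligned}
\eta_1 &= -8.1220 + (-0.4856) = -8.6076\\
\eta_2 &= -8.1220 + 0.0508 = -8.0712\\
\eta_3 &= -8.1220 - (-0.4856 + 0.0508) = -7.6872\\
\bar\eta &= \tfrac{1}{3}(\eta_1 + \eta_2 + \eta_3) = \tfrac{1}{3}(-24.3660) = -8.1220 \quad(\text{the intercept})\\
\eta_1 - \bar\eta &= -0.4856, \qquad \eta_2 - \bar\eta = 0.0508
\end{aligned}
```

Because the model has no write-by-ses interaction, these deviations are the same at any value of write. Only the intercept depends on the choice write = 0.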

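data hsb2ms1;
set hsb2ms;
/* ses_e1: 1 for category 1, 0 for category 2, -1 for category 3 */
if ses = 1 then ses_e1 = 1;
if ses = 2 then ses_e1 = 0;
if ses = 3 then ses_e1 = -1;

/* ses_e2: 0 for category 1, 1 for category 2, -1 for category 3 */
if ses = 1 then ses_e2 = 0;
if ses = 2 then ses_e2 = 1;
if ses = 3 then ses_e2 = -1;
run;
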
proc logistic data = hsb2ms1 descending;
model hiread = write ses_e1 ses_e2;
run ;

Comparing the table of coefficients below with the very first table of estimates, we see that the coefficients are indeed the same. This confirms that, by default, the first model was estimated using effect coding. Note that the odds ratios below do not match those from the first model. With the class statement, SAS reports odds ratios comparing each ses category with category 3 (the dummy-coding comparisons). Here, ses_e1 and ses_e2 are ordinary numeric predictors, so each odds ratio is simply the exponentiated coefficient, e.g., exp(-0.4856) = 0.615.

            Analysis of Maximum Likelihood Estimates

                               Standard          Wald
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept     1     -8.1220      1.3216       37.7697        <.0001
WRITE         1      0.1438      0.0236       37.0981        <.0001
ses_e1        1     -0.4856      0.2823        2.9594        0.0854
ses_e2        1      0.0508      0.2290        0.0493        0.8243

           Odds Ratio Estimates

             Point          95% Wald
Effect    Estimate      Confidence Limits

WRITE        1.155       1.102       1.209
ses_e1       0.615       0.354       1.070
ses_e2       1.052       0.672       1.648
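The ses_e1 and ses_e2 rows can be reproduced by hand. Each point estimate is the exponentiated coefficient, and the Wald limits are the exponentiated limits of estimate ± 1.96 × standard error (1.96 being the approximate 97.5th percentile of the standard normal):

```latex
\begin{aligned}
\text{ses\_e1:}\quad & e^{-0.4856} \approx 0.615, &
  e^{-0.4856 \pm 1.96(0.2823)} &= \bigl(e^{-1.0389},\ e^{0.0677}\bigr) \approx (0.354,\ 1.070)\\
\text{ses\_e2:}\quad & e^{0.0508} \approx 1.052, &
  e^{0.0508 \pm 1.96(0.2290)} &= \bigl(e^{-0.3980},\ e^{0.4996}\bigr) \approx (0.672,\ 1.648)
\end{aligned}
```

Since each effect-coded coefficient is a deviation from the mean log odds, exp(-0.4856) compares the odds for category 1 with the geometric mean of the odds across the three categories. It is not a comparison with category 3.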