We will use the hsb2 dataset and start with a logistic regression model predicting the binary outcome
variable **hiread** from the variables **write** and **ses**. The variable **write** is continuous, and the
variable **ses** is categorical with three categories (1 = low, 2 = middle, 3 = high).
In the code below, the **class** statement specifies that **ses** is a categorical variable
and should be treated as such.

    data hsb2m;
      set "D:datahsb2";
      hiread = (read >= 52.23);
    run;

    proc logistic data = hsb2m descending;
      class ses;
      model hiread = write ses;
    run;

The "Class Level Information" section of the SAS output shows the coding SAS used in estimating the model. This coding scheme is known as effect coding. (For more information see our FAQ page What is effect coding?)

    Class Level Information

                         Design
    Class     Value     Variables
    SES       1          1     0
              2          0     1
              3         -1    -1

    Analysis of Maximum Likelihood Estimates

                                     Standard          Wald
    Parameter      DF    Estimate       Error    Chi-Square    Pr > ChiSq
    Intercept       1     -8.1220      1.3216       37.7697        <.0001
    WRITE           1      0.1438      0.0236       37.0981        <.0001
    SES       1     1     -0.4856      0.2823        2.9594        0.0854
    SES       2     1      0.0508      0.2290        0.0493        0.8243

Further down in the output, we find the table containing the estimates of the
coefficients. For the variable **ses** there are two coefficients, one for
each of the effect-coded variables in the model (**ses** 1 and **ses** 2). The coefficients are -0.4856 and 0.0508.
If we exponentiate these coefficients we get exp(-0.4856) = 0.6153 and exp(0.0508) = 1.0521
for **ses** 1 and **ses** 2 respectively, but the odds ratios listed in the table
with the heading "Odds Ratio Estimates" are 0.398 and 0.681. Why aren't the odds ratios consistent with the
coefficients? The answer is that SAS uses effect coding for the coefficients but
dummy variable coding when calculating the odds ratios. The two are not making
the same comparisons: each coefficient compares a category with the grand mean,
while each odds ratio compares a category with category 3. It is therefore
possible for the coefficients in the table of estimates to be non-significant
while the confidence interval around the odds ratios does not include one (or
vice versa). (For more information see our FAQ What is dummy coding?)

    Odds Ratio Estimates

                      Point          95% Wald
    Effect         Estimate      Confidence Limits
    WRITE             1.155       1.102       1.209
    SES 1 vs 3        0.398       0.153       1.040
    SES 2 vs 3        0.681       0.313       1.485
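The link between the two sets of numbers can be worked out by hand. The sketch below assumes the usual effect-coding constraint that the three **ses** effects sum to zero, which matches the -1 -1 row for category 3 in the "Class Level Information" table. The symbols β₁, β₂ and β₃ are notation introduced here for illustration and do not appear in the SAS output.

```latex
\begin{align*}
\beta_3 &= -(\beta_1 + \beta_2) = -(-0.4856 + 0.0508) = 0.4348 \\
\mathrm{OR}_{1\text{ vs }3} &= \exp(\beta_1 - \beta_3) = \exp(-0.4856 - 0.4348) = \exp(-0.9204) \approx 0.398 \\
\mathrm{OR}_{2\text{ vs }3} &= \exp(\beta_2 - \beta_3) = \exp(0.0508 - 0.4348) = \exp(-0.3840) \approx 0.681
\end{align*}
```

Both values agree with the "Odds Ratio Estimates" table, so the reported odds ratios can be recovered from the effect-coded coefficients once the comparison with category 3 is written out.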

If we run the same analysis but use dummy
variable coding for both the parameter estimates and the odds ratios, we get
coefficients that are consistent with the odds ratios. There are several methods
that can be used to estimate a model using dummy coding for nominal-level
variables. In the first
example below we add **(ref='3') / param = ref** to the **class** statement. The first part
tells SAS that the desired reference category for the variable **ses** is 3
(we could also use category 1 or 2 as the reference category), and the second part tells SAS
to use the reference coding scheme for the parameter estimates.

    proc logistic data = hsb2m descending;
      class ses (ref='3') / param = ref;
      model hiread = write ses;
    run;
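As a sketch of an alternative spelling, PROC LOGISTIC also accepts **param=** as a variable-level option inside the parentheses, next to **ref=**. With a single **class** variable this should request the same reference coding as the call above; it is offered as an illustration, and its output is not reproduced here.

```sas
/* Same model, with both options given inside the parentheses for ses. */
/* Changing ref = '3' to ref = '1' would make category 1 the reference. */
proc logistic data = hsb2m descending;
  class ses (param = ref ref = '3');
  model hiread = write ses;
run;
```

The variable-level form is mainly useful when a model has several **class** variables that need different reference categories or coding schemes.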

Looking at the output (below), the coding system shown in the "Class Level
Information" section of the output is for two dummy variables, one for category
1 versus 3, and one for category 2 versus 3. Note two other things in the output
below. First, the coefficients in this model are consistent
with the odds ratios. That is, exp(-0.9204) = 0.398 and exp(-0.3839) =
0.681. Second, the odds ratios from this model are
the same as the odds ratios above.
This is expected: SAS always uses dummy coding to compute the odds ratios for
variables on the **class** statement, so all that has changed is how the
categorical variable **ses** is parameterized in the table of parameter estimates.

    Class Level Information

                         Design
    Class     Value     Variables
    SES       1          1     0
              2          0     1
              3          0     0

    Analysis of Maximum Likelihood Estimates

                                     Standard          Wald
    Parameter      DF    Estimate       Error    Chi-Square    Pr > ChiSq
    Intercept       1     -7.6872      1.3697       31.4984        <.0001
    WRITE           1      0.1438      0.0236       37.0981        <.0001
    SES       1     1     -0.9204      0.4897        3.5328        0.0602
    SES       2     1     -0.3839      0.3975        0.9330        0.3341

    Odds Ratio Estimates

                      Point          95% Wald
    Effect         Estimate      Confidence Limits
    WRITE             1.155       1.102       1.209
    SES 1 vs 3        0.398       0.153       1.040
    SES 2 vs 3        0.681       0.313       1.485
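The exponentiation check described above can be run in SAS itself. The data step below is a minimal sketch: it hard-codes the two reference-coded **ses** coefficients from the table of estimates, and the variable names or_1_vs_3 and or_2_vs_3 are hypothetical names chosen for this illustration.

```sas
/* Exponentiate the reference-coded ses coefficients and write them to the log. */
data _null_;
  or_1_vs_3 = exp(-0.9204);   /* coefficient for SES 1 */
  or_2_vs_3 = exp(-0.3839);   /* coefficient for SES 2 */
  put or_1_vs_3= 5.3 or_2_vs_3= 5.3;
run;
```

The values written to the log should round to 0.398 and 0.681, the point estimates in the "Odds Ratio Estimates" table.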

Another way to use dummy coding is to create the dummy variables manually and use them
on the **model** statement,
bypassing the **class** statement entirely. The code below does this. First we create two dummy
variables, **ses_d1** and **ses_d2**, which code for category 1 versus 3 and category 2
versus 3, respectively. Then we include **ses_d1** and **ses_d2** on the
**model** statement. There is no need for the **class** statement here. The output generated by this code will not include the
"Class Level Information" section, since the **class** statement was not used,
but it will otherwise be identical to the output from the last model.

    data hsb2ms;
      set hsb2m;
      if ses = 1 then ses_d1 = 1;
      if ses = 2 then ses_d1 = 0;
      if ses = 3 then ses_d1 = 0;
      if ses = 1 then ses_d2 = 0;
      if ses = 2 then ses_d2 = 1;
      if ses = 3 then ses_d2 = 0;
    run;

    proc logistic data = hsb2ms descending;
      model hiread = write ses_d1 ses_d2;
    run;
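The dummy variables can also be built more compactly, using the same trick that created **hiread**: in a data step a comparison evaluates to 1 when true and 0 when false. The sketch below assumes **ses** is never missing, and the dataset name hsb2ms_alt is a hypothetical name used only for this illustration.

```sas
/* Compact alternative to the six IF statements. */
/* Caveat: a missing ses would be coded 0 here, whereas the IF statements */
/* would leave both dummy variables missing.                              */
data hsb2ms_alt;
  set hsb2m;
  ses_d1 = (ses = 1);   /* 1 for low ses, 0 otherwise    */
  ses_d2 = (ses = 2);   /* 1 for middle ses, 0 otherwise */
run;
```

Category 3 is the reference in both versions because it is the only category coded 0 on both variables.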

As a final exercise, we can run the model using manually created effect-coded variables and check that the coefficients
from this model match the coefficients from the first model. This will confirm that SAS is in fact using
effect coding in the first model. The first step is to create the variables for the effect coding, which
we have called **ses_e1** and **ses_e2** below. Their coefficients estimate the difference
between category 1 and the grand mean (when all other covariates equal zero)
and between category 2 and the grand mean, respectively. Then
we run the model with **ses_e1** and **ses_e2** on the **model** statement, and the
**class**
statement is omitted entirely (since we have done the work normally done by the
**class** statement).

    data hsb2ms1;
      set hsb2ms;
      if ses = 1 then ses_e1 = 1;
      if ses = 2 then ses_e1 = 0;
      if ses = 3 then ses_e1 = -1;
      if ses = 1 then ses_e2 = 0;
      if ses = 2 then ses_e2 = 1;
      if ses = 3 then ses_e2 = -1;
    run;

    proc logistic data = hsb2ms1 descending;
      model hiread = write ses_e1 ses_e2;
    run;

Comparing the table of coefficients below to the coefficients in the very first table
of estimates, we see that the coefficients are in fact the same. This confirms that
the model in the first table was estimated using effect coding, the default. Note that the odds ratios below do not match the odds
ratios in the first model: when we use the **class** statement, SAS uses dummy coding
to generate the odds ratios, while in this case the odds ratios are computed directly from the
estimated coefficients (exp(-0.4856) = 0.615 and exp(0.0508) = 1.052).

    Analysis of Maximum Likelihood Estimates

                                     Standard          Wald
    Parameter      DF    Estimate       Error    Chi-Square    Pr > ChiSq
    Intercept       1     -8.1220      1.3216       37.7697        <.0001
    WRITE           1      0.1438      0.0236       37.0981        <.0001
    ses_e1          1     -0.4856      0.2823        2.9594        0.0854
    ses_e2          1      0.0508      0.2290        0.0493        0.8243

    Odds Ratio Estimates

                      Point          95% Wald
    Effect         Estimate      Confidence Limits
    WRITE             1.155       1.102       1.209
    ses_e1            0.615       0.354       1.070
    ses_e2            1.052       0.672       1.648
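If the category-versus-category odds ratios are wanted from this manually effect-coded model, one possible approach is the **contrast** statement with the **estimate = exp** option. The sketch below is a hypothetical follow-up that was not run for this page. Its coefficients come from the effect coding: the log odds for category 1 minus the log odds for category 3 is 2 × **ses_e1** + 1 × **ses_e2**, and for category 2 minus category 3 it is 1 × **ses_e1** + 2 × **ses_e2**.

```sas
/* Hypothetical follow-up: request exponentiated contrasts that compare */
/* categories 1 and 2 with category 3 in the effect-coded model.        */
proc logistic data = hsb2ms1 descending;
  model hiread = write ses_e1 ses_e2;
  contrast 'ses 1 vs 3' ses_e1 2 ses_e2 1 / estimate = exp;
  contrast 'ses 2 vs 3' ses_e1 1 ses_e2 2 / estimate = exp;
run;
```

By the arithmetic alone, exp(2 × (-0.4856) + 0.0508) = exp(-0.9204) and exp(-0.4856 + 2 × 0.0508) = exp(-0.3840), so the contrast estimates should be close to the 0.398 and 0.681 reported when the **class** statement was used.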