FAQ: How do I interpret the coefficients in an ordinal logistic regression?

The interpretation of coefficients in an ordinal logistic regression varies by the software you use. In this FAQ page, we will focus on the interpretation of the coefficients in Stata and R, but the results generalize to SPSS and Mplus. The parameterization in SAS is different from the others.

Definitions

First let’s establish some notation and review the concepts involved in ordinal logistic regression. Let $Y$ be an ordinal outcome with $J$ categories. Then $P(Y \le j)$ is the cumulative probability of $Y$ being less than or equal to a specific category $j = 1, \cdots, J-1$. Note that $P(Y \le J) =1.$ The odds of being less than or equal to a particular category can be defined as

$$\frac{P(Y \le j)}{P(Y>j)}$$

for $j=1,\cdots, J-1$, since $P(Y > J) = 0$ and dividing by zero is undefined. Alternatively, you can write $P(Y >j) = 1 - P(Y \le j)$. The log odds is also known as the logit, so that

$$log \frac{P(Y \le j)}{P(Y>j)} = logit (P(Y \le j)).$$
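As a small base-R sketch of these definitions, the hypothetical helper `cumulative_logits()` below turns a vector of category probabilities into $P(Y \le j)$, the corresponding odds, and the logits for $j = 1, \cdots, J-1$. The input is the set of predicted probabilities for pared = 0 that appears later on this page; for that group the logits should approximately reproduce the two cutpoints reported by Stata and R.

```r
# Hypothetical helper: cumulative probabilities, odds and logits
# for an ordinal outcome with J categories.
cumulative_logits <- function(p) {
  stopifnot(all(p >= 0), abs(sum(p) - 1) < 1e-6)
  J <- length(p)
  cum <- cumsum(p)[-J]          # P(Y <= j) for j = 1, ..., J-1; P(Y <= J) = 1 is dropped
  odds <- cum / (1 - cum)       # P(Y <= j) / P(Y > j)
  data.frame(j = seq_len(J - 1), cum = cum, odds = odds,
             logit = log(odds)) # same as base R's qlogis(cum)
}

# Predicted probabilities of unlikely, somewhat likely, very likely (pared = 0)
p_pared0 <- c(0.5931114, 0.3275856, 0.07930294)
cumulative_logits(p_pared0)
```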

Ordinal Logistic Regression Model

The ordinal logistic regression model can be defined as

$$logit (P(Y \le j)) = \beta_{j0} + \beta_{j1}x_1 + \cdots + \beta_{jp} x_p$$ for $j=1, \cdots, J-1$ and $p$ predictors. Due to the parallel lines assumption, the intercepts are different for each category but the slopes are constant across categories, which simplifies the equation above to

$$logit (P(Y \le j)) = \beta_{j0} + \beta_{1}x_1 + \cdots + \beta_{p} x_p.$$
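The parallel lines assumption can be checked numerically with a short base-R sketch. The hypothetical functions `po_cumprob()` and `po_catprob()` evaluate the model above with `plogis()` (the inverse logit); the intercepts and slope are made-up values chosen only for illustration. Because the slope is shared, the difference in cumulative logits between $x_1 = 1$ and $x_1 = 0$ equals $\beta_1$ for every $j$.

```r
# Cumulative probabilities P(Y <= j), j = 1, ..., J-1, under
# logit(P(Y <= j)) = beta_j0 + beta_1 x_1 + ... + beta_p x_p
po_cumprob <- function(intercepts, beta, x) {
  stopifnot(!is.unsorted(intercepts))   # cumulative probabilities must increase in j
  plogis(intercepts + sum(beta * x))
}

# Category probabilities P(Y = j), j = 1, ..., J
po_catprob <- function(intercepts, beta, x) {
  diff(c(0, po_cumprob(intercepts, beta, x), 1))
}

intercepts <- c(-1, 1)   # hypothetical beta_10, beta_20 (J = 3)
beta <- 0.5              # hypothetical common slope beta_1

po_catprob(intercepts, beta, x = 0)
# Same difference in cumulative logits for j = 1 and j = 2
qlogis(po_cumprob(intercepts, beta, 1)) - qlogis(po_cumprob(intercepts, beta, 0))
```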

How Stata and R parameterize the ordinal regression model

In Stata and R (polr) the ordinal logistic regression model is parameterized as

$$logit (P(Y \le j)) = \beta_{j0} - \eta_{1}x_1 - \cdots - \eta_{p} x_p$$

where $\eta_i = -\beta_i.$
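To see that this is only a change of sign, the base-R sketch below computes $P(Y \le j)$ both ways, using hypothetical cutpoints and a hypothetical coefficient: once as $\text{logit}^{-1}(\beta_{j0} + \beta_1 x_1)$ and once in the Stata/polr form $\text{logit}^{-1}(\beta_{j0} - \eta_1 x_1)$ with $\eta_1 = -\beta_1$. The two agree, so a positive reported $\eta_1$ means that a higher $x_1$ lowers $P(Y \le j)$, that is, it shifts observations toward the higher categories.

```r
cuts <- c(0.4, 2.5)   # hypothetical intercepts beta_10, beta_20
beta1 <- -1.1         # hypothetical slope in the standard form
eta1 <- -beta1        # Stata / polr report eta_1 = -beta_1

x1 <- c(0, 1)
# Rows are j = 1, 2; columns are x1 = 0, 1
standard  <- sapply(x1, function(x) plogis(cuts + beta1 * x))  # beta_j0 + beta_1 x_1
polr_form <- sapply(x1, function(x) plogis(cuts - eta1 * x))   # beta_j0 - eta_1 x_1
all.equal(standard, polr_form)   # TRUE: same model, opposite sign convention
```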

Suppose we want to see whether a binary predictor, parental education (pared), predicts an ordinal outcome (apply) that records whether students are unlikely, somewhat likely, or very likely to apply to college.

Due to the parallel lines assumption, even though we have three categories, the coefficient of parental education (pared) stays the same across the two cumulative logit equations ($j=1,2$). The two equations for pared = 1 and pared = 0 are

$$ \begin{eqnarray} logit (P(Y \le j | x_1=1)) & = & \beta_{j0} - \eta_{1} \\ logit (P(Y \le j | x_1=0)) & = & \beta_{j0} \end{eqnarray} $$

Then $logit (P(Y \le j | x_1=1)) - logit (P(Y \le j | x_1=0)) = -\eta_{1}.$

Stata

To run an ordinal logistic regression in Stata, first import the data and then use the ologit command.

use "https://stats.idre.ucla.edu/stat/data/ologit.dta", clear
ologit apply i.pared

<... omitted output...> 
Ordered logistic regression                     Number of obs     =        400
                                                LR chi2(1)        =      18.41
                                                Prob > chi2       =     0.0000
Log likelihood = -361.39515                     Pseudo R2         =     0.0248

------------------------------------------------------------------------------
       apply |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     1.pared |   1.127491   .2634324     4.28   0.000      .611173    1.643809
-------------+----------------------------------------------------------------
       /cut1 |   .3768424   .1103421                      .1605758     .593109
       /cut2 |   2.451855   .1825628                      2.094039    2.809672
------------------------------------------------------------------------------

R

Running the same analysis in R requires some more steps. First load the following libraries:

library(foreign)
library(MASS)

Now read in the data and run the analysis using polr:

dat <- read.dta("https://stats.idre.ucla.edu/stat/data/ologit.dta")
m <- polr(apply ~ pared, data = dat)
summary(m)

The shortened output looks like the following:

Coefficients:
      Value Std. Error t value
pared 1.127     0.2634    4.28

Intercepts:
                            Value   Std. Error t value
unlikely|somewhat likely     0.3768  0.1103     3.4152
somewhat likely|very likely  2.4519  0.1826    13.4302

The output shows that for students whose parents attended college, the log odds of being unlikely to apply to college (versus somewhat or very likely) is actually $-\hat{\eta}_1=-1.13$, or $1.13$ lower than for students whose parents did not attend college. Recall that $\beta_i = -\eta_i$, and that the model is defined only for $j=1,2$, since $P(Y \le 3) = 1$ and $logit (P(Y \le 3))$ is undefined. Substituting the estimates, the equations for the first and second categories become:

$$ \begin{eqnarray} logit (P(Y \le 1)) & = & 0.377 - 1.13 x_1 \\ logit (P(Y \le 2)) & = & 2.45 - 1.13 x_1 \end{eqnarray} $$
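Plugging the full-precision estimates from the output into these two equations reproduces the model's predicted probabilities. The base-R sketch below uses `plogis()` for the inverse logit and takes differences of the cumulative probabilities to obtain $P(Y = j)$. The numbers are copied from the Stata/R output above, and the results should agree, up to rounding, with the `margins` and `predict` output shown later on this page.

```r
cuts <- c(0.3768424, 2.451855)   # /cut1, /cut2 (polr intercepts)
eta <- 1.127491                  # coefficient of pared reported by ologit / polr

# Cumulative probabilities P(Y <= j | pared), j = 1, 2, with logit = cut_j - eta * pared
cum <- sapply(c(pared0 = 0, pared1 = 1), function(x) plogis(cuts - eta * x))

# Category probabilities P(Y = j | pared): unlikely, somewhat likely, very likely
probs <- apply(cum, 2, function(cp) diff(c(0, cp, 1)))
rownames(probs) <- c("unlikely", "somewhat likely", "very likely")
round(t(probs), 4)

# The difference in cumulative logits is -eta for both j (parallel lines)
qlogis(cum[, "pared1"]) - qlogis(cum[, "pared0"])
```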

To see the connection between the parallel lines assumption and the proportional odds assumption, exponentiate both sides of the equations above and use the property that $exp(a-b) = exp(a)/exp(b)$ to calculate the odds for each level of pared at each level of apply.

$$ \begin{eqnarray} \frac{P(Y \le 1 | x_1=1)}{P(Y \gt 1 | x_1=1)} & = & exp(0.377)/exp(1.13) \\ \frac{P(Y \le 1 | x_1=0)}{P(Y \gt 1 | x_1=0)} & = & exp(0.377) \\ \frac{P(Y \le 2 | x_1=1)}{P(Y \gt 2 | x_1=1)} & = & exp(2.45)/exp(1.13) \\ \frac{P(Y \le 2 | x_1=0)}{P(Y \gt 2 | x_1=0)} & = & exp(2.45) \end{eqnarray} $$

From the odds of each level of pared, we can calculate the odds ratio of pared for each level of apply.

$$ \begin{eqnarray} \frac{P(Y \le 1 | x_1=1)}{P(Y \gt 1 | x_1=1)} / \frac{P(Y \le 1 | x_1=0)}{P(Y \gt 1 | x_1=0)} & = & 1/exp(1.13) & = & exp(-1.13) \\ \frac{P(Y \le 2 | x_1=1)}{P(Y \gt 2 | x_1=1)} / \frac{P(Y \le 2 | x_1=0)}{P(Y \gt 2 | x_1=0)} & = & 1/exp(1.13) & = & exp(-1.13) \\ \end{eqnarray} $$
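The same odds and odds ratios can be computed directly from the estimates. In this base-R sketch (values copied from the output above), the odds for each group are $exp(\text{cut}_j - \eta_1 x_1)$, and their ratio comes out identical for $j = 1$ and $j = 2$, which is the proportional odds property.

```r
cuts <- c(j1 = 0.3768424, j2 = 2.451855)  # estimated intercepts
eta <- 1.127491                           # estimated coefficient of pared

odds_pared1 <- exp(cuts - eta)   # P(Y <= j | x1 = 1) / P(Y > j | x1 = 1)
odds_pared0 <- exp(cuts)         # P(Y <= j | x1 = 0) / P(Y > j | x1 = 0)

odds_ratio <- odds_pared1 / odds_pared0
odds_ratio                       # same value for j1 and j2
exp(-eta)                        # equals the odds ratio above
```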

The proportional odds assumption states that the odds ratios are the same across all $J-1$ cumulative splits of the outcome. In our example, it means that the odds ratio comparing pared = 1 to pared = 0 for being unlikely versus somewhat or very likely to apply ($j=1$) is the same as the odds ratio for being unlikely or somewhat likely versus very likely to apply ($j=2$).

Interpreting the odds ratio

The proportional odds assumption is not simply that the odds are the same but that the odds ratios are the same across categories. These odds ratios can be derived by exponentiating the coefficients (in the log-odds metric), but the interpretation is a bit unexpected. Recall that the coefficient $-\eta_{1}$ represents the difference in the log odds of being in category $j$ or below between students whose parents went to college and students whose parents did not:

$$logit (P(Y \le j|x_1=1)) - logit (P(Y \le j|x_1=0)) = -\eta_{1}.$$

Since the exponential function is the inverse of the log, we can exponentiate both sides of this equation and, using the property that $log(b)-log(a) = log(b/a)$, obtain

$$\frac{P(Y \le j |x_1=1)}{P(Y>j|x_1=1)} / \frac{P(Y \le j |x_1=0)}{P(Y>j|x_1=0)} = exp( -\eta_{1}).$$

For simplicity of notation, let $p_1 = P(Y \le j |x_1=1)$ and $p_0 = P(Y \le j |x_1=0)$, so that the two odds are $p_1 / (1-p_1)$ and $p_0 / (1-p_0)$; by the proportional odds assumption, the resulting odds ratio does not depend on $j$. Then the odds ratio is defined as

$$\frac{p_1 / (1-p_1) }{p_0 / (1-p_0)} = exp( -\eta_{1}).$$

However, as we will see in the output, this is not what we actually obtain from Stata and R!

Stata

To obtain the odds ratio in Stata, add the or option to the ologit command.

ologit apply i.pared, or

<... omitted output...> 

Ordered logistic regression                     Number of obs     =        400
                                                LR chi2(1)        =      18.41
                                                Prob > chi2       =     0.0000
Log likelihood = -361.39515                     Pseudo R2         =     0.0248

------------------------------------------------------------------------------
       apply | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     1.pared |   3.087899   .8134527     4.28   0.000     1.842591    5.174843
-------------+----------------------------------------------------------------
       /cut1 |   .3768424   .1103421                      .1605758     .593109
       /cut2 |   2.451855   .1825628                      2.094039    2.809672
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation.

R

To obtain the odds ratio in R, simply exponentiate the coefficient or log-odds of pared. The following code uses cbind to combine the odds ratio with its confidence interval. First store the confidence interval in object ci,

(ci <- confint(m))

    2.5 %    97.5 % 
0.6131222 1.6478130

Then bind the transpose of the ci object with coef(m) and exponentiate the values,

exp(cbind(coef(m),t(ci)))

                  2.5 %   97.5 %
pared 3.087899 1.846187 5.195605

In our example, $exp(-1.127) = 0.324$, which means that students whose parents attended college have 67.6% lower odds of being less likely to apply to college. However, this does not correspond to the odds ratio in the output! Let’s see why.

Since $exp(-\eta_{1}) = \frac{1}{exp(\eta_{1})}$,

$$exp(\eta_{1}) = \frac{p_0 / (1-p_0) }{p_1 / (1-p_1)}.$$

From the output, $\hat{\eta}_1=1.127$, which means the odds ratio $exp(\hat{\eta}_1)=3.086$ is actually $\frac{p_0 / (1-p_0) }{p_1 / (1-p_1)}.$ This suggests that students whose parents did not go to college have higher odds of being less likely to apply.
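The arithmetic in the preceding paragraphs can be reproduced in a few lines of base R, using the coefficient reported by ologit and polr. Exponentiating $-\hat{\eta}_1$ gives the odds ratio for pared = 1 vs. pared = 0 of being in a lower category; exponentiating $\hat{\eta}_1$ gives its reciprocal, which is the number Stata and R report.

```r
eta_hat <- 1.127491                 # coefficient of pared from ologit / polr

or_lower_1vs0 <- exp(-eta_hat)      # [p1/(1-p1)] / [p0/(1-p0)], about 0.324
or_reported   <- exp(eta_hat)       # [p0/(1-p0)] / [p1/(1-p1)], about 3.088

pct_lower <- 100 * (1 - or_lower_1vs0)  # percent reduction in the odds, about 67.6
c(or_lower_1vs0 = or_lower_1vs0, or_reported = or_reported, pct_lower = pct_lower)
```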

Another way to look at the odds ratio

Double negation can be logically confusing. Suppose we wanted to interpret the odds of being more likely to apply to college. We can perform a slight manipulation of our original odds ratio:

$$ \begin{eqnarray} exp(-\eta_{1}) & = & \frac{p_1 / (1-p_1)}{p_0/(1-p_0)} \\ & = & \frac{p_1 (1-p_0)}{p_0(1-p_1)} \\ & = & \frac{(1-p_0)/p_0}{(1-p_1)/p_1} \\ & = & \frac{P (Y >j | x_1=0)/P(Y \le j|x_1=0)}{P(Y > j | x_1=1)/P(Y \le j | x_1=1)}. \end{eqnarray} $$

Since $exp(-\eta_{1}) = \frac{1}{exp(\eta_{1})}$,

$$\frac{P (Y >j | x_1=1)/P(Y \le j|x_1=1)}{P(Y > j | x_1=0)/P(Y \le j | x_1=0)} = exp(\eta_1).$$

Instead of interpreting the odds of being in the $j$th category or less, we can interpret the odds of being greater than the $j$th category by exponentiating $\eta_1$ itself. In our example, $exp(\hat{\eta}_1) = exp(1.127) = 3.086$ means that students whose parents went to college have 3.086 times the odds of being very likely to apply (vs. somewhat likely or unlikely) compared to students whose parents did not go to college. This result matches our intuition better because it removes the double negative. As a general rule, it is easier to interpret the odds ratios of $x_1=1$ vs. $x_1=0$ by exponentiating $\eta_1$ itself than to interpret the odds ratios of $x_1=0$ vs. $x_1=1$ by exponentiating $-\eta_1$. However, by doing so we flip the interpretation of the outcome, placing $P (Y >j)$ in the numerator.
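Because the rearrangement above is pure algebra, it holds for any pair of probabilities. A quick numerical check in base R, using randomly drawn $p_0$ and $p_1$ (arbitrary values, not taken from the model), confirms that the ratio of the odds of $Y \le j$ for $x_1=1$ vs. $x_1=0$ equals the ratio of the odds of $Y > j$ for $x_1=0$ vs. $x_1=1$, and that its reciprocal is the ratio of the odds of $Y > j$ for $x_1=1$ vs. $x_1=0$.

```r
set.seed(1)
p0 <- runif(5)   # arbitrary P(Y <= j | x1 = 0)
p1 <- runif(5)   # arbitrary P(Y <= j | x1 = 1)

odds_le <- function(p) p / (1 - p)   # odds of Y <= j
odds_gt <- function(p) (1 - p) / p   # odds of Y > j

or_le_1vs0 <- odds_le(p1) / odds_le(p0)   # plays the role of exp(-eta_1)
or_gt_0vs1 <- odds_gt(p0) / odds_gt(p1)   # same quantity, rearranged
or_gt_1vs0 <- odds_gt(p1) / odds_gt(p0)   # plays the role of exp(eta_1)

all.equal(or_le_1vs0, or_gt_0vs1)         # TRUE
all.equal(or_gt_1vs0, 1 / or_le_1vs0)     # TRUE
```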

Verifying both interpretations of the odds ratio using predicted probabilities

To verify that the odds ratio of 3.08 can indeed be interpreted in two ways, let’s derive it from the predicted probabilities in both Stata and R.

Stata

Following the ologit command, run margins with a categorical predictor to obtain predicted probabilities for each level of the predictor for each level of the outcome ($j=1,2,3$).

margins pared

Adjusted predictions                            Number of obs     =        400
Model VCE    : OIM

1._predict   : Pr(apply==0), predict(pr outcome(0))
2._predict   : Pr(apply==1), predict(pr outcome(1))
3._predict   : Pr(apply==2), predict(pr outcome(2))

--------------------------------------------------------------------------------
               |            Delta-method
               |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
_predict#pared |
          1 0  |   .5931113   .0266289    22.27   0.000     .5409196     .645303
          1 1  |     .32068   .0532744     6.02   0.000     .2162641    .4250959
          2 0  |   .3275857   .0239325    13.69   0.000     .2806789    .3744926
          2 1  |   .4692269   .0333495    14.07   0.000     .4038631    .5345907
          3 0  |    .079303   .0133296     5.95   0.000     .0531774    .1054286
          3 1  |   .2100931   .0424965     4.94   0.000     .1268015    .2933847
--------------------------------------------------------------------------------

The number in the first column represents the levels $j=1,2,3$ of the outcome apply (coded 0, 1 and 2 in the data, as the Pr(apply==...) labels show), and the second column represents $x_1 = 0$ and $x_1 = 1$ of pared.

R

After storing the polr object in object m, pass this object as well as a dataset with the levels of pared into the predict function. Specify type="p" for predicted probabilities.

newdat <- data.frame(pared=c(0,1))
(phat <- predict(object = m, newdat, type="p"))

   unlikely somewhat likely very likely
1 0.5931114       0.3275856  0.07930294
2 0.3206801       0.4692269  0.21009300

Each row represents the first level ($x_1=0$) and second level ($x_1=1$) of pared, and each column represents the levels $j=1,2,3$ of the outcome apply.

Interpretation 1

The first interpretation is that, for students whose parents did not attend college, the odds of being unlikely versus somewhat or very likely (i.e., less likely) to apply are 3.08 times those of students whose parents did go to college.

To verify this interpretation, we arbitrarily calculate the odds ratio for the first level of apply, which by the proportional odds assumption is equivalent to the odds ratio for the second level of apply. Since we are looking at pared = 0 vs. pared = 1 for $P(Y \le 1 | x_1=x)/P(Y > 1 | x_1=x)$, the respective probabilities are $p_0=.593$ and $p_1=.321$. Then

$$\frac{p_0 / (1-p_0) }{p_1 / (1-p_1)} = \frac{0.593 / (1-0.593) }{0.321 / (1-0.321)} =\frac{1.457}{0.473} =3.08.$$

Interpretation 2

The second interpretation is that, for students whose parents did attend college, the odds of being very or somewhat likely versus unlikely (i.e., more likely) to apply are 3.08 times those of students whose parents did not go to college.

Here we are looking at pared = 1 vs. pared = 0 for $P(Y > 1 | x_1=x)/P(Y \le 1 | x_1=x)$. Then for the first level of apply $P(Y>1 | x_1 = 1) =0.469+0.210 = 0.679$ and $P(Y \le 1 | x_1 = 1) = 0.321$. Similarly, $P(Y>1 | x_1 = 0) =0.328+0.079= 0.407$ and $P(Y \le 1 | x_1 = 0) = 0.593.$ Taking the ratio of the two odds gives us the odds ratio,

$$ \frac{P(Y>1 | x_1 = 1) /P(Y \le 1 | x_1=1)}{P(Y>1 | x_1 = 0) /P(Y \le 1 | x_1=0)} = \frac{0.679/0.321}{0.407/0.593} = \frac{2.115}{0.686}=3.08.$$

The odds ratio for both interpretations matches the output of Stata and R.
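Both calculations can be repeated for every cut of the outcome in a few lines of base R. The matrix below re-enters the predicted probabilities printed by predict() above (rows are pared = 0 and pared = 1), and the sketch computes both odds ratios at $j = 1$ and at $j = 2$. Up to rounding of the printed probabilities, all four values should be close to 3.088, which also illustrates the proportional odds assumption at the second cut.

```r
# Predicted probabilities from predict(m, newdat, type = "p")
phat <- rbind(pared0 = c(0.5931114, 0.3275856, 0.07930294),
              pared1 = c(0.3206801, 0.4692269, 0.21009300))
colnames(phat) <- c("unlikely", "somewhat likely", "very likely")

# Cumulative probabilities P(Y <= j | pared) for j = 1, 2
cum <- t(apply(phat, 1, cumsum))[, 1:2]
colnames(cum) <- c("j1", "j2")

p0 <- cum["pared0", ]
p1 <- cum["pared1", ]

# Interpretation 1: odds of Y <= j, pared = 0 vs. pared = 1
or_interp1 <- (p0 / (1 - p0)) / (p1 / (1 - p1))
# Interpretation 2: odds of Y > j, pared = 1 vs. pared = 0
or_interp2 <- ((1 - p1) / p1) / ((1 - p0) / p0)

rbind(or_interp1, or_interp2)   # columns j1 and j2
```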

Summary

In general, it is easier to obtain the odds ratio by exponentiating the coefficient itself rather than its negative, because this is what Stata and R (polr) report directly. The researcher must then decide which of the two interpretations to use:

For students whose parents did not attend college, the odds of being less likely to apply are 3.08 times those of students whose parents did go to college.
For students whose parents did attend college, the odds of being more likely to apply are 3.08 times those of students whose parents did not go to college.

The second interpretation is easier because it avoids double negation.
