1.2 Categorical random variables

A categorical variable is a variable that only takes one of a limited number of possible values, usually signifying groups, based on some underlying quantitative variable(s).
When there is no natural order or rank between groups, we call it a nominal categorical variable.

Blood type, gender, race, and zip code are examples of nominal categorical variables.

An ordinal variable is a categorical variable with a natural order, but with unknown (most likely uneven) distances between levels.

For example:

People can respond to a survey with strongly disagree, disagree, agree, and strongly agree. We cannot quantify how much bigger or smaller the distance between strongly agree and agree is compared to the distance between agree and disagree.

Socioeconomic status can be categorize as “low”, “middle”, and “high”. - A categorical variable with only two possible outcomes is a binary variable or Bernoulli variable.

For example:

In flipping a coin, the outcome must be head or tails.

A medical test that has only two possible outcome: positive or negative.

Smoking is a binary variable with outcome “Yes” or “No”.

1.2.1 Regression models of categorical variables

If the dependent variable is a binary categorical variable we use logistic regression or more specifically, binary logistic regression
If the dependent variable is a nominal categorical variable with more than two category we use multinomial logistic regression
If the dependent variable is ordinal then we use ordinal logistic regression
The term logistic comes from using the logistic function as the link function, to transform a linear combination of coefficients and predictor variables that can take values from $-\infty$ to $\infty$ to probabilities.

In binomial, multinomial, and ordinal logistic regression we use logistic functions as the link function. We will discuss logistic functions later.

1.3 Working with categorical variables in R: `factor` objects

R uses object class factor to store and manipulate categorical variables. Some functions in R automatically treat variable type character as factor, but it is always safer to change the variable’s class to factor before the analysis.

We begin by creating a factor variable in R.

Let’s first create a character vector containing values “a”, “b”, “c”, and “d”.

x <- c("a", "b", "c", "d")
x

## [1] "a" "b" "c" "d"

class(x)

## [1] "character"

Function class() determines the class of an object.

We use function factor() to transfer x to a factor.

x.f <- factor(x, levels = c("a", "b", "c", "d"), labels = c("a", "b", "c", "d"))
x.f

## [1] a b c d
## Levels: a b c d

class(x.f)

## [1] "factor"

By default, levels are the unique set of values taken by as.character(x), sorted in ascending order. For each level we can applylabels, which by default are the same as levels themselves. In the R-Studio Environment pane, we can observe the numerical values that represent each level.

The first level in a factor is the reference level. Using function relevel() we can change the reference.

relevel(x.f, ref = "b")

## [1] a b c d
## Levels: b a c d

x.f

## [1] a b c d
## Levels: a b c d

Because we did not assign the output of relevel() to an object, we did not permanently change the values of factor x.f. If we want to permanently change the reference level, we need to assign the result to a new object or the factor x.f itself.

x.f <- relevel(x.f, ref = "b")
x.f

## [1] a b c d
## Levels: b a c d

1.3.1 Optional

If we want to combine two groups into one group, we can just set two levels of an existing factor variable to be the same level inside levels(). Notice the order on the right hand side corresponds to the ordering of the levels of the original factor x.f:

# Map from 4 different values to three levels (combine c and d into c):
x.f2 <- x.f
levels(x.f2) <- c("b", "a", "c", "c")
x.f2

## [1] a b c c
## Levels: b a c

#or 
x.f2 <- factor(x.f, labels = c("b", "a", "c", "c"))
x.f2

## [1] a b c c
## Levels: b a c

We need to be cautious when we use the assignment operator <-. Above, not only did we change the levels of the factor, but also we have changed the values of vector to just 3 values. This is why we first stored a copy of the original vector.

1.3.2 Ordered `factor`s

Now we are going to create an ordered factor, which stores the ordering of the levels. We can use function factor with option ordered = TRUE.

x.of <- factor(x, ordered = TRUE)
#or
ordered(x)

## [1] a b c d
## Levels: a < b < c < d

To change the order of ordered factor we can re-specify the levels in the order we want.

factor(x.of, levels = c("d","b","c", "a"), ordered = TRUE)

## [1] a b c d
## Levels: d < b < c < a

We can also change the labels of ordered factors.

(x.of <- ordered(x.of, 
      label = c("strong disagree", "disagree", "agree", "strongly agree")))

## [1] strong disagree disagree        agree           strongly agree 
## Levels: strong disagree < disagree < agree < strongly agree

factor(x.of,
      levels= c("strongly agree", "agree", "disagree", "strong disagree" ))

## [1] strong disagree disagree        agree           strongly agree 
## Levels: strongly agree < agree < disagree < strong disagree

The first line above labels “a” with “strongly disagree”, “b” with “disagree”, “c” with “agree”, and “d” with “strongly agree”.

The second line above changes the order of the levels, but the values remain the same. Since x.of is already an ordered factor we do not need to use argument order = TRUE.

1.4 hsb data

Let’s now read the first data set. The data contains academic performance and demographic information about 200 high school students. This data set is hypothetical and was created to demonstrate statistical analyses.

hsb <- read.csv("https://stats.idre.ucla.edu/stat/data/hsbdemo.csv")

names(hsb)

##  [1] "id"      "female"  "ses"     "schtyp"  "prog"    "read"    "write"  
##  [8] "math"    "science" "socst"   "honors"  "awards"  "cid"

str(hsb)

## 'data.frame':    200 obs. of  13 variables:
##  $ id     : int  45 108 15 67 153 51 164 133 2 53 ...
##  $ female : chr  "female" "male" "male" "male" ...
##  $ ses    : chr  "low" "middle" "high" "low" ...
##  $ schtyp : chr  "public" "public" "public" "public" ...
##  $ prog   : chr  "vocation" "general" "vocation" "vocation" ...
##  $ read   : int  34 34 39 37 39 42 31 50 39 34 ...
##  $ write  : int  35 33 39 37 31 36 36 31 41 37 ...
##  $ math   : int  41 41 44 42 40 42 46 40 33 46 ...
##  $ science: int  29 36 26 33 39 31 39 34 42 39 ...
##  $ socst  : int  26 36 42 32 51 39 46 31 41 31 ...
##  $ honors : chr  "not enrolled" "not enrolled" "not enrolled" "not enrolled" ...
##  $ awards : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cid    : int  1 1 1 1 1 1 1 1 1 1 ...

Variables female, schtyp (school type), prog (program), honors (honors program), and ses (socioeconomic status) are categorical variables. Variable ses is an ordinal variable with levels “low”, “middle”, and “high”.

First we transform variable female to a factor and calculate frequencies and proportions of females and males.

#making variable female as a factor 
hsb$female <- factor(hsb$female)
#levels
levels(hsb$female)

## [1] "female" "male"

#frequency table
table(hsb$female)

## 
## female   male 
##    109     91

#proportion in each level
prop.table(table(hsb$female))

## 
## female   male 
##  0.545  0.455

Now we change the reference level form “female” to “male”

#changing the reference to "male" 
hsb$female <- relevel(hsb$female, ref = "male")
#check to see reference changed
levels(hsb$female)

## [1] "male"   "female"

Exercises:

1- In hsb data transform variables schtyp (school type), prog (program), honors (honors program), and ses (socioeconomic status) to unordered factor variables and report the proportions of each category for those variables.

#transform variables schtyp, prog, honors and ses to factor
hsb$schtyp <- factor(hsb$schtyp)
prop.table(table(hsb$schtyp))

## 
## private  public 
##    0.16    0.84

hsb$prog <- factor(hsb$prog)
prop.table(table(hsb$prog))

## 
## academic  general vocation 
##    0.525    0.225    0.250

hsb$honors <- factor(hsb$honors)
prop.table(table(hsb$honors))

## 
##     enrolled not enrolled 
##        0.265        0.735

#ses as factor in order or "low", "middle", "high". Not ordered factor.
hsb$ses <- factor(hsb$ses, levels = c("low", "middle", "high"))
prop.table(table(hsb$ses))

## 
##    low middle   high 
##  0.235  0.475  0.290

2- Change the reference of variable honors to “not enrolled”

hsb$honors <- relevel(hsb$honors, ref = "not enrolled")

Next, we review some important concepts in logistic regression.

1.5 Understanding odds and odds ratios

1.5.1 Odds vs. probability

The odds of an event with probability $p$ is defined as: $\tt{odds(p)} = \dfrac{p}{1 -p}$

Suppose the probability of a student enrolling in the honors program on average is 0.3.
Then we define the odds of enrolled to be:

$\tt{odds(enrolled)} = \dfrac{\Pr(enrolled)}{1 - \Pr(enrolled)} = \dfrac{0.3}{1 - 0.3} = 0.4285714$

and odds of not enrolled will be:

$\tt{odds(not \ enrolled)} = \dfrac{\Pr(not \ enrolled)}{1 - \Pr(not \ enrolled)} = \dfrac{0.7}{1 - 0.7} = 2.333333$

As we can see, odds is a positive value between 0 and infinity. The inverse function of odds transforms the number back to between 0 and 1. For example, if the odds is equal to 2 then the probability can be calculated as follow:

$p = \dfrac{\tt{odds}}{1 + \tt{odds}} = \dfrac{2}{1 + 2} = \dfrac{2}{3} \approx 0.67$

For categorical variables with multiple categories we generalize the definition of odds to be the relative probability, or odds, of one level vs. the reference level (sometimes called relative risk). For example in the program variable (prog), which has 3 levels, “academic”, “general”, “vocational”, we can define the odds of being in the general program vs. the academic program.

$\tt{odds(general \ vs. \ academic)} = \dfrac{\Pr(general)}{\Pr(academic)}$

Let’s calculate the odds of general vs. academic and the odds of vocational vs. academic in R using probabilities calculated by table().

#first we create frequency table and 
#then we use prop.table to calculate probabilities
prog.tab <- table(hsb$prog)
(prog.p <- prop.table(prog.tab))

## 
## academic  general vocation 
##    0.525    0.225    0.250

#odds of general vs. academic
(odds.general <- 0.225  / 0.525)

## [1] 0.4285714

#odds of vocational vs. academic
(odds.vocational <- 0.250  / 0.525)

## [1] 0.4761905

The probability of a student being in vocational is about half of the chance of being in academic.

1.5.2 Calculation probability from odds in multinomial (optional)

If the odds of all categories vs. the reference category is given, using the fact that the probabilities of all categories sum to one, we can calculate the probabilities of each categories from the odds.

Since,

$\begin{eqnarray} \dfrac{1 - \Pr(\tt{academic}) }{\Pr(\tt{academic})} &=& \dfrac{\Pr(\tt{general}) + \Pr(vocational)}{\Pr(\tt{academic})} \\ &=& \tt{odds(general \ vs. \ academic)} + \tt{odds(vocational \ vs. \ academic)} \end{eqnarray}$

Thus,

$\begin{eqnarray} \Pr(\tt{academic}) &=& \dfrac{1}{1 +\tt{odds(general \ vs. \ academic)} + \tt{odds(vocational \ vs. \ academic)}}, \\ \Pr(\tt{general}) &=& \Pr(\tt{academic}) \times \tt{odds(general \ vs. \ academic)}, \tt{and} \\ \Pr(\tt{vocational}) &=& \Pr(\tt{academic}) \times \tt{odds(vocational \ vs. \ academic)}.\\ \end{eqnarray}$

1.5.3 Odds ratios (binary variable)

An odds ratio is the ratio of the odds of an event for one group to the odds of same event for another group.

For example the odds ratio of enrollment in the honors program for females to males is $\text{OddsRatio(enrolled for female to male)} =\dfrac{\tt{odds(enrolled \ | \ female)}}{\tt{odds(enrolled \ | \ male)}}$ .

In the hsb data, we can calculate the odds ratio from a 2 by 2 table of honors vs. female.

#calculating proportion of enrollment for each group of female and male. 
#table gives frequencies 
(tab_honors_female <- table(hsb$honors, hsb$female))

##               
##                male female
##   not enrolled   73     74
##   enrolled       18     35

#prop.table on frequency table with margin = 2 returns column-vise proportions.
prop.table(tab_honors_female, margin = 2)

##               
##                     male    female
##   not enrolled 0.8021978 0.6788991
##   enrolled     0.1978022 0.3211009

(odds.ratio <- (0.3211009 / 0.6788991) / (0.1978022 / 0.8021978) )

## [1] 1.918168

Odds of enrollment in the honors program for a female student is almost twice the odds for a male student. Or, the odds of enrollment for female students is about 92% more than male students.

We could also use the frequencies directly to calculate odds and odds ratios since the total number of females and males will cancel in the numerator and denominator.

(35/74) / (18/73)

## [1] 1.918168

-Note: Calculating probability from odd-ratio is only possible if we know odds of at least one group. Otherwise, it is not possible to calculate probabilities from the odds ratio.

1.5.4 Odds Ratios (multinomial variable)

For categorical variable with more than two categories, we define a generalized form of odds ratio as: the ratio of odds of a category vs. a reference category for one group, to the odds of the same category vs. the reference category for another group.
Sometimes this is called a relative risk ratio.

For example the odds ratio comparing females to males for the odds of “general” vs. “academic” (the reference category) is:

$\text{OddsRatio(general vs. academic for female to male)} = \dfrac{\left[\dfrac{\Pr (\text{general | female})}{\Pr(\text{academic | female})}\right]}{\left[\dfrac{\Pr (\text{general | male})}{\Pr(\text{academic | male})}\right]}$

The above can also be expressed as the ratio of “the relative risks of general vs. academic program” comparing females to males.
Below, we use the hsb data to construct a 3 by 2 table of program vs. female to calculate the odds ratio of general vs. academic comparing females to males.

#3 X 2 table of program vs. female
table(hsb$prog, hsb$female)

##           
##            male female
##   academic   47     58
##   general    21     24
##   vocation   23     27

#odds ratios using frequencies
(odds.ratio <- (24/58) / (21/47))

## [1] 0.9261084

We can say that the odds of general program vs. academic program for female student is $0.926$ times less than the odds of general program vs. academic program for male students.

Or, we can say the odds of general program vs. academic program for female student is $7.4\%$ less than the odds of general program vs. academic program for male students.

1.6 Logit link

All logistic regression models (binary, multinomial, and ordinal) can model the probability of being in each level of the outcome. The goal is to predict (model) this probability, restricted between $0$ and $1$ , by linear combinations of independent variables. However, we want to make it possible that a linear combinations of predictors can take on any value between $-\infty$ and $+\infty$ .

We use the logit function to transform a value between 0 and 1 to a value between $-\infty$ and $+\infty$ .
The inverse of the logit function, the logistic function, maps a value between $-\infty$ and $+\infty$ to interval the $(0, 1)$ .

The logit function transforms probabilities to log-odds: $\tt{logit} (P) = \log\left(\dfrac{P}{1-P}\right)$ - In regression, a function that map the expected (mean) of the outcome to all possible linear predictions (real numbers) is called link function.

The chart below shows how the logit function transforms a range from $(0, 1)$ to $(-\infty, +\infty)$ :

1.7 Binary logistic model

Suppose $y$ is the dependent variable and $x_1, x_2,\cdots, x_p$ are independent variables. In the linear regression model we model $y$ given $x_1, x_2,\cdots, x_p$ as:

$\begin{equation} y | x_1, x_2,\cdots, x_p = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots +\beta_p x_p + \epsilon \tt{\ \ (1.1)} \end{equation}$

$\mu_{y | x_1, x_2,\cdots, x_p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots +\beta_p x_p \tt{\ \ \ \ \ (1.2)}$

$\mu_y$ is the mean of dependent variable for specific values of independent variables. $\epsilon$ is the unexplained error term or uncertainty of the model that has mean equal to 0.

The right hand side of the equation 2 is also called the linear predictor and here it predicts the mean of dependent variable.
In the logistic regression, we model the log-odds of the outcome by a linear function of the predictors:

$\log(odds | x_1, x_2 , \cdots x_p) = \beta_0 + \beta_1 x1 + \beta_2 x_2 + \beta_3 x_3 + ... + \beta_p x_p \tt{\ \ \ \ \ (1.3)}$

or,

$\frac{Pr(y = 1| x_1, x_2 , \cdots x_p)}{Pr(y = 0| x_1, x_2 , \cdots x_p)} = \exp{(\beta_0 + \beta_1 x1 + \beta_2 x_2 + ... + \beta_p x_p)} \tt{\ \ \ \ \ (1.4)}$

1.7.1 Calculating predicted probability using odds (Optional)

solving for $\Pr(y = 1)$ given that $\Pr(y = 0) = 1 - \Pr(y = 1)$ we have,

$Pr(y = 1| x_1, x_2 , \cdots x_p) = \frac{\exp{(\beta_0 + \beta_1 x1 + \beta_2 x_2 + \cdots + \beta_p x_p)}}{1+\exp{(\beta_0 + \beta_1 x1 + \beta_2 x_2 + \cdots + \beta_p x_p)}} \tt{\ \ \ \ \ (1.5)}$

or,

$Pr(y = 1| x_1, x_2 , \cdots x_p) = \frac{1} {1 + \exp[{-(\beta_0 + \beta_1 x1 + \beta_2 x_2 + \cdots + \beta_p x_p)}]} \tt{\ \ \ \ \ (1.6)}$ and

$Pr(y = 0 | x_1, x_2 , \cdots x_p) = \frac{1} {1 + \exp\left[(\beta_0 + \beta_1 x1 + \beta_2 x_2 + \cdots + \beta_p x_p)\right]} .$

1.8 Logistic regression in R

In R we use the function glm() for logistic regression with family=binomial() and link="logit".

The syntax of glm is as follows:

glm(formula, family = binomial(link = "logit"), data = data, weights, subset, na.action, ...)

The first argument is the model formula, followed by defining distribution of the outcome.

1.8.1 Logistic regression example: `honors` predicted by `math` and `female`

For this analysis we use data set hsb to model enrollment to honor program as the outcome and variables math and female as independent variables. Our model assumes the effect of math score is the same for females and males (i.e., no interactions between math and female).

To extract the results of a regression model we use generic function summary().
Other functions like coefficient() and confint() can be used to extract coeffcients and estimate confidence intervals, respectively.

#to estimate coefficient of the model we use glm with  
#the option family = binomial(link = "logit")
logit.m <- glm(honors ~ math + female, family = binomial(link = "logit"), data = hsb)
summary(logit.m)

## 
## Call:
## glm(formula = honors ~ math + female, family = binomial(link = "logit"), 
##     data = hsb)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0188  -0.6676  -0.3110   0.4119   2.5408  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -11.79228    1.71892  -6.860 6.87e-12 ***
## math           0.18254    0.02842   6.423 1.34e-10 ***
## femalefemale   1.12085    0.42403   2.643  0.00821 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 231.29  on 199  degrees of freedom
## Residual deviance: 158.46  on 197  degrees of freedom
## AIC: 164.46
## 
## Number of Fisher Scoring iterations: 5

coef(summary(logit.m))

##                Estimate Std. Error   z value     Pr(>|z|)
## (Intercept)  -11.792281 1.71891936 -6.860288 6.872196e-12
## math           0.182538 0.02842085  6.422678 1.338976e-10
## femalefemale   1.120847 0.42403214  2.643308 8.210041e-03

confint(logit.m)

##                    2.5 %    97.5 %
## (Intercept)  -15.4710305 -8.689254
## math           0.1309423  0.243122
## femalefemale   0.3143782  1.986565

Based on the results of the model we have:

$\log(\text{odds of enrolled}) = \log\left[\dfrac{\Pr (\text{enrolled})}{\Pr(\text{not enrolled})}\right]= -11.792281 + 0.182538 \text{ math} + 1.120847 \text{ female}$

1.8.2 Interpreting coefficients

The coefficient of continuous predictor math:

For one unit increase of math score the log-odds of enrolled is expected to increase by 0.182538, holding other predictors (female) constant.

The coefficient of binary predictor female:

The log-odds of enrolled for a female student is 1.120847 greater than the log-odds of enrolled for a male student on average.

The intercept is the expected log-odds of enrolled for a male student (female=0) when math=0.
Since the math=0 is unrealistic, it is not useful to interpret this intercept.

If we exponentiate the coefficients with exp(), then we can interpret coefficients in the odds ratio metric.

#exponentiate coefficients
exp(coef(logit.m))

##  (Intercept)         math femalefemale 
## 7.562706e-06 1.200260e+00 3.067452e+00

We can also use exp() with confint() to calculate the confidence intervals of the odds ratio.

#exponentiate coefficients and column bind with exponentiation of CIs

cbind(OR = exp(coef(logit.m)), exp(confint(logit.m)))

##                        OR        2.5 %       97.5 %
## (Intercept)  7.562706e-06 1.909927e-07 0.0001683856
## math         1.200260e+00 1.139902e+00 1.2752241647
## femalefemale 3.067452e+00 1.369407e+00 7.2904468215

The above confidence interval (CI) is based on the profile likelihood method. If you want to calculate CIs based on standard errors you can use confint.default instead.

The intercept, $7.562706 e-06$ , is the expected odds of enrolled for a male student with math=0.
For one unit increase in math score, the odds of enrolled in honors on average increases by $20\%$ .
The odds of enrolled for females is on average about $3.067$ times more than males, holding math constant.

2 Multinomial logistic regression

2.1 Multinomial logistic model

Suppose we want to model a regression with a categorical outcome with $J$ number of categories and the $J$ th level is the reference category.

For each category $j$ from $1, 2, \cdots, J-1$ the log-odds of category $j$ versus the reference category $J$ given predictors $x_1, x_2, \cdots, x_p$ is modeled as the form below:

$\log(\text{odds}(Y = j \ | \ x_1, x_2, \cdots, x_p)) =\log \left[ \dfrac{\Pr(y = j|x_1, x_2, \cdots, x_p)}{\Pr(y = J|x_1, x_2, \cdots, x_p)} \right] = \beta_{0j} + \beta_{1j} x_1 + \beta_{2j} x_2 + \cdots + \beta_{pj} x_p \tt{\ \ \ \ \ (2.1)}$

where $odds_j$ is “generalized” odds and $Pr(y = J) = 1 - \sum_{j=1}^{J-1} Pr(y = j).$

We can exponentiate both sides of equation (2.1):

$\text{odds}(Y = j \ | \ x_1, x_2, \cdots, x_p) = \dfrac{\Pr(y = j|x_1, x_2, \cdots, x_p)}{\Pr(y = J|x_1, x_2, \cdots, x_p)} = \exp(\beta_{j0} + \beta_{1j} x_1 + \beta_{2j} x_2 + \cdots + \beta_{pj} x_p)\tt{\ \ \ \ \ \ (2.2)}$ - In multinomial logistic regression, for an outcome with 3 levels we need to estimate $(p + 1) * 2$ coefficients and for an outcome with 4 levels we estimate $(p + 1) * 3$ coefficients.

It is not practical to run a multinomial logistic regression for an outcome with too many levels.
The complete or quasi-complete separation problem is very common in multinomial logistic regression with small samples. For more information about complete and quasi-complete separation, can see this page:

FAQ What is complete or quasi-complete separation in logistic/probit regression and how do we deal with them?

An assumption of multinomial logistic regression which is often ignored is the assumption of Independence of Irrelevant Alternatives (IIA). The IIA assumption means that there is no other alternative for the outcome that, if included, would disproportionately influence the membership of one of the other categories. In other words, the generalized odds ratios will not be changed by including or excluding the alternative category. The link below discusses how important this assumption is.

How Relevant is the Independence of Irrelevant Alternatives?

2.1.1 Calculating predicted probability using odds (Optional)

We can now calculate the probability of the reference level using the odds by summing both sides over all $j$ from $1, 2, \cdots, J-1$ ,

$\dfrac{1 - \Pr(y = J|x_1, x_2, \cdots, x_p)}{\Pr(y = J|x_1, x_2, \cdots, x_p)} = \sum_{j=1}^{J-1}\exp(\beta_{j0} + \beta_{j1} x_1 + \beta_{j2} x_2 + \cdots + \beta_{jp} x_p)$

Solving for $Pr(y = J)$ we get:

$\Pr(y = J|x_1, x_2, \cdots, x_p) = \dfrac{1}{1 +\sum_{j=1}^{J-1}\exp(\beta_{j0} + \beta_{j1} x_1 + \beta_{j2} x_2 + \cdots + \beta_{jp} x_p) } \tt{\ \ \ \ \ \ (2.3)}$

To get the probability of the other $j$ levels from $1, 2, \cdots, J-1$ we just need to multiply the odds of each level by probability of the reference level thus,

$\Pr(y = j|x_1, x_2, \cdots, x_p) = \dfrac{\exp(\beta_{j0} + \beta_{j1} x_1 + \beta_{j2} x_2 + \cdots + \beta_{jp} x_p)}{1 +\sum_{j=1}^{J-1}\exp(\beta_{j0} + \beta_{j1} x_1 + \beta_{j2} x_2 + \cdots + \beta_{jp} x_p) } \tt{\ \ \ \ \ \ (2.4)}$

2.2 R example of multinomial logistic regression

We use the multinom() function from the nnet package to estimate a multinomial logistic regression model.
For the first example, suppose we want to model students’ membership to different programs (variable prog) as predicted by write (writing score) and ses (socioeconomic status), using the hsb data.

The outcome variable prog has 3 levels, “academic”, “general”, “vocation” with the reference group “academic”(the $J$ th category).

For help documentation about the multinom() function, see ?multinom. For this workshop we only use the basic arguments of the multinom() function.

#summary of the outcome prog
(tab.prog <- table(hsb$prog))

## 
## academic  general vocation 
##      105       45       50

prop.table(tab.prog)

## 
## academic  general vocation 
##    0.525    0.225    0.250

#running multinomial regression model for prog on socio econonic status and write
library(nnet)
m.multinom <- multinom(prog ~ ses + write, data = hsb)

## # weights:  15 (8 variable)
## initial  value 219.722458 
## iter  10 value 179.982880
## final  value 179.981726 
## converged

The model runs and information is printed about convergence process. Output includes some iteration history and the final negative log-likelihood (179.981726). This value multiplied by two is then seen in the model summary as the Residual Deviance and can be used in comparisons of nested models.
The result is stored in a multinom class object and to extract the results, we again use generic function summary().

#extracting the result of model using summary
(m.results <- summary(m.multinom, digits = 3))

## Call:
## multinom(formula = prog ~ ses + write, data = hsb)
## 
## Coefficients:
##          (Intercept) sesmiddle seshigh   write
## general         2.85    -0.533  -1.163 -0.0579
## vocation        5.22     0.291  -0.983 -0.1136
## 
## Std. Errors:
##          (Intercept) sesmiddle seshigh  write
## general         1.17     0.444   0.514 0.0214
## vocation        1.16     0.476   0.596 0.0222
## 
## Residual Deviance: 359.9635 
## AIC: 375.9635

The first output line is the model formula.

The results are in two blocks. The first block reports the coefficients for each category ( $j$ from 1 to $J - 1$ ) and the second block reports standard errors.

2.3 Interpreting the coefficients of multinomial logistic

If we substitute the model results into equation 2.1 we have:

$\begin{eqnarray} \log\left[\dfrac{\Pr(\tt{general})}{\Pr(\tt{academic})}\right] = 2.85 -0.533 \times\text{(ses = middle)} -1.163\times\text{(ses = high)} -0.0579\times \tt{write}, \\ \\ \log\left[\dfrac{\Pr(\tt{vocation})}{\Pr(\tt{academic})}\right] = 5.22 + 0.291 \times\text{(ses = middle)} - 0.983 \times\text{(ses = high)} - 0.1136 \times \tt{write}. \\ \end{eqnarray}$

Intercepts:

The expected log-odds of being in general vs. academic is equal to 2.85 for a person with low socio economic status and writing score equal to zero.
The expected log-odds of being in vocation vs. academic is equal to 5.22 for a person with low socio economic status and writing score equal to zero.

For independent variable write:

A one-unit increase in write decreases log-odds of being in the general vs. academic program by $.058$ on average.
A one-unit increase in the variable write decreases the log-odds of being in the vocation vs. academic program by $0.1136$ on average.

For ses:

The log-odds of being in the general program vs. academic program for ses=“high” is lower than ses=“low” by $1.163$ .
The log odds of being in the general program vs. academic program will decrease by $0.533$ if moving from ses= “low” to ses=“middle”.
The log odds of being in the vocation program vs. academic program will decrease by $0.983$ if moving from ses=“low” to ses=“high”.
The log odds of being in the vocation program vs. academic program will increase by $0.291$ if moving from ses=“low” to ses=“middle”.
We can exponentiate the coefficients from our model to get coefficients in term of the odds ratios.

exp(coef(m.multinom))

##          (Intercept) sesmiddle   seshigh     write
## general     17.32582 0.5866769 0.3126026 0.9437172
## vocation   184.61262 1.3382809 0.3743123 0.8926116

The expected odds of being in general vs. academic is equal to $17.32582$ for a person with low socio economic status and writing score equal to zero.
The expected odds of being in vocation vs. academic is equal to $184.61262$ for a person with low socio economic status and writing score equal to zero.
The odds of being in general program vs. academic program is expected to decrease by a factor of $.9437$ , for one-unit increase in the variable write, keeping socio economic status constant.
The odds of being in vocational program vs. academic program is expected to decrease by a factor of $0.8926116$ , for one-unit increase in the variable write, keeping socio economic status constant.
The odds for being in general program vs. academic program expected to decreases by a factor of $.3126$ , switching from ses = low to high, keeping writing score constant.
The odds for being in general program vs. academic program on average decreases by $41.3\%$ , switching from ses = low to middle, keeping writing score constant.
The odds for being in vocational program vs. academic program expected to decreases by a factor of $0.3743$ , switching from ses = low to high, keeping writing score constant.
The odds for being in vocational program vs. academic program on average increases by $33.8\%$ , switching from ses = low to middle, keeping writing score constant.

2.4 Test statistics in Multinomial logistic regression

The summary does not give the z-score and p-value for testing the hypothesis that a coefficient equals zero. We can calculate the z-score and p-value with information from summary().

#calculating z score by deviding estimated coefficient by it's standard error
z <- m.results$coefficients/m.results$standard.errors
#using pnorm we get probability of tail of normal distribution for z value
p <- (1 - pnorm(abs(z), 0, 1)) * 2
#print the result of z-scores and p-values
print(z,digits = 4)

##          (Intercept) sesmiddle seshigh  write
## general        2.445   -1.2018  -2.261 -2.706
## vocation       4.485    0.6117  -1.650 -5.113

print(p, digits = 4)

##          (Intercept) sesmiddle seshigh     write
## general    1.448e-02    0.2294 0.02374 6.819e-03
## vocation   7.299e-06    0.5408 0.09895 3.176e-07

To test overall effect of variable ses we can use likelihood ratio test.
We run the model excluding variable ses and the we conduct likelihood ratio test using function anova().

#update the model to exclude ses
m.multinom.r <- update(m.multinom, . ~ . -ses)

## # weights:  9 (4 variable)
## initial  value 219.722458 
## final  value 185.510837 
## converged

#anova of the restricted model vs. the full model
anova(m.multinom.r, m.multinom)

## Likelihood ratio tests of Multinomial Models
## 
## Response: prog
##         Model Resid. df Resid. Dev   Test    Df LR stat.    Pr(Chi)
## 1       write       396   371.0217                                 
## 2 ses + write       392   359.9635 1 vs 2     4 11.05822 0.02591741

The likelihood ratio test ( $\chi^2 = 11.05822, {df = 4} , p = 0.02591741$ ) rejects at $\alpha = 0.05$ , indicating that based on our data, the effect of variable ses on program controlling for writing score is unlikely by chance.

2.5 Ploting predicted value in multinomial logistic regression

We can also use predicted probabilities to help us understand the model. We can calculate predicted probabilities for each of our outcome levels using the fitted() function.
We can start by generating the predicted probabilities for the observations in our data and viewing the first few rows.

#fitted gives fitted probability for observations, head report the first 6 rows
head(pp <- fitted(m.multinom))

##    academic   general  vocation
## 1 0.1482764 0.3382454 0.5134781
## 2 0.1202017 0.1806283 0.6991700
## 3 0.4186747 0.2368082 0.3445171
## 4 0.1726885 0.3508384 0.4764731
## 5 0.1001231 0.1689374 0.7309395
## 6 0.3533566 0.2377976 0.4088458

As we can see from the predicted probability the rows will sum to 1.

It will be more helpful to examine the changes in predicted probability associated with one of our two variables. We can create small datasets varying one variable while holding the other constant. We will first do this by holding write at its mean and examining the predicted probabilities for each level of ses.

#new data with all levels of ses and write at the mean of write.
newdata.ses <- data.frame(ses = c("low", "middle", "high"), write = mean(hsb$write))
#column bind the new data with predicted probabilities 
cbind(newdata.ses, probability = predict(m.multinom, newdata = newdata.ses, "probs"))

##      ses  write probability.academic probability.general probability.vocation
## 1    low 52.775            0.4396845           0.3581917            0.2021238
## 2 middle 52.775            0.4777488           0.2283353            0.2939159
## 3   high 52.775            0.7009007           0.1784939            0.1206054

Another way to understand the model using the predicted probabilities is to look at the averaged predicted probabilities for different values of the continuous predictor variable write, for example from 30 to 70 within each level of ses. (It is suggested to check the range of continuous variable first)

#check the range of write
range(hsb$write)

## [1] 31 67

#in new data,  for write from 30 to 40 we replicate levels of ses 
newdata.write <- data.frame(ses = rep(c("low", "middle", "high"), each = 41), write = rep(c(30:70),
    3))
#predict gives predicted probability for each rows of newdata
pp.write <- cbind(newdata.write, predict(m.multinom, newdata = newdata.write, type = "probs", se = TRUE))
#first few rows
head(pp.write)

##   ses write   academic   general  vocation
## 1 low    30 0.09843588 0.2999880 0.6015762
## 2 low    31 0.10716868 0.3082195 0.5846118
## 3 low    32 0.11650390 0.3162093 0.5672868
## 4 low    33 0.12645834 0.3239094 0.5496323
## 5 low    34 0.13704576 0.3312711 0.5316831
## 6 low    35 0.14827643 0.3382454 0.5134781

#calculate the mean probabilities within each level of ses
by(pp.write[, 3:5], pp.write$ses, colMeans)

## pp.write$ses: high
##  academic   general  vocation 
## 0.6164315 0.1808037 0.2027648 
## ------------------------------------------------------------ 
## pp.write$ses: low
##  academic   general  vocation 
## 0.3972977 0.3278174 0.2748849 
## ------------------------------------------------------------ 
## pp.write$ses: middle
##  academic   general  vocation 
## 0.4256198 0.2010864 0.3732938

To plot the predicted probability, first we need to make the predicted probability data set in a long form. We do this because we want to create factor program with 3 levels and all probabilities in one column.

#pivot_longer from package tidy reshape the wide format to long format
library(tidyr)
long.pp <- pp.write %>% pivot_longer(cols = c("academic", "general", "vocation"), names_to = "Program", values_to = "Probability")
#fist few rows will be
head(long.pp)

## # A tibble: 6 × 4
##   ses   write Program  Probability
##   <chr> <int> <chr>          <dbl>
## 1 low      30 academic      0.0984
## 2 low      30 general       0.300 
## 3 low      30 vocation      0.602 
## 4 low      31 academic      0.107 
## 5 low      31 general       0.308 
## 6 low      31 vocation      0.585

Now we are ready to plot the predicted probability as function of write, separatly by ses and Program.

library(ggplot2)
ggplot(long.pp, aes(x = write, y = Probability, colour = ses)) + 
  geom_line(lwd=1.2) + 
  facet_grid(Program ~., scales = "free")

2.6 Package `mlogit`

Multinomial logistic models can also estimated using other packages in R. For this workshop we used package nnet because it did not required to reshape the data. Another package is mlogit.

For the mlogit package we need to reshape the data into mlogit.data() format. The example below shows how to reshape the hsb data with variable prog as choice option.

library(mlogit)
#reshapes the data in data frame with index, dfidx 
hsb.choice <- mlogit.data(hsb, shape = "wide", choice = "prog")
class(hsb.choice)

## [1] "dfidx_mlogit" "dfidx"        "data.frame"   "mlogit.data"

#each row extended to 3 row for alternative index
head(data.frame(hsb.choice))

##    id female    ses schtyp  prog read write math science socst       honors
## 1  45 female    low public FALSE   34    35   41      29    26 not enrolled
## 2  45 female    low public FALSE   34    35   41      29    26 not enrolled
## 3  45 female    low public  TRUE   34    35   41      29    26 not enrolled
## 4 108   male middle public FALSE   34    33   41      36    36 not enrolled
## 5 108   male middle public  TRUE   34    33   41      36    36 not enrolled
## 6 108   male middle public FALSE   34    33   41      36    36 not enrolled
##   awards cid chid      alt    idx
## 1      0   1    1 academic 1:emic
## 2      0   1    1  general 1:eral
## 3      0   1    1 vocation 1:tion
## 4      0   1    2 academic 2:emic
## 5      0   1    2  general 2:eral
## 6      0   1    2 vocation 2:tion

#running multinomial logistic using mlogit
m.mlogit <- mlogit(prog ~ 0 | ses + write , data = hsb.choice)
#extract the result using summary
summary(m.mlogit)

## 
## Call:
## mlogit(formula = prog ~ 0 | ses + write, data = hsb.choice, method = "nr")
## 
## Frequencies of alternatives:choice
## academic  general vocation 
##    0.525    0.225    0.250 
## 
## nr method
## 4 iterations, 0h:0m:0s 
## g'(-H)^-1g = 7.89E-07 
## gradient close to zero 
## 
## Coefficients :
##                       Estimate Std. Error z-value  Pr(>|z|)    
## (Intercept):general   2.852186   1.166439  2.4452  0.014477 *  
## (Intercept):vocation  5.218200   1.163549  4.4847 7.301e-06 ***
## sesmiddle:general    -0.533291   0.443732 -1.2018  0.229429    
## sesmiddle:vocation    0.291393   0.476374  0.6117  0.540743    
## seshigh:general      -1.162832   0.514219 -2.2614  0.023737 *  
## seshigh:vocation     -0.982670   0.595567 -1.6500  0.098948 .  
## write:general        -0.057928   0.021411 -2.7056  0.006819 ** 
## write:vocation       -0.113603   0.022220 -5.1127 3.177e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log-Likelihood: -179.98
## McFadden R^2:  0.11815 
## Likelihood ratio test : chisq = 48.23 (p.value = 1.063e-08)

One advantage of mlogit is that it gives test statistics for coefficients in the summary.
Package mlogit can handle mixed effect multinomial logistic model. It also designed to handle choice model.

2.7 Exercise for multinomial logistic regression:

In hsb data First center math score to its mean and call it c.math. Then, run a multinomial logistic regression with the ses variable as a nominal outcome and use variable c.math and female as predictors, and with the interaction between c.math and female.

Solution:

#first create variable c.math
hsb$c.math <- scale(hsb$math, center = TRUE, scale = FALSE)
#run the moltinomial regression with nnet::multinom()
ex.multinom <- multinom(ses ~ c.math + female + c.math:female, data = hsb)

## # weights:  15 (8 variable)
## initial  value 219.722458 
## iter  10 value 199.706497
## final  value 198.527428 
## converged

(result <- summary(ex.multinom))

## Call:
## multinom(formula = ses ~ c.math + female + c.math:female, data = hsb)
## 
## Coefficients:
##        (Intercept)     c.math femalefemale c.math:femalefemale
## middle   1.3152562 0.07581224   -0.8768092        -0.060864602
## high     0.8092806 0.09105872   -1.0052843         0.007573346
## 
## Std. Errors:
##        (Intercept)     c.math femalefemale c.math:femalefemale
## middle   0.3398332 0.03711554    0.4143891           0.0459392
## high     0.3609246 0.03932942    0.4582664           0.0506627
## 
## Residual Deviance: 397.0549 
## AIC: 413.0549

#using mlogit first we need to reshape the data
hsb.choice <- mlogit.data(hsb, shape = "wide", choice = "ses")
m.mlogit <- mlogit(ses ~ 0 | c.math * female, data = hsb.choice)
#extract the result
summary(m.mlogit)

## 
## Call:
## mlogit(formula = ses ~ 0 | c.math * female, data = hsb.choice, 
##     method = "nr")
## 
## Frequencies of alternatives:choice
##   high    low middle 
##  0.290  0.235  0.475 
## 
## nr method
## 5 iterations, 0h:0m:0s 
## g'(-H)^-1g = 0.00023 
## successive function values within tolerance limits 
## 
## Coefficients :
##                      Estimate Std. Error z-value Pr(>|z|)  
## (Intercept):low     -0.809263   0.360924 -2.2422  0.02495 *
## (Intercept):middle   0.505971   0.240321  2.1054  0.03526 *
## c.math:low          -0.091058   0.039329 -2.3153  0.02060 *
## c.math:middle       -0.015247   0.024738 -0.6163  0.53768  
## femalefemale:low     1.005263   0.458266  2.1936  0.02826 *
## femalefemale:middle  0.128477   0.352987  0.3640  0.71588  
## c.math:low          -0.007574   0.050663 -0.1495  0.88116  
## c.math:middle       -0.068437   0.037904 -1.8055  0.07099 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log-Likelihood: -198.53
## McFadden R^2:  0.057246 
## Likelihood ratio test : chisq = 24.11 (p.value = 0.00049843)

3 Ordinal logistic Regression

3.1 Ordinal logistic model

Let $y$ be an ordinal outcome variable with $J$ categories, ordered from 1 to J. We are defining $\Pr(y \leq j)$ to be the cumulative probability of $y$ less than or equal to a specific category $j$ for $j = 1, 2, \cdots J - 1$ and thus, $\Pr(y \leq J) = 1$ .

The odds of outcome being less than equal to a particular category will be:

$odds(y \leq j) = \dfrac{\Pr(y \leq j)}{\Pr(y > j)} \tt{ \ \ \ \ \ \ (3.1)}$ is defined for $j = 1, 2, \cdots J - 1$ . Since $\Pr(y \gt J) = 0$ the odds is undefined for the $J$ th category.

This generalization of odds is also called cumulative odds.

The ordinal logistic regression model can be defined as

$\text{logit} (\Pr(y \le j|x_1, x_2, \cdots, x_p)) =\log\left(\dfrac{\Pr(y \leq j|x_1, x_2, \cdots, x_p)}{\Pr(y > j|x_1, x_2, \cdots, x_p)}\right)= \beta_{0j} + \beta_{1j}x_1 + \cdots + \beta_{pj} x_p\tt{ \ \ \ \ \ \ (3.2)}$

With the parallel lines assumption (proportional odds assumption), the intercepts are different for each category but the slopes are assumed to be constant across categories, which simplifies the equation above to

$\text{logit} (\Pr(y \le j|x_1, x_2, \cdots, x_p)) =\log\left(\dfrac{\Pr(y \leq j|x_1, x_2, \cdots, x_p)}{\Pr(y > j|x_1, x_2, \cdots, x_p)}\right)= \beta_{0j} + \beta_{1}x_1 + \cdots + \beta_{p} x_p \tt{ \ \ \ \ \ \ (3.3)}$

The above model is also called cumulative link model.
In R we use package ordinal and function clm.

clm(formula, scale, nominal, data, weights, start, subset, doFit = TRUE, na.action, contrasts, model = TRUE, control=list(), link = c("logit", "probit", "cloglog", "loglog", "cauchit", "Aranda-Ordaz", "log-gamma"), threshold = c("flexible", "symmetric", "symmetric2", "equidistant"), ...)

We can also use function polr from package MASS to estimate this model. In clm also polr (and also in Stata, ologit) the ordinal logistic regression model is parameterized as

$logit (P(y \le j)) = \beta_{j0} - \eta_{1}x_1 - \cdots - \eta_{p} x_p$ and will return $\eta_i = -\beta_i$ .

3.2 R example for ordinal logistic regression

For an example of ordinal logistic regression we use data “ologit.csv” from the OARC website.
Data contains variables apply for either student apply for grad school, pared For parents education, public for type of school they have attended, and gpa for GPA score.
To run the ordinal logistic regression with proportional adds assumption we use function clm from package ordinal.
In the first analysis we regress variable apply by pared (Parent’s education).
Variable apply is an ordinal variable with 3 levels 0,1,2 indicating “unlikely”, “somewhat likely”, and “very likely” to state if student apply for grad school.
pared is a binary variable with 1 meaning the student’s parent has higher degree and 0 meaning the parent does not have higher degree.

#reading the data into R
dat <- read.csv("https://stats.idre.ucla.edu/stat/data/ologit.csv")
#structure of the data
str(dat)

## 'data.frame':    400 obs. of  4 variables:
##  $ apply : int  2 1 0 1 1 0 1 1 0 1 ...
##  $ pared : int  0 1 1 0 0 0 0 0 0 1 ...
##  $ public: int  0 0 1 0 0 1 0 0 0 0 ...
##  $ gpa   : num  3.26 3.21 3.94 2.81 2.53 ...

First we change the variable apply from integer to ordinal factor and label the levels with “unlikely”, “somewhat likely”, “very likely”.
Comparing to the model (3.3), the outcome $j = J = 3$ is “very likely”, $j = 2$ is “somewhat likely”, and $j = 1$ is “unlikely”.

#transform apply to ordered factor
dat$apply <- factor(dat$apply, labels = c("unlikely", "somewhat likely", "very likely") ,ordered = TRUE)
str(dat$apply)

##  Ord.factor w/ 3 levels "unlikely"<"somewhat likely"<..: 3 2 1 2 2 1 2 2 1 2 ...

Parent education is a binary variable. since it has value 0 and 1 we can run the model without making pared as factor. However, we change it to factor with labels “not attend” and “attend” for 0 and 1.

#transform pared to factor
dat$pared <- factor(dat$pared)
#checking levels
levels(dat$pared)

## [1] "0" "1"

#change labels to "not attend" and "attend"
dat$pared <- factor(dat$pared, labels = c("not attend", "attend"))
#structure of pared
str(dat$pared)

##  Factor w/ 2 levels "not attend","attend": 1 2 2 1 1 1 1 1 1 2 ...

Now we run the model using function clm which stands for Cumulative Link Model.

#loading library ordinal
library(ordinal)
#running clm for cumulative link(logit) model
ordinal.clm <- clm(apply ~ pared , data = dat)
#extract the result
summary(ordinal.clm)

## formula: apply ~ pared
## data:    dat
## 
##  link  threshold nobs logLik  AIC    niter max.grad cond.H 
##  logit flexible  400  -361.40 728.79 5(0)  1.25e-10 9.3e+00
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## paredattend   1.1275     0.2634    4.28 1.87e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Threshold coefficients:
##                             Estimate Std. Error z value
## unlikely|somewhat likely      0.3768     0.1103   3.415
## somewhat likely|very likely   2.4519     0.1826  13.430

In the output above, we see

formula, this is R reminding us what type of model we ran.
Next we see regression coefficient table including the value of each coefficient, standard errors, z value, and p-value.
Next we see the estimates for the two intercepts, which are sometimes called threshold coefficients. The intercepts indicate where the latent variable is cut to make the three groups that we observe in our data. Note that this latent variable is continuous. In general, these are not used in the interpretation of the results. The threshold coefficients are closely related to cutpoints, which are reported by other statistical packages.

The estimated model can be written in the form of equation (3.3):

$\begin{eqnarray} \text{logit} (\hat{P}(Y \le 1)) & = & 0.3768 - 1.1275 \times (\text{pared = attended)} \\ \text{logit} (\hat{P}(Y \le 2)) & = & 2.4519 - 1.1275 \times (\text{pared = not attended}) \end{eqnarray}$

We can also get confidence intervals for the parameter estimates. These can be obtained either by profiling the likelihood function or by using the standard errors and assuming a normal distribution. Note that profiled CIs are not symmetric (although they are usually close to symmetric). If the 95% CI does not cross 0, the parameter estimate is statistically significant at $\alpha = 0.05$ .

(ci <- confint(ordinal.clm)) # default method gives profiled CIs

##                 2.5 %   97.5 %
## paredattend 0.6131224 1.647807

confint.default(ordinal.clm) # CIs assuming normality

##                                 2.5 %   97.5 %
## unlikely|somewhat likely    0.1605758 0.593109
## somewhat likely|very likely 2.0940393 2.809673
## paredattend                 0.6111730 1.643809

3.3 Interpreting the result of ordinal logistic regression

The coefficients from the model can be somewhat difficult to interpret because they are scaled in terms of log-odds. To interpret logistic regression models we convert the coefficients into odds ratios. To get the OR and confidence intervals, we just exponentiate the estimates and confidence intervals.

# OR and CI also the reciprocal of the odds ratio
exp(cbind(OR=-coef(ordinal.clm)[3], inv.OR = coef(ordinal.clm)[3], ci))

##                    OR   inv.OR    2.5 %   97.5 %
## paredattend 0.3238448 3.087899 1.846187 5.195574

These coefficients are called proportional odds ratio and we would interpret these pretty much as we would odds ratio from a binary logistic regression.
Notice the betas are negative of coefficients. I included both Odds ratio and reciprocal odds ratio.
For students whose parents did attend college, the odds of being less likely (i.e., unlikely versus somewhat likely and very likely) to apply is $0.3238448$ times that of students whose parents did not go to college.
Because the proportional odds assumption we can also say:

For students whose parents attend college, the odds of being more likely to apply (i.e., very likely versus unlikely or somewhat likely) is $3.09$ times that of students whose parents did go to college.

3.4 Using polr in package MASS

There are other packages in R that can be used to run ordinal logistic regression models such as the polr() from package MASS.

#loading library MASS
library(MASS)
#running the model with polr from package MASS
m.plr <- polr(apply ~ pared, data = dat)
summary(m.plr)

## Call:
## polr(formula = apply ~ pared, data = dat)
## 
## Coefficients:
##             Value Std. Error t value
## paredattend 1.127     0.2634    4.28
## 
## Intercepts:
##                             Value   Std. Error t value
## unlikely|somewhat likely     0.3768  0.1103     3.4152
## somewhat likely|very likely  2.4519  0.1826    13.4302
## 
## Residual Deviance: 722.7903 
## AIC: 728.7903

The results are identical to result from clm(). However, package ordinal has more options and also we can run a multilevel ordinal model using function clmm2()

The summary() of a polr object does not give the p-value for the test statistics. We can approximate the p-value for the test using normal approximation to t distribution.

# store table
(ctable <- coef(summary(m.plr)))

##                                 Value Std. Error   t value
## paredattend                 1.1274909  0.2634324  4.280000
## unlikely|somewhat likely    0.3768429  0.1103421  3.415222
## somewhat likely|very likely 2.4518558  0.1825629 13.430200

# calculate and store p values
p <- pnorm(abs(ctable[, "t value"]), lower.tail = FALSE) * 2

# combined table
(ctable <- cbind(ctable, "p value" = p))

##                                 Value Std. Error   t value      p value
## paredattend                 1.1274909  0.2634324  4.280000 1.868929e-05
## unlikely|somewhat likely    0.3768429  0.1103421  3.415222 6.372997e-04
## somewhat likely|very likely 2.4518558  0.1825629 13.430200 4.023243e-41

3.5 Testing propotional odds assumption

One of the assumptions underlying ordinal logistic regression, the proportional odds (PO) assumption, is that the relationship between each pair of outcome groups is the same. In other words, ordinal logistic regression assumes that the coefficients that describe the relationship between, say, the lowest versus all higher categories of the response variable are the same as those that describe the relationship between the next lowest category and all higher categories, etc. This is called the proportional odds assumption or the parallel regression assumption.

To test this assumption we can use likelihood ratio test.

_ To run ordinal logistic regression with proportional odds assumption relaxed we include option nominal = ~ pared so the model runs assuming the effect of predictors that are included in option nominal is nominal.

Now use likelihood ratio test to test against more restricted model that is the model with proportional adds assumption. For this, we use function anova.

#likelihood ratio test for parallel assumption

#ordinal logistic regression with relaxing proportional odds assumption
ordinal.clm.ur <- clm(apply ~ 1 ,nominal = ~pared , data = dat)
summary(ordinal.clm.ur)

## formula: apply ~ 1
## nominal: ~pared
## data:    dat
## 
##  link  threshold nobs logLik  AIC    niter max.grad cond.H 
##  logit flexible  400  -361.39 730.77 5(0)  7.67e-10 2.0e+01
## 
## Threshold coefficients:
##                                         Estimate Std. Error z value
## unlikely|somewhat likely.(Intercept)      0.3783     0.1109   3.411
## somewhat likely|very likely.(Intercept)   2.4407     0.2007  12.164
## unlikely|somewhat likely.paredattend     -1.1438     0.2925  -3.910
## somewhat likely|very likely.paredattend  -1.0937     0.3704  -2.953

anova(ordinal.clm.ur, ordinal.clm)

## Likelihood ratio tests of cumulative link models:
##  
##                formula:      nominal: link: threshold:
## ordinal.clm    apply ~ pared ~1       logit flexible  
## ordinal.clm.ur apply ~ 1     ~pared   logit flexible  
## 
##                no.par    AIC  logLik LR.stat df Pr(>Chisq)
## ordinal.clm         3 728.79 -361.40                      
## ordinal.clm.ur      4 730.77 -361.39  0.0171  1     0.8961

Package brant is another way to test the PO assumption, using function brant() to conduct the Brant test of the proportional odds assumption. To use this function we need to run our model using polr.

#brant test for proportional odds assumption over polr object
library(brant)
brant::brant(m.plr)

## -------------------------------------------- 
## Test for X2  df  probability 
## -------------------------------------------- 
## Omnibus      0.02    1   0.9
## paredattend  0.02    1   0.9
## -------------------------------------------- 
## 
## H0: Parallel Regression Assumption holds

3.6 Exercise, Ordinal logistic regression of `apply` on `pared`, `public` and `gpa`

Now run the ordinal logistic regression of apply regressed on pared, public and gpa score.

First we make variable public as factor

dat$public <- factor(dat$public, labels = c("private", "public"))

#run the clm model
m.clm <- clm(apply ~ pared + public + gpa , data = dat)
summary(m.clm)

## formula: apply ~ pared + public + gpa
## data:    dat
## 
##  link  threshold nobs logLik  AIC    niter max.grad cond.H 
##  logit flexible  400  -358.51 727.02 5(0)  1.63e-10 1.3e+03
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## paredattend   1.04766    0.26579   3.942 8.09e-05 ***
## publicpublic -0.05868    0.29786  -0.197   0.8438    
## gpa           0.61575    0.26063   2.363   0.0182 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Threshold coefficients:
##                             Estimate Std. Error z value
## unlikely|somewhat likely      2.2033     0.7795   2.826
## somewhat likely|very likely   4.2988     0.8043   5.345

The estimated model can be written in the form of equation (3.3):

$\begin{eqnarray} logit (\hat{P}(Y \le 1)) & = & 2.20 - 1.05*PARED - (-0.06)*PUBLIC - 0.616*GPA \\ logit (\hat{P}(Y \le 2)) & = & 4.30 - 1.05*PARED - (-0.06)*PUBLIC - 0.616*GPA \end{eqnarray}$

Exponentiate coefficients to get Odds ratios:

#getting CI
(ci <- confint(m.clm))

##                   2.5 %    97.5 %
## paredattend   0.5281776 1.5721696
## publicpublic -0.6521908 0.5191405
## gpa           0.1076197 1.1309092

# exponantiate to get OR and CI
exp(cbind(OR=-coef(m.clm)[3:5], inv.OR = coef(m.clm)[3:5], ci))

##                     OR    inv.OR     2.5 %   97.5 %
## paredattend  0.3507563 2.8509825 1.6958390 4.817088
## publicpublic 1.0604388 0.9430059 0.5209033 1.680583
## gpa          0.5402378 1.8510366 1.1136241 3.098472

For students whose parents attend college, the odds of being more likely to apply (i.e., very likely versus unlikely or somewhat likely) is 2.85 times that of students whose parents did go to college, holding constant all other variables.
For students who are going to public school, the odds of unlikely apply (i.e., unlikely versus somewhat likely and very likely) is about $6\%$ more than students going to private school, holding all other variables constant.
For one unit increase in GPA on average the odds of unlikely to apply (i.e., unlikely versus somewhat likely and very likely) is decreases by about $46\%$ , holding all other variables constant.

testing proportional assumption:

#re-run the model with proportional odds assumption relaxed

m.clm.relaxed <- clm(apply ~ 1, nominal = ~ pared + public + gpa , data = dat)
summary(m.clm.relaxed)

## formula: apply ~ 1
## nominal: ~pared + public + gpa
## data:    dat
## 
##  link  threshold nobs logLik  AIC    niter max.grad cond.H 
##  logit flexible  400  -356.51 729.01 5(0)  5.67e-08 2.7e+03
## 
## Threshold coefficients:
##                                          Estimate Std. Error z value
## unlikely|somewhat likely.(Intercept)       2.1139     0.8039   2.630
## somewhat likely|very likely.(Intercept)    4.7213     1.3923   3.391
## unlikely|somewhat likely.paredattend      -1.0831     0.2959  -3.660
## somewhat likely|very likely.paredattend   -0.9947     0.3741  -2.659
## unlikely|somewhat likely.publicpublic      0.2307     0.3063   0.753
## somewhat likely|very likely.publicpublic  -0.5367     0.4293  -1.250
## unlikely|somewhat likely.gpa              -0.5921     0.2690  -2.201
## somewhat likely|very likely.gpa           -0.7190     0.4537  -1.585

#likelihood ratio test with anova
anova(m.clm, m.clm.relaxed)

## Likelihood ratio tests of cumulative link models:
##  
##               formula:                     nominal:              link:
## m.clm         apply ~ pared + public + gpa ~1                    logit
## m.clm.relaxed apply ~ 1                    ~pared + public + gpa logit
##               threshold:
## m.clm         flexible  
## m.clm.relaxed flexible  
## 
##               no.par    AIC  logLik LR.stat df Pr(>Chisq)
## m.clm              5 727.02 -358.51                      
## m.clm.relaxed      8 729.01 -356.51  4.0138  3       0.26

#brant test for proportional odds assumption over polr object
#run the model using polr
m1.plr <- polr(apply ~ pared + public + gpa, data = dat)
#run the brant test
brant::brant(m1.plr)

## -------------------------------------------- 
## Test for X2  df  probability 
## -------------------------------------------- 
## Omnibus      4.34    3   0.23
## paredattend  0.13    1   0.72
## publicpublic 3.44    1   0.06
## gpa      0.18    1   0.67
## -------------------------------------------- 
## 
## H0: Parallel Regression Assumption holds

3.7 Ploting predicted probabiliy for ordinal logistic regression

Once we are done assessing whether the assumptions of our model hold, we can obtain predicted probabilities, which are usually easier to understand than either the coefficients or the odds ratios. For example, we can vary gpa for each level of pared and public and calculate the probability of being in each category of apply. We do this by creating a new dataset of all the values to use for prediction.

#creating new data
newdat <- data.frame(
  pared = rep(levels(dat$pared), 200),
  public = rep(levels(dat$public), each = 200),
  gpa = rep(seq(from = 1.9, to = 4, length.out = 100), 4))
#using function predict
newdat <- cbind(newdat, predict(m.clm, newdat, type = "prob"))

head(newdat)

##        pared  public      gpa fit.unlikely fit.somewhat likely fit.very likely
## 1 not attend private 1.900000    0.7375757           0.2204915      0.04193277
## 2     attend private 1.921212    0.4931707           0.3945956      0.11223368
## 3 not attend private 1.942424    0.7324882           0.2245169      0.04299488
## 4     attend private 1.963636    0.4866429           0.3984942      0.11486295
## 5 not attend private 1.984848    0.7273385           0.2285788      0.04408265
## 6     attend private 2.006061    0.4801195           0.4023348      0.11754565

To plot the predicted probabilities we need to reshape it to the long format

newdat.long <- newdat %>% 
  pivot_longer(cols = c(4,5,6), names_to = "apply", values_to = "Probability",
               names_pattern ="....(.*)")

head(newdat.long)

## # A tibble: 6 × 5
##   pared      public    gpa apply           Probability
##   <chr>      <chr>   <dbl> <chr>                 <dbl>
## 1 not attend private  1.9  unlikely             0.738 
## 2 not attend private  1.9  somewhat likely      0.220 
## 3 not attend private  1.9  very likely          0.0419
## 4 attend     private  1.92 unlikely             0.493 
## 5 attend     private  1.92 somewhat likely      0.395 
## 6 attend     private  1.92 very likely          0.112

#plot of predicted probability vs. gpa separated by parent education and public
ggplot(newdat.long, aes(x = gpa, y = Probability, colour = apply)) +
  geom_line() + facet_grid(pared ~ public, labeller="label_both")

The plot above shows the trend of predicted probability of each level of outcome against GPA split by parent education and school type.

Beyond Logistic Regression in R

1 Introduction

1.1 Regression

1.2 Categorical random variables

1.2.1 Regression models of categorical variables

1.3 Working with categorical variables in R: `factor` objects

1.3.1 Optional

1.3.2 Ordered `factor`s

1.4 hsb data

1.5 Understanding odds and odds ratios

1.5.1 Odds vs. probability

1.5.2 Calculation probability from odds in multinomial (optional)

1.5.3 Odds ratios (binary variable)

1.5.4 Odds Ratios (multinomial variable)

1.6 Logit link

1.7 Binary logistic model

1.7.1 Calculating predicted probability using odds (Optional)

1.8 Logistic regression in R

1.8.1 Logistic regression example: `honors` predicted by `math` and `female`

1.8.2 Interpreting coefficients

2 Multinomial logistic regression

2.1 Multinomial logistic model

2.1.1 Calculating predicted probability using odds (Optional)

2.2 R example of multinomial logistic regression

2.3 Interpreting the coefficients of multinomial logistic

2.4 Test statistics in Multinomial logistic regression

2.5 Ploting predicted value in multinomial logistic regression

2.6 Package `mlogit`

2.7 Exercise for multinomial logistic regression:

3 Ordinal logistic Regression

3.1 Ordinal logistic model

3.2 R example for ordinal logistic regression

3.3 Interpreting the result of ordinal logistic regression

3.4 Using polr in package MASS

3.5 Testing propotional odds assumption

3.6 Exercise, Ordinal logistic regression of `apply` on `pared`, `public` and `gpa`

3.7 Ploting predicted probabiliy for ordinal logistic regression

Beyond Logistic Regression in R

1 Introduction

1.1 Regression

1.2 Categorical random variables

1.2.1 Regression models of categorical variables

1.3 Working with categorical variables in R: factor objects

1.3.1 Optional

1.3.2 Ordered factors

1.4 hsb data

1.5 Understanding odds and odds ratios

1.5.1 Odds vs. probability

1.5.2 Calculation probability from odds in multinomial (optional)

1.5.3 Odds ratios (binary variable)

1.5.4 Odds Ratios (multinomial variable)

1.6 Logit link

1.7 Binary logistic model

1.7.1 Calculating predicted probability using odds (Optional)

1.8 Logistic regression in R

1.8.1 Logistic regression example: honors predicted by math and female

1.8.2 Interpreting coefficients

2 Multinomial logistic regression

2.1 Multinomial logistic model

2.1.1 Calculating predicted probability using odds (Optional)

2.2 R example of multinomial logistic regression

2.3 Interpreting the coefficients of multinomial logistic

2.4 Test statistics in Multinomial logistic regression

2.5 Ploting predicted value in multinomial logistic regression

2.6 Package mlogit

2.7 Exercise for multinomial logistic regression:

3 Ordinal logistic Regression

3.1 Ordinal logistic model

3.2 R example for ordinal logistic regression

3.3 Interpreting the result of ordinal logistic regression

3.4 Using polr in package MASS

3.5 Testing propotional odds assumption

3.6 Exercise, Ordinal logistic regression of apply on pared, public and gpa

3.7 Ploting predicted probabiliy for ordinal logistic regression

1.3 Working with categorical variables in R: `factor` objects

1.3.2 Ordered `factor`s

1.8.1 Logistic regression example: `honors` predicted by `math` and `female`

2.6 Package `mlogit`

3.6 Exercise, Ordinal logistic regression of `apply` on `pared`, `public` and `gpa`