Office of Advanced Research Computing (OARC), Statistical Methods and Data Analytics
In this workshop we will go over multinomial and ordinal logistic regression with examples in R.
Regression analysis is a statistical tool used to explain the relationship between a response (dependent, outcome) variable as a function of one or more predictor (independent) variables.
In linear regression we assume this relationship can be explained with a linear function.
If the dependent variable is a continuous variable we can use the ordinary least squares (OLS) method to estimate the linear model.
Under the following assumptions, the OLS estimator has the lowest sampling variance within the class of linear unbiased estimators.
Errors have mean equal to zero
Errors are independent
Errors have equal or constant variance (homoscedasticity or homogeneity of variance)
With the following additional assumption, OLS is also the maximum likelihood estimator:
Errors follow a normal distribution
What if the dependent variable is not continuous?
A categorical variable is a variable that only takes one of a limited number of possible values, usually signifying groups, based on some underlying quantitative variable(s).
When there is no natural order or rank between groups, we call it a nominal categorical variable.
Blood type, gender, race, and zip code are examples of nominal categorical variables.
When the categories have a natural order or rank, we call it an ordinal categorical variable. For example:
People can respond to a survey with strongly disagree, disagree, agree, and strongly agree. We cannot quantify how much bigger or smaller the distance between strongly agree and agree is compared to the distance between agree and disagree.
Socioeconomic status can be categorized as “low”, “middle”, and “high”.
A categorical variable with only two possible outcomes is a binary variable or Bernoulli variable.
For example:
In flipping a coin, the outcome must be head or tails.
A medical test that has only two possible outcomes: positive or negative.
Smoking is a binary variable with outcome “Yes” or “No”.
If the dependent variable is a binary categorical variable, we use logistic regression, or more specifically, binary logistic regression.
If the dependent variable is a nominal categorical variable with more than two categories, we use multinomial logistic regression.
If the dependent variable is ordinal, then we use ordinal logistic regression.
The term logistic comes from the logistic function, the inverse of the logit link function, which transforms a linear combination of coefficients and predictor variables that can take values from \(-\infty\) to \(\infty\) into probabilities.
In binomial, multinomial, and ordinal logistic regression we use the logit as the link function. We will discuss the logit and logistic functions later.
R uses factor objects to store and manipulate categorical variables. Some functions in R automatically treat variables of type character as factor, but it is always safer to change the variable’s class to factor before the analysis. We begin by creating a factor variable in R.
Let’s first create a character vector containing values “a”, “b”, “c”, and “d”.
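The code chunk was lost in conversion; a minimal reconstruction that matches the output:
x <- c("a", "b", "c", "d")
x
class(x)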
## [1] "a" "b" "c" "d"
## [1] "character"
Function class() determines the class of an object.
We use function factor() to convert x to a factor.
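Presumably the chunk was:
x.f <- factor(x)
x.f
class(x.f)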
## [1] a b c d
## Levels: a b c d
## [1] "factor"
By default, the levels are the unique set of values taken by as.character(x), sorted in ascending order. To each level we can apply labels, which by default are the same as the levels themselves. In the RStudio Environment pane, we can observe the numerical values that represent each level.
The first level in a factor is the reference level. Using function relevel() we can change the reference.
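A likely reconstruction of the missing chunk:
relevel(x.f, ref = "b")
x.f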
## [1] a b c d
## Levels: b a c d
## [1] a b c d
## Levels: a b c d
Because we did not assign the output of relevel() to an object, we did not permanently change the values of factor x.f. If we want to permanently change the reference level, we need to assign the result to a new object or to the factor x.f itself.
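Presumably:
x.f <- relevel(x.f, ref = "b")
x.f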
## [1] a b c d
## Levels: b a c d
If we want to combine two groups into one group, we can just set two levels of an existing factor variable to be the same level inside levels(). Notice the order on the right-hand side corresponds to the ordering of the levels of the original factor x.f:
# Map from 4 different values to three levels (combine c and d into c):
x.f2 <- x.f
levels(x.f2) <- c("b", "a", "c", "c")
x.f2
## [1] a b c c
## Levels: b a c
We need to be cautious when we use the assignment operator <-. Above, not only did we change the levels of the factor, we also changed the values of the vector to just 3 distinct values. This is why we first stored a copy of the original vector.
Ordered factors
Now we are going to create an ordered factor, which stores the ordering of the levels. We can use function factor() with option ordered = TRUE.
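A likely reconstruction:
x.of <- factor(x, ordered = TRUE)
x.of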
## [1] a b c d
## Levels: a < b < c < d
To change the order of an ordered factor we can re-specify the levels in the order we want.
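Presumably:
factor(x.of, levels = c("d", "b", "c", "a"))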
## [1] a b c d
## Levels: d < b < c < a
We can also change the labels of ordered factors.
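A reconstruction consistent with the output and the explanation that follows:
(x.of <- factor(x.of, labels = c("strong disagree", "disagree", "agree", "strongly agree")))
factor(x.of, levels = rev(levels(x.of)))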
## [1] strong disagree disagree agree strongly agree
## Levels: strong disagree < disagree < agree < strongly agree
## [1] strong disagree disagree agree strongly agree
## Levels: strongly agree < agree < disagree < strong disagree
The first line above labels “a” with “strong disagree”, “b” with “disagree”, “c” with “agree”, and “d” with “strongly agree”.
The second line above changes the order of the levels, but the values remain the same. Since x.of is already an ordered factor, we do not need to use argument ordered = TRUE.
Let’s now read the first data set. The data contains academic performance and demographic information about 200 high school students. This data set is hypothetical and was created to demonstrate statistical analyses.
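The reading chunk was not preserved; the URL below is an assumption based on similar OARC workshop data sets:
hsb <- read.csv("https://stats.idre.ucla.edu/stat/data/hsbdemo.csv")
names(hsb)
str(hsb)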
## [1] "id" "female" "ses" "schtyp" "prog" "read" "write"
## [8] "math" "science" "socst" "honors" "awards" "cid"
## 'data.frame': 200 obs. of 13 variables:
## $ id : int 45 108 15 67 153 51 164 133 2 53 ...
## $ female : chr "female" "male" "male" "male" ...
## $ ses : chr "low" "middle" "high" "low" ...
## $ schtyp : chr "public" "public" "public" "public" ...
## $ prog : chr "vocation" "general" "vocation" "vocation" ...
## $ read : int 34 34 39 37 39 42 31 50 39 34 ...
## $ write : int 35 33 39 37 31 36 36 31 41 37 ...
## $ math : int 41 41 44 42 40 42 46 40 33 46 ...
## $ science: int 29 36 26 33 39 31 39 34 42 39 ...
## $ socst : int 26 36 42 32 51 39 46 31 41 31 ...
## $ honors : chr "not enrolled" "not enrolled" "not enrolled" "not enrolled" ...
## $ awards : int 0 0 0 0 0 0 0 0 0 0 ...
## $ cid : int 1 1 1 1 1 1 1 1 1 1 ...
Variables female, schtyp (school type), prog (program), honors (honors program), and ses (socioeconomic status) are categorical variables. Variable ses is an ordinal variable with levels “low”, “middle”, and “high”.
First we transform variable female to a factor and calculate frequencies and proportions of females and males.
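A likely reconstruction:
hsb$female <- factor(hsb$female)
levels(hsb$female)
table(hsb$female)
prop.table(table(hsb$female))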
## [1] "female" "male"
##
## female male
## 109 91
##
## female male
## 0.545 0.455
Now we change the reference level from “female” to “male”.
#changing the reference to "male"
hsb$female <- relevel(hsb$female, ref = "male")
#check to see reference changed
levels(hsb$female)
## [1] "male" "female"
1- In the hsb data, transform variables schtyp (school type), prog (program), honors (honors program), and ses (socioeconomic status) to unordered factor variables and report the proportions of each category for those variables.
#transform variables schtyp, prog, honors and ses to factor
hsb$schtyp <- factor(hsb$schtyp)
prop.table(table(hsb$schtyp))
##
## private public
## 0.16 0.84
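The remaining transformations, reconstructed to match the output:
hsb$prog <- factor(hsb$prog)
prop.table(table(hsb$prog))
hsb$honors <- factor(hsb$honors)
prop.table(table(hsb$honors))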
##
## academic general vocation
## 0.525 0.225 0.250
##
## enrolled not enrolled
## 0.265 0.735
#ses as factor in order of "low", "middle", "high". Not an ordered factor.
hsb$ses <- factor(hsb$ses, levels = c("low", "middle", "high"))
prop.table(table(hsb$ses))
##
## low middle high
## 0.235 0.475 0.290
2- Change the reference level of variable honors to “not enrolled”. One possible solution is shown below.
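One possible solution, assuming honors is already a factor (as in exercise 1):
hsb$honors <- relevel(hsb$honors, ref = "not enrolled")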
Next, we review some important concepts in logistic regression.
The odds of an event with probability \(p\) is defined as: \[ \tt{odds(p)} = \dfrac{p}{1 -p} \]
Suppose the probability of a student enrolling in the honors program on average is 0.3.
Then we define the odds of enrolled to be:
\[ \tt{odds(enrolled)} = \dfrac{\Pr(enrolled)}{1 - \Pr(enrolled)} = \dfrac{0.3}{1 - 0.3} = 0.4285714 \]
\[ \tt{odds(not \ enrolled)} = \dfrac{\Pr(not \ enrolled)}{1 - \Pr(not \ enrolled)} = \dfrac{0.7}{1 - 0.7} = 2.333333 \]
Conversely, given the odds we can recover the probability. For example, if the odds of an event are 2, then
\[ p = \dfrac{\tt{odds}}{1 + \tt{odds}} = \dfrac{2}{1 + 2} = \dfrac{2}{3} \approx 0.67 \]
For categorical variables with multiple categories we generalize the definition of odds to be the relative probability, or odds, of one level vs. the reference level (sometimes called relative risk). For example, in the program variable (prog), which has 3 levels, “academic”, “general”, and “vocation”, we can define the odds of being in the general program vs. the academic program.
\[ \tt{odds(general \ vs. \ academic)} = \dfrac{\Pr(general)}{\Pr(academic)} \]
We can compute these odds in R using the probabilities calculated by table() and prop.table().
#first we create frequency table and
#then we use prop.table to calculate probabilities
prog.tab <- table(hsb$prog)
(prog.p <- prop.table(prog.tab))
##
## academic general vocation
## 0.525 0.225 0.250
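The odds computations were presumably:
#odds of general vs. academic and of vocation vs. academic
prog.p[["general"]] / prog.p[["academic"]]
prog.p[["vocation"]] / prog.p[["academic"]]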
## [1] 0.4285714
## [1] 0.4761905
Since,
\[\begin{eqnarray} \dfrac{1 - \Pr(\tt{academic}) }{\Pr(\tt{academic})} &=& \dfrac{\Pr(\tt{general}) + \Pr(vocational)}{\Pr(\tt{academic})} \\ &=& \tt{odds(general \ vs. \ academic)} + \tt{odds(vocational \ vs. \ academic)} \end{eqnarray}\]
Thus,
\[\begin{eqnarray} \Pr(\tt{academic}) &=& \dfrac{1}{1 +\tt{odds(general \ vs. \ academic)} + \tt{odds(vocational \ vs. \ academic)}}, \\ \Pr(\tt{general}) &=& \Pr(\tt{academic}) \times \tt{odds(general \ vs. \ academic)}, \tt{and} \\ \Pr(\tt{vocational}) &=& \Pr(\tt{academic}) \times \tt{odds(vocational \ vs. \ academic)}.\\ \end{eqnarray}\]
An odds ratio compares the odds of an event between two groups. For example, the odds ratio of enrollment in the honors program for females to males is \[\text{OddsRatio(enrolled for female to male)} =\dfrac{\tt{odds(enrolled \ | \ female)}}{\tt{odds(enrolled \ | \ male)}}\]
Let’s calculate this odds ratio from a cross-table of honors vs. female.
#calculating proportion of enrollment for each group of female and male.
#table gives frequencies
(tab_honors_female <- table(hsb$honors, hsb$female))
##
## male female
## not enrolled 73 74
## enrolled 18 35
#prop.table on a frequency table with margin = 2 returns column-wise proportions.
prop.table(tab_honors_female, margin = 2)
##
## male female
## not enrolled 0.8021978 0.6788991
## enrolled 0.1978022 0.3211009
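The odds ratio below was presumably computed from these proportions; the name p.enr is hypothetical:
p.enr <- prop.table(tab_honors_female, margin = 2)
(p.enr["enrolled", "female"] / p.enr["not enrolled", "female"]) /
  (p.enr["enrolled", "male"] / p.enr["not enrolled", "male"])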
## [1] 1.918168
The odds of enrollment in the honors program for a female student are almost twice the odds for a male student. In other words, the odds of enrollment for female students are about 92% greater than for male students.
We could also use the frequencies directly to calculate odds and odds ratios since the total number of females and males will cancel in the numerator and denominator.
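Presumably:
#same odds ratio directly from the frequencies
(35/74) / (18/73)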
## [1] 1.918168
Note: Calculating probabilities from an odds ratio is only possible if we know the odds of at least one group. Otherwise, it is not possible to calculate probabilities from the odds ratio.
For categorical variable with more than two categories, we define a generalized form of odds ratio as: the ratio of odds of a category vs. a reference category for one group, to the odds of the same category vs. the reference category for another group.
Sometimes this is called a relative risk ratio.
For example the odds ratio comparing females to males for the odds of “general” vs. “academic” (the reference category) is:
\[ \text{OddsRatio(general vs. academic for female to male)} = \dfrac{\left[\dfrac{\Pr (\text{general | female})}{\Pr(\text{academic | female})}\right]}{\left[\dfrac{\Pr (\text{general | male})}{\Pr(\text{academic | male})}\right]}\]
The above can also be expressed as the ratio of “the relative risks of general vs. academic program” comparing females to males.
Below, we use the hsb data to construct a 3 by 2 table of program vs. female to calculate the odds ratio of general vs. academic comparing females to males.
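A likely reconstruction; the name tab_prog_female is hypothetical:
(tab_prog_female <- table(hsb$prog, hsb$female))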
##
## male female
## academic 47 58
## general 21 24
## vocation 23 27
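Presumably:
#odds ratio of general vs. academic comparing females to males
(24/58) / (21/47)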
## [1] 0.9261084
We can say that the odds of the general program vs. the academic program for female students are \(0.926\) times the odds for male students.
Or, we can say the odds of the general program vs. the academic program for female students are \(7.4\%\) less than the odds for male students.
All logistic regression models (binary, multinomial, and ordinal) model the probability of being in each level of the outcome. The goal is to predict (model) this probability, restricted between \(0\) and \(1\), by linear combinations of independent variables. However, a linear combination of predictors can take on any value between \(-\infty\) and \(+\infty\).
We use the logit function to transform a value between 0 and 1 to a value between \(-\infty\) and \(+\infty\).
The inverse of the logit function, the logistic function, maps a value between \(-\infty\) and \(+\infty\) to interval the \((0, 1)\).
The logit function transforms probabilities to log-odds:
\[ \tt{logit} (P) = \log\left(\dfrac{P}{1-P}\right) \]
In regression, a function that maps the expected value (mean) of the outcome to the scale of the linear predictor (all real numbers) is called the link function.
The chart below shows how the logit function transforms the range \((0, 1)\) to \((-\infty, +\infty)\).
Recall the linear regression model:
\[\begin{equation} y | x_1, x_2,\cdots, x_p = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots +\beta_p x_p + \epsilon \tt{\ \ (1.1)} \end{equation}\]
or
\[\mu_{y | x_1, x_2,\cdots, x_p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots +\beta_p x_p \tt{\ \ \ \ \ (1.2)}\]
\(\mu_y\) is the mean of the dependent variable for specific values of the independent variables, and \(\epsilon\) is the unexplained error term of the model, with mean equal to 0.
In binary logistic regression, we instead model the log-odds of the outcome as a linear function of the predictors:
\[ \log(odds | x_1, x_2 , \cdots x_p) = \beta_0 + \beta_1 x1 + \beta_2 x_2 + \beta_3 x_3 + ... + \beta_p x_p \tt{\ \ \ \ \ (1.3)}\]
or,
\[\frac{Pr(y = 1| x_1, x_2 , \cdots x_p)}{Pr(y = 0| x_1, x_2 , \cdots x_p)} = \exp{(\beta_0 + \beta_1 x1 + \beta_2 x_2 + ... + \beta_p x_p)} \tt{\ \ \ \ \ (1.4)}\]
solving for \(\Pr(y = 1)\) given that \(\Pr(y = 0) = 1 - \Pr(y = 1)\) we have,
\[Pr(y = 1| x_1, x_2 , \cdots x_p) = \frac{\exp{(\beta_0 + \beta_1 x1 + \beta_2 x_2 + \cdots + \beta_p x_p)}}{1+\exp{(\beta_0 + \beta_1 x1 + \beta_2 x_2 + \cdots + \beta_p x_p)}} \tt{\ \ \ \ \ (1.5)}\]
or,
\[Pr(y = 1| x_1, x_2 , \cdots x_p) = \frac{1} {1 + \exp[{-(\beta_0 + \beta_1 x1 + \beta_2 x_2 + \cdots + \beta_p x_p)}]} \tt{\ \ \ \ \ (1.6)}\] and
\[Pr(y = 0 | x_1, x_2 , \cdots x_p) = \frac{1} {1 + \exp\left[(\beta_0 + \beta_1 x1 + \beta_2 x_2 + \cdots + \beta_p x_p)\right]} .\]
In R we use function glm() for logistic regression with family = binomial() and link = "logit". The syntax of glm() is as follows:
glm(formula, family = binomial(link = "logit"), data = data, weights, subset, na.action, ...)
The first argument is the model formula, followed by defining distribution of the outcome.
Example: honors predicted by math and female
For this analysis we use data set hsb to model enrollment in the honors program as the outcome and variables math and female as independent variables. Our model assumes the effect of math score is the same for females and males (i.e., no interaction between math and female).
We extract the results with summary(). Functions coef() and confint() can be used to extract coefficients and estimate confidence intervals, respectively.
#to estimate coefficients of the model we use glm with
#the option family = binomial(link = "logit")
logit.m <- glm(honors ~ math + female, family = binomial(link = "logit"), data = hsb)
summary(logit.m)
##
## Call:
## glm(formula = honors ~ math + female, family = binomial(link = "logit"),
## data = hsb)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0188 -0.6676 -0.3110 0.4119 2.5408
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -11.79228 1.71892 -6.860 6.87e-12 ***
## math 0.18254 0.02842 6.423 1.34e-10 ***
## femalefemale 1.12085 0.42403 2.643 0.00821 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 231.29 on 199 degrees of freedom
## Residual deviance: 158.46 on 197 degrees of freedom
## AIC: 164.46
##
## Number of Fisher Scoring iterations: 5
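The chunk producing the next two outputs was presumably:
#coefficient table and profile-likelihood confidence intervals
summary(logit.m)$coefficients
confint(logit.m)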
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -11.792281 1.71891936 -6.860288 6.872196e-12
## math 0.182538 0.02842085 6.422678 1.338976e-10
## femalefemale 1.120847 0.42403214 2.643308 8.210041e-03
## 2.5 % 97.5 %
## (Intercept) -15.4710305 -8.689254
## math 0.1309423 0.243122
## femalefemale 0.3143782 1.986565
\[ \log(\text{odds of enrolled}) = \log\left[\dfrac{\Pr (\text{enrolled})}{\Pr(\text{not enrolled})}\right]= -11.792281 + 0.182538 \text{ math} + 1.120847 \text{ female}\]
For math: For a one-unit increase in math score, the log-odds of enrollment is expected to increase by 0.182538, holding the other predictor (female) constant.
For female: The log-odds of enrollment for a female student is 1.120847 greater than the log-odds for a male student, on average.
The intercept is the expected log-odds of enrollment for a male student (female = 0) when math = 0. Since math = 0 is unrealistic, it is not useful to interpret the intercept.
If we exponentiate the coefficients with exp(), then we can interpret the coefficients on the odds ratio scale.
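Presumably:
exp(coef(logit.m))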
## (Intercept) math femalefemale
## 7.562706e-06 1.200260e+00 3.067452e+00
We can also apply exp() to confint() to calculate confidence intervals for the odds ratios.
#exponentiate coefficients and column bind with exponentiation of CIs
cbind(OR = exp(coef(logit.m)), exp(confint(logit.m)))
## OR 2.5 % 97.5 %
## (Intercept) 7.562706e-06 1.909927e-07 0.0001683856
## math 1.200260e+00 1.139902e+00 1.2752241647
## femalefemale 3.067452e+00 1.369407e+00 7.2904468215
The above confidence interval (CI) is based on the profile likelihood method. If you want to calculate CIs based on standard errors, you can use confint.default() instead.
The intercept, \(7.562706 \times 10^{-6}\), is the expected odds of enrollment for a male student with math = 0.
For a one-unit increase in math score, the odds of enrollment in honors on average increase by \(20\%\).
The odds of enrollment for females are on average about \(3.067\) times the odds for males, holding math constant.
Suppose we want to model a regression with a categorical outcome with \(J\) number of categories and the \(J\)th level is the reference category.
For each category \(j\) from \(1, 2, \cdots, J-1\) the log-odds of category \(j\) versus the reference category \(J\) given predictors \(x_1, x_2, \cdots, x_p\) is modeled as the form below:
\[ \log(\text{odds}(Y = j \ | \ x_1, x_2, \cdots, x_p)) =\log \left[ \dfrac{\Pr(y = j|x_1, x_2, \cdots, x_p)}{\Pr(y = J|x_1, x_2, \cdots, x_p)} \right] = \beta_{j0} + \beta_{j1} x_1 + \beta_{j2} x_2 + \cdots + \beta_{jp} x_p \tt{\ \ \ \ \ (2.1)}\]
where \(odds_j\) is “generalized” odds and \[Pr(y = J) = 1 - \sum_{j=1}^{J-1} Pr(y = j).\]
We can exponentiate both sides of equation (2.1):
\[ \text{odds}(Y = j \ | \ x_1, x_2, \cdots, x_p) = \dfrac{\Pr(y = j|x_1, x_2, \cdots, x_p)}{\Pr(y = J|x_1, x_2, \cdots, x_p)} = \exp(\beta_{j0} + \beta_{j1} x_1 + \beta_{j2} x_2 + \cdots + \beta_{jp} x_p)\tt{\ \ \ \ \ \ (2.2)}\]
In multinomial logistic regression, for an outcome with 3 levels we need to estimate \((p + 1) \times 2\) coefficients, and for an outcome with 4 levels we estimate \((p + 1) \times 3\) coefficients.
It is not practical to run a multinomial logistic regression for an outcome with too many levels.
The complete or quasi-complete separation problem is very common in multinomial logistic regression with small samples. For more information about complete and quasi-complete separation, see this page:
How Relevant is the Independence of Irrelevant Alternatives?
We can now calculate the probability of the reference level using the odds by summing both sides over all \(j\) from \(1, 2, \cdots, J-1\),
\[ \dfrac{1 - \Pr(y = J|x_1, x_2, \cdots, x_p)}{\Pr(y = J|x_1, x_2, \cdots, x_p)} = \sum_{j=1}^{J-1}\exp(\beta_{j0} + \beta_{j1} x_1 + \beta_{j2} x_2 + \cdots + \beta_{jp} x_p) \]
Solving for \(Pr(y = J)\) we get:
\[ \Pr(y = J|x_1, x_2, \cdots, x_p) = \dfrac{1}{1 +\sum_{j=1}^{J-1}\exp(\beta_{j0} + \beta_{j1} x_1 + \beta_{j2} x_2 + \cdots + \beta_{jp} x_p) } \tt{\ \ \ \ \ \ (2.3)} \]
To get the probability of the other \(j\) levels from \(1, 2, \cdots, J-1\) we just need to multiply the odds of each level by probability of the reference level thus,
\[ \Pr(y = j|x_1, x_2, \cdots, x_p) = \dfrac{\exp(\beta_{j0} + \beta_{j1} x_1 + \beta_{j2} x_2 + \cdots + \beta_{jp} x_p)}{1 +\sum_{j=1}^{J-1}\exp(\beta_{j0} + \beta_{j1} x_1 + \beta_{j2} x_2 + \cdots + \beta_{jp} x_p) } \tt{\ \ \ \ \ \ (2.4)} \]
We use the multinom() function from the nnet package to estimate a multinomial logistic regression model.
For the first example, suppose we want to model students’ membership in different programs (variable prog) as predicted by write (writing score) and ses (socioeconomic status), using the hsb data.
The outcome variable prog has 3 levels, “academic”, “general”, and “vocation”, with reference group “academic” (the \(J\)th category).
For help documentation about the multinom() function, see ?multinom. For this workshop we only use the basic arguments of the multinom() function.
##
## academic general vocation
## 105 45 50
##
## academic general vocation
## 0.525 0.225 0.250
#running multinomial regression model for prog on socio econonic status and write
library(nnet)
m.multinom <- multinom(prog ~ ses + write, data = hsb)
## # weights: 15 (8 variable)
## initial value 219.722458
## iter 10 value 179.982880
## final value 179.981726
## converged
When the model runs, information about the convergence process is printed. The output includes some iteration history and the final negative log-likelihood (179.981726). This value multiplied by two is the Residual Deviance shown in the model summary, and can be used in comparisons of nested models.
The result is stored in an object of class multinom; to extract the results, we again use the generic function summary().
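Presumably the summary was stored (the object m.results is used again later):
m.results <- summary(m.multinom)
m.results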
## Call:
## multinom(formula = prog ~ ses + write, data = hsb)
##
## Coefficients:
## (Intercept) sesmiddle seshigh write
## general 2.85 -0.533 -1.163 -0.0579
## vocation 5.22 0.291 -0.983 -0.1136
##
## Std. Errors:
## (Intercept) sesmiddle seshigh write
## general 1.17 0.444 0.514 0.0214
## vocation 1.16 0.476 0.596 0.0222
##
## Residual Deviance: 359.9635
## AIC: 375.9635
The first output line is the model formula.
If we substitute the model results into equation 2.1 we have:
\[\begin{eqnarray} \log\left[\dfrac{\Pr(\tt{general})}{\Pr(\tt{academic})}\right] = 2.85 -0.533 \times\text{(ses = middle)} -1.163\times\text{(ses = high)} -0.0579\times \tt{write}, \\ \\ \log\left[\dfrac{\Pr(\tt{vocation})}{\Pr(\tt{academic})}\right] = 5.22 + 0.291 \times\text{(ses = middle)} - 0.983 \times\text{(ses = high)} - 0.1136 \times \tt{write}. \\ \end{eqnarray}\]
Intercepts:
The expected log-odds of being in general vs. academic is equal to 2.85 for a person with low socio economic status and writing score equal to zero.
The expected log-odds of being in vocation vs. academic is equal to 5.22 for a person with low socio economic status and writing score equal to zero.
For independent variable write:
A one-unit increase in write decreases the log-odds of being in the general vs. academic program by \(0.058\) on average.
A one-unit increase in write decreases the log-odds of being in the vocation vs. academic program by \(0.1136\) on average.
For ses:
The log-odds of being in the general program vs. academic program for ses=“high” is lower than ses=“low” by \(1.163\).
The log odds of being in the general program vs. academic program will decrease by \(0.533\) if moving from ses= “low” to ses=“middle”.
The log odds of being in the vocation program vs. academic program will decrease by \(0.983\) if moving from ses=“low” to ses=“high”.
The log odds of being in the vocation program vs. academic program will increase by \(0.291\) if moving from ses=“low” to ses=“middle”.
We can exponentiate the coefficients from our model to get coefficients in term of the odds ratios.
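Presumably:
exp(coef(m.multinom))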
## (Intercept) sesmiddle seshigh write
## general 17.32582 0.5866769 0.3126026 0.9437172
## vocation 184.61262 1.3382809 0.3743123 0.8926116
The expected odds of being in general vs. academic is equal to \(17.32582\) for a person with low socio economic status and writing score equal to zero.
The expected odds of being in vocation vs. academic is equal to \(184.61262\) for a person with low socio economic status and writing score equal to zero.
The odds of being in the general program vs. the academic program are expected to decrease by a factor of \(0.9437\) for a one-unit increase in write, keeping socioeconomic status constant.
The odds of being in the vocation program vs. the academic program are expected to decrease by a factor of \(0.8926\) for a one-unit increase in write, keeping socioeconomic status constant.
The odds of being in the general program vs. the academic program are expected to decrease by a factor of \(0.3126\), switching from ses = low to high, keeping writing score constant.
The odds of being in the general program vs. the academic program on average decrease by \(41.3\%\), switching from ses = low to middle, keeping writing score constant.
The odds of being in the vocation program vs. the academic program are expected to decrease by a factor of \(0.3743\), switching from ses = low to high, keeping writing score constant.
The odds of being in the vocation program vs. the academic program on average increase by \(33.8\%\), switching from ses = low to middle, keeping writing score constant.
The multinom summary does not report p-values; we can compute z-statistics and p-values from the coefficients and standard errors returned by summary().
#calculating z scores by dividing estimated coefficients by their standard errors
z <- m.results$coefficients/m.results$standard.errors
#using pnorm we get the two-tailed normal probability for each z value
p <- (1 - pnorm(abs(z), 0, 1)) * 2
#print the z-scores and p-values
print(z, digits = 4)
print(p, digits = 4)
## (Intercept) sesmiddle seshigh write
## general 2.445 -1.2018 -2.261 -2.706
## vocation 4.485 0.6117 -1.650 -5.113
## (Intercept) sesmiddle seshigh write
## general 1.448e-02 0.2294 0.02374 6.819e-03
## vocation 7.299e-06 0.5408 0.09895 3.176e-07
To test the overall effect of variable ses we can use a likelihood ratio test. We run the model excluding variable ses and then conduct the likelihood ratio test using function anova(), as shown below.
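A likely reconstruction; the name m.multinom.red is hypothetical:
#model without ses
m.multinom.red <- multinom(prog ~ write, data = hsb)
#likelihood ratio test of the nested models
anova(m.multinom.red, m.multinom)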
## # weights: 9 (4 variable)
## initial value 219.722458
## final value 185.510837
## converged
## Likelihood ratio tests of Multinomial Models
##
## Response: prog
## Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 write 396 371.0217
## 2 ses + write 392 359.9635 1 vs 2 4 11.05822 0.02591741
The likelihood ratio test is significant (p = 0.026), so the overall effect of ses on program, controlling for writing score, is unlikely to be due to chance.
We can also use predicted probabilities to help us understand the model. We can calculate predicted probabilities for each of our outcome levels using the fitted() function.
We can start by generating the predicted probabilities for the observations in our data and viewing the first few rows.
#fitted gives fitted probability for observations, head report the first 6 rows
head(pp <- fitted(m.multinom))
## academic general vocation
## 1 0.1482764 0.3382454 0.5134781
## 2 0.1202017 0.1806283 0.6991700
## 3 0.4186747 0.2368082 0.3445171
## 4 0.1726885 0.3508384 0.4764731
## 5 0.1001231 0.1689374 0.7309395
## 6 0.3533566 0.2377976 0.4088458
It will be more helpful to examine the changes in predicted probability associated with one of our two variables. We can create small datasets varying one variable while holding the other constant. We will first do this by holding write at its mean and examining the predicted probabilities for each level of ses.
#new data with all levels of ses and write at the mean of write.
newdata.ses <- data.frame(ses = c("low", "middle", "high"), write = mean(hsb$write))
#column bind the new data with predicted probabilities
cbind(newdata.ses, probability = predict(m.multinom, newdata = newdata.ses, "probs"))
## ses write probability.academic probability.general probability.vocation
## 1 low 52.775 0.4396845 0.3581917 0.2021238
## 2 middle 52.775 0.4777488 0.2283353 0.2939159
## 3 high 52.775 0.7009007 0.1784939 0.1206054
Another way to understand the model using the predicted probabilities is to look at the averaged predicted probabilities for different values of the continuous predictor write, for example from 30 to 70, within each level of ses. (It is a good idea to check the range of the continuous variable first.)
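Presumably:
range(hsb$write)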
## [1] 31 67
#in new data, for write from 30 to 70 we replicate levels of ses
newdata.write <- data.frame(ses = rep(c("low", "middle", "high"), each = 41), write = rep(c(30:70),
3))
#predict gives predicted probability for each rows of newdata
pp.write <- cbind(newdata.write, predict(m.multinom, newdata = newdata.write, type = "probs", se = TRUE))
#first few rows
head(pp.write)
## ses write academic general vocation
## 1 low 30 0.09843588 0.2999880 0.6015762
## 2 low 31 0.10716868 0.3082195 0.5846118
## 3 low 32 0.11650390 0.3162093 0.5672868
## 4 low 33 0.12645834 0.3239094 0.5496323
## 5 low 34 0.13704576 0.3312711 0.5316831
## 6 low 35 0.14827643 0.3382454 0.5134781
#calculate the mean probabilities within each level of ses
by(pp.write[, 3:5], pp.write$ses, colMeans)
## pp.write$ses: high
## academic general vocation
## 0.6164315 0.1808037 0.2027648
## ------------------------------------------------------------
## pp.write$ses: low
## academic general vocation
## 0.3972977 0.3278174 0.2748849
## ------------------------------------------------------------
## pp.write$ses: middle
## academic general vocation
## 0.4256198 0.2010864 0.3732938
To plot the predicted probabilities, we first need to reshape the predicted-probability data set into long form. We do this because we want a factor Program with 3 levels and all probabilities in one column.
#pivot_longer from package tidyr reshapes the wide format to long format
library(tidyr)
long.pp <- pp.write %>% pivot_longer(cols = c("academic", "general", "vocation"), names_to = "Program", values_to = "Probability")
#first few rows will be
head(long.pp)
## # A tibble: 6 × 4
## ses write Program Probability
## <chr> <int> <chr> <dbl>
## 1 low 30 academic 0.0984
## 2 low 30 general 0.300
## 3 low 30 vocation 0.602
## 4 low 31 academic 0.107
## 5 low 31 general 0.308
## 6 low 31 vocation 0.585
Now we are ready to plot the predicted probabilities as a function of write, separately by ses and Program; a plotting sketch follows.
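The plotting chunk was not preserved; a plausible version with ggplot2:
library(ggplot2)
ggplot(long.pp, aes(x = write, y = Probability, colour = ses)) +
  geom_line() + facet_grid(Program ~ .)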
There are other packages for multinomial logistic regression in R. For this workshop we used package nnet because it does not require reshaping the data. Another package is mlogit. To use the mlogit package we need to reshape the data into mlogit.data() format. The example below shows how to reshape the hsb data with variable prog as the choice option.
library(mlogit)
#reshapes the data in data frame with index, dfidx
hsb.choice <- mlogit.data(hsb, shape = "wide", choice = "prog")
class(hsb.choice)
## [1] "dfidx_mlogit" "dfidx" "data.frame" "mlogit.data"
## id female ses schtyp prog read write math science socst honors
## 1 45 female low public FALSE 34 35 41 29 26 not enrolled
## 2 45 female low public FALSE 34 35 41 29 26 not enrolled
## 3 45 female low public TRUE 34 35 41 29 26 not enrolled
## 4 108 male middle public FALSE 34 33 41 36 36 not enrolled
## 5 108 male middle public TRUE 34 33 41 36 36 not enrolled
## 6 108 male middle public FALSE 34 33 41 36 36 not enrolled
## awards cid chid alt idx
## 1 0 1 1 academic 1:emic
## 2 0 1 1 general 1:eral
## 3 0 1 1 vocation 1:tion
## 4 0 1 2 academic 2:emic
## 5 0 1 2 general 2:eral
## 6 0 1 2 vocation 2:tion
#running multinomial logistic using mlogit
m.mlogit <- mlogit(prog ~ 0 | ses + write , data = hsb.choice)
#extract the result using summary
summary(m.mlogit)
##
## Call:
## mlogit(formula = prog ~ 0 | ses + write, data = hsb.choice, method = "nr")
##
## Frequencies of alternatives:choice
## academic general vocation
## 0.525 0.225 0.250
##
## nr method
## 4 iterations, 0h:0m:0s
## g'(-H)^-1g = 7.89E-07
## gradient close to zero
##
## Coefficients :
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept):general 2.852186 1.166439 2.4452 0.014477 *
## (Intercept):vocation 5.218200 1.163549 4.4847 7.301e-06 ***
## sesmiddle:general -0.533291 0.443732 -1.2018 0.229429
## sesmiddle:vocation 0.291393 0.476374 0.6117 0.540743
## seshigh:general -1.162832 0.514219 -2.2614 0.023737 *
## seshigh:vocation -0.982670 0.595567 -1.6500 0.098948 .
## write:general -0.057928 0.021411 -2.7056 0.006819 **
## write:vocation -0.113603 0.022220 -5.1127 3.177e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -179.98
## McFadden R^2: 0.11815
## Likelihood ratio test : chisq = 48.23 (p.value = 1.063e-08)
One advantage of mlogit is that it gives test statistics for the coefficients in the summary.
Package mlogit can also handle mixed-effects multinomial logistic models, and it is designed to handle choice models.
In the hsb data, first center the math score at its mean and call it c.math. Then run a multinomial logistic regression with the ses variable as a nominal outcome, variables c.math and female as predictors, and the interaction between c.math and female.
Solution:
#first create variable c.math
hsb$c.math <- scale(hsb$math, center = TRUE, scale = FALSE)
#run the multinomial regression with nnet::multinom()
ex.multinom <- multinom(ses ~ c.math + female + c.math:female, data = hsb)
## # weights: 15 (8 variable)
## initial value 219.722458
## iter 10 value 199.706497
## final value 198.527428
## converged
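Presumably:
summary(ex.multinom)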
## Call:
## multinom(formula = ses ~ c.math + female + c.math:female, data = hsb)
##
## Coefficients:
## (Intercept) c.math femalefemale c.math:femalefemale
## middle 1.3152562 0.07581224 -0.8768092 -0.060864602
## high 0.8092806 0.09105872 -1.0052843 0.007573346
##
## Std. Errors:
## (Intercept) c.math femalefemale c.math:femalefemale
## middle 0.3398332 0.03711554 0.4143891 0.0459392
## high 0.3609246 0.03932942 0.4582664 0.0506627
##
## Residual Deviance: 397.0549
## AIC: 413.0549
#using mlogit first we need to reshape the data
hsb.choice <- mlogit.data(hsb, shape = "wide", choice = "ses")
m.mlogit <- mlogit(ses ~ 0 | c.math * female, data = hsb.choice)
#extract the result
summary(m.mlogit)
##
## Call:
## mlogit(formula = ses ~ 0 | c.math * female, data = hsb.choice,
## method = "nr")
##
## Frequencies of alternatives:choice
## high low middle
## 0.290 0.235 0.475
##
## nr method
## 5 iterations, 0h:0m:0s
## g'(-H)^-1g = 0.00023
## successive function values within tolerance limits
##
## Coefficients :
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept):low -0.809263 0.360924 -2.2422 0.02495 *
## (Intercept):middle 0.505971 0.240321 2.1054 0.03526 *
## c.math:low -0.091058 0.039329 -2.3153 0.02060 *
## c.math:middle -0.015247 0.024738 -0.6163 0.53768
## femalefemale:low 1.005263 0.458266 2.1936 0.02826 *
## femalefemale:middle 0.128477 0.352987 0.3640 0.71588
## c.math:low -0.007574 0.050663 -0.1495 0.88116
## c.math:middle -0.068437 0.037904 -1.8055 0.07099 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -198.53
## McFadden R^2: 0.057246
## Likelihood ratio test : chisq = 24.11 (p.value = 0.00049843)
Let \(y\) be an ordinal outcome variable with \(J\) categories, ordered from 1 to \(J\). We define \(\Pr(y \leq j)\) to be the cumulative probability of \(y\) being less than or equal to a specific category \(j\), for \(j = 1, 2, \cdots, J - 1\); thus, \(\Pr(y \leq J) = 1\).
The odds of the outcome being less than or equal to a particular category is:
\[odds(y \leq j) = \dfrac{\Pr(y \leq j)}{\Pr(y > j)} \tt{ \ \ \ \ \ \ (3.1)}\]
defined for \(j = 1, 2, \cdots, J - 1\). Since \(\Pr(y > J) = 0\), the odds are undefined for the \(J\)th category.
The ordinal logistic regression model can be defined as
\[\text{logit} (\Pr(y \le j|x_1, x_2, \cdots, x_p)) =\log\left(\dfrac{\Pr(y \leq j|x_1, x_2, \cdots, x_p)}{\Pr(y > j|x_1, x_2, \cdots, x_p)}\right)= \beta_{0j} + \beta_{1j}x_1 + \cdots + \beta_{pj} x_p\tt{ \ \ \ \ \ \ (3.2)}\]
With the parallel lines assumption (proportional odds assumption), the intercepts are different for each category but the slopes are assumed to be constant across categories, which simplifies the equation above to
\[\text{logit} (\Pr(y \le j|x_1, x_2, \cdots, x_p)) =\log\left(\dfrac{\Pr(y \leq j|x_1, x_2, \cdots, x_p)}{\Pr(y > j|x_1, x_2, \cdots, x_p)}\right)= \beta_{0j} + \beta_{1}x_1 + \cdots + \beta_{p} x_p \tt{ \ \ \ \ \ \ (3.3)}\]
The above model is also called a cumulative link model.
In R we use package ordinal and function clm(). The syntax of clm() is:
clm(formula, scale, nominal, data, weights, start, subset, doFit = TRUE, na.action, contrasts, model = TRUE, control=list(), link = c("logit", "probit", "cloglog", "loglog", "cauchit", "Aranda-Ordaz", "log-gamma"), threshold = c("flexible", "symmetric", "symmetric2", "equidistant"), ...)
We can also use function polr() from package MASS to estimate this model. In clm() and polr() (and also in Stata’s ologit), the ordinal logistic regression model is parameterized as
\[logit (P(y \le j)) = \beta_{j0} - \eta_{1}x_1 - \cdots - \eta_{p} x_p\]
and the software returns \(\eta_i = -\beta_i\).
For an example of ordinal logistic regression we use the data “ologit.csv” from the OARC website.
The data contains variables apply, indicating whether a student is likely to apply to grad school; pared, the parents’ education; public, the type of school the student attended; and gpa, the student’s GPA.
To run the ordinal logistic regression with the proportional odds assumption we use function clm from package ordinal.
In the first analysis we regress variable apply on pared (parents’ education).
Variable apply is an ordinal variable with 3 levels 0, 1, 2, indicating a student is “unlikely”, “somewhat likely”, or “very likely” to apply to grad school.
pared is a binary variable, with 1 meaning the student’s parent has a higher degree and 0 meaning the parent does not.
#reading the data into R
dat <- read.csv("https://stats.idre.ucla.edu/stat/data/ologit.csv")
#structure of the data
str(dat)
## 'data.frame': 400 obs. of 4 variables:
## $ apply : int 2 1 0 1 1 0 1 1 0 1 ...
## $ pared : int 0 1 1 0 0 0 0 0 0 1 ...
## $ public: int 0 0 1 0 0 1 0 0 0 0 ...
## $ gpa : num 3.26 3.21 3.94 2.81 2.53 ...
First we change the variable apply from integer to an ordered factor and label the levels “unlikely”, “somewhat likely”, and “very likely”.
Compared with model (3.3), the outcome \(j = J = 3\) is “very likely”, \(j = 2\) is “somewhat likely”, and \(j = 1\) is “unlikely”.
#transform apply to ordered factor
dat$apply <- factor(dat$apply, labels = c("unlikely", "somewhat likely", "very likely") ,ordered = TRUE)
str(dat$apply)
## Ord.factor w/ 3 levels "unlikely"<"somewhat likely"<..: 3 2 1 2 2 1 2 2 1 2 ...
It is not necessary to transform pared to a factor. However, we change it to a factor with labels “not attend” and “attend” for 0 and 1.
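Presumably:
levels(factor(dat$pared))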
## [1] "0" "1"
#change labels to "not attend" and "attend"
dat$pared <- factor(dat$pared, labels = c("not attend", "attend"))
#structure of pared
str(dat$pared)
## Factor w/ 2 levels "not attend","attend": 1 2 2 1 1 1 1 1 1 2 ...
Now we run the model using function clm, which stands for Cumulative Link Model.
#loading library ordinal
library(ordinal)
#running clm for cumulative link(logit) model
ordinal.clm <- clm(apply ~ pared , data = dat)
#extract the result
summary(ordinal.clm)
## formula: apply ~ pared
## data: dat
##
## link threshold nobs logLik AIC niter max.grad cond.H
## logit flexible 400 -361.40 728.79 5(0) 1.25e-10 9.3e+00
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## paredattend 1.1275 0.2634 4.28 1.87e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Threshold coefficients:
## Estimate Std. Error z value
## unlikely|somewhat likely 0.3768 0.1103 3.415
## somewhat likely|very likely 2.4519 0.1826 13.430
In the output above, we first see the model formula; this is R reminding us what type of model we ran.
Next we see the regression coefficient table, including the value of each coefficient, its standard error, z value, and p-value.
Next we see the estimates for the two intercepts, which are sometimes called threshold coefficients. The intercepts indicate where the latent variable is cut to make the three groups that we observe in our data. Note that this latent variable is continuous. In general, these are not used in the interpretation of the results. The threshold coefficients are closely related to cutpoints, which are reported by other statistical packages.
The estimated model can be written in the form of equation (3.3):
\[\begin{eqnarray} \text{logit} (\hat{P}(Y \le 1)) & = & 0.3768 - 1.1275 \times (\text{pared = attend}) \\ \text{logit} (\hat{P}(Y \le 2)) & = & 2.4519 - 1.1275 \times (\text{pared = attend}) \end{eqnarray}\]
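The interval code was presumably the following (the object ci is used in the next chunk):
#profile likelihood CIs, then Wald CIs via confint.default
(ci <- confint(ordinal.clm))
confint.default(ordinal.clm)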
## 2.5 % 97.5 %
## paredattend 0.6131224 1.647807
## 2.5 % 97.5 %
## unlikely|somewhat likely 0.1605758 0.593109
## somewhat likely|very likely 2.0940393 2.809673
## paredattend 0.6111730 1.643809
The coefficients from the model can be somewhat difficult to interpret because they are scaled in terms of log-odds. To interpret logistic regression models we convert the coefficients into odds ratios. To get the OR and confidence intervals, we just exponentiate the estimates and confidence intervals.
# OR and CI, and also the reciprocal of the odds ratio
exp(cbind(OR=-coef(ordinal.clm)[3], inv.OR = coef(ordinal.clm)[3], ci))
## OR inv.OR 2.5 % 97.5 %
## paredattend 0.3238448 3.087899 1.846187 5.195574
These coefficients are called proportional odds ratios, and we interpret them much as we would odds ratios from a binary logistic regression.
Notice that, because of the parameterization above, the estimated coefficients are the negatives of the \(\beta\)s; we therefore include both the odds ratio and its reciprocal.
For students whose parents did attend college, the odds of being less likely (i.e., unlikely versus somewhat likely and very likely) to apply is \(0.3238448\) times that of students whose parents did not go to college.
Because of the proportional odds assumption we can also say:
For students whose parents attended college, the odds of being more likely to apply (i.e., very likely versus unlikely or somewhat likely) are \(3.09\) times those of students whose parents did not go to college.
We can also fit the same model using polr() from package MASS.
#loading library MASS
library(MASS)
#running the model with polr from package MASS
m.plr <- polr(apply ~ pared, data = dat)
summary(m.plr)
## Call:
## polr(formula = apply ~ pared, data = dat)
##
## Coefficients:
## Value Std. Error t value
## paredattend 1.127 0.2634 4.28
##
## Intercepts:
## Value Std. Error t value
## unlikely|somewhat likely 0.3768 0.1103 3.4152
## somewhat likely|very likely 2.4519 0.1826 13.4302
##
## Residual Deviance: 722.7903
## AIC: 728.7903
The results are identical to those from clm(). However, package ordinal has more options; for example, we can fit a multilevel ordinal model using function clmm2().
The summary() of a polr object does not give p-values for the test statistics. We can approximate the p-values using the normal approximation to the t distribution.
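Presumably the coefficient table was stored first (ctable is used in the next chunk):
(ctable <- coef(summary(m.plr)))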
## Value Std. Error t value
## paredattend 1.1274909 0.2634324 4.280000
## unlikely|somewhat likely 0.3768429 0.1103421 3.415222
## somewhat likely|very likely 2.4518558 0.1825629 13.430200
# calculate and store p values
p <- pnorm(abs(ctable[, "t value"]), lower.tail = FALSE) * 2
# combined table
(ctable <- cbind(ctable, "p value" = p))
## Value Std. Error t value p value
## paredattend 1.1274909 0.2634324 4.280000 1.868929e-05
## unlikely|somewhat likely 0.3768429 0.1103421 3.415222 6.372997e-04
## somewhat likely|very likely 2.4518558 0.1825629 13.430200 4.023243e-41
One of the assumptions underlying ordinal logistic regression, the proportional odds (PO) assumption, is that the relationship between each pair of outcome groups is the same. In other words, ordinal logistic regression assumes that the coefficients that describe the relationship between, say, the lowest versus all higher categories of the response variable are the same as those that describe the relationship between the next lowest category and all higher categories, etc. This is called the proportional odds assumption or the parallel regression assumption.
To run the ordinal logistic regression with the proportional odds assumption relaxed, we include the option nominal = ~ pared, so the model treats the effect of the predictors listed in nominal as nominal, i.e., allowed to vary across thresholds.
Now we use a likelihood ratio test against the more restricted model, i.e., the model with the proportional odds assumption. For this, we use function anova().
#likelihood ratio test for parallel assumption
#ordinal logistic regression with relaxing proportional odds assumption
ordinal.clm.ur <- clm(apply ~ 1 ,nominal = ~pared , data = dat)
summary(ordinal.clm.ur)
## formula: apply ~ 1
## nominal: ~pared
## data: dat
##
## link threshold nobs logLik AIC niter max.grad cond.H
## logit flexible 400 -361.39 730.77 5(0) 7.67e-10 2.0e+01
##
## Threshold coefficients:
## Estimate Std. Error z value
## unlikely|somewhat likely.(Intercept) 0.3783 0.1109 3.411
## somewhat likely|very likely.(Intercept) 2.4407 0.2007 12.164
## unlikely|somewhat likely.paredattend -1.1438 0.2925 -3.910
## somewhat likely|very likely.paredattend -1.0937 0.3704 -2.953
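Presumably:
anova(ordinal.clm, ordinal.clm.ur)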
## Likelihood ratio tests of cumulative link models:
##
## formula: nominal: link: threshold:
## ordinal.clm apply ~ pared ~1 logit flexible
## ordinal.clm.ur apply ~ 1 ~pared logit flexible
##
## no.par AIC logLik LR.stat df Pr(>Chisq)
## ordinal.clm 3 728.79 -361.40
## ordinal.clm.ur 4 730.77 -361.39 0.0171 1 0.8961
Package brant is another way to test the PO assumption, using function brant() to conduct the Brant test of the proportional odds assumption. To use this function we need to fit our model using polr, which we did above (m.plr).
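Presumably:
brant::brant(m.plr)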
## --------------------------------------------
## Test for X2 df probability
## --------------------------------------------
## Omnibus 0.02 1 0.9
## paredattend 0.02 1 0.9
## --------------------------------------------
##
## H0: Parallel Regression Assumption holds
Now run the ordinal logistic regression of apply regressed on pared, public, and gpa.
First we make variable public a factor and fit the model; a sketch of the code is shown below.
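A sketch of the likely code; the factor labels are inferred from the output:
dat$public <- factor(dat$public, labels = c("private", "public"))
m.clm <- clm(apply ~ pared + public + gpa, data = dat)
summary(m.clm)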
## formula: apply ~ pared + public + gpa
## data: dat
##
## link threshold nobs logLik AIC niter max.grad cond.H
## logit flexible 400 -358.51 727.02 5(0) 1.63e-10 1.3e+03
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## paredattend 1.04766 0.26579 3.942 8.09e-05 ***
## publicpublic -0.05868 0.29786 -0.197 0.8438
## gpa 0.61575 0.26063 2.363 0.0182 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Threshold coefficients:
## Estimate Std. Error z value
## unlikely|somewhat likely 2.2033 0.7795 2.826
## somewhat likely|very likely 4.2988 0.8043 5.345
The estimated model can be written in the form of equation (3.3):
\[\begin{eqnarray} \text{logit} (\hat{P}(Y \le 1)) & = & 2.20 - 1.05 \times \text{pared} - (-0.06) \times \text{public} - 0.616 \times \text{gpa} \\ \text{logit} (\hat{P}(Y \le 2)) & = & 4.30 - 1.05 \times \text{pared} - (-0.06) \times \text{public} - 0.616 \times \text{gpa} \end{eqnarray}\]
We exponentiate the coefficients to get odds ratios; a sketch of the code follows.
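A sketch following the pattern used earlier:
(ci <- confint(m.clm))
exp(cbind(OR = -coef(m.clm)[3:5], inv.OR = coef(m.clm)[3:5], ci))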
## 2.5 % 97.5 %
## paredattend 0.5281776 1.5721696
## publicpublic -0.6521908 0.5191405
## gpa 0.1076197 1.1309092
## OR inv.OR 2.5 % 97.5 %
## paredattend 0.3507563 2.8509825 1.6958390 4.817088
## publicpublic 1.0604388 0.9430059 0.5209033 1.680583
## gpa 0.5402378 1.8510366 1.1136241 3.098472
For students whose parents attended college, the odds of being more likely to apply (i.e., very likely versus unlikely or somewhat likely) are 2.85 times those of students whose parents did not go to college, holding all other variables constant.
For students attending public school, the odds of being less likely to apply (i.e., unlikely versus somewhat likely and very likely) are about \(6\%\) higher than for students attending private school, holding all other variables constant.
For a one-unit increase in GPA, the odds of being less likely to apply (i.e., unlikely versus somewhat likely and very likely) on average decrease by about \(46\%\), holding all other variables constant.
Testing the proportional odds assumption:
#re-run the model with proportional odds assumption relaxed
m.clm.relaxed <- clm(apply ~ 1, nominal = ~ pared + public + gpa , data = dat)
summary(m.clm.relaxed)
## formula: apply ~ 1
## nominal: ~pared + public + gpa
## data: dat
##
## link threshold nobs logLik AIC niter max.grad cond.H
## logit flexible 400 -356.51 729.01 5(0) 5.67e-08 2.7e+03
##
## Threshold coefficients:
## Estimate Std. Error z value
## unlikely|somewhat likely.(Intercept) 2.1139 0.8039 2.630
## somewhat likely|very likely.(Intercept) 4.7213 1.3923 3.391
## unlikely|somewhat likely.paredattend -1.0831 0.2959 -3.660
## somewhat likely|very likely.paredattend -0.9947 0.3741 -2.659
## unlikely|somewhat likely.publicpublic 0.2307 0.3063 0.753
## somewhat likely|very likely.publicpublic -0.5367 0.4293 -1.250
## unlikely|somewhat likely.gpa -0.5921 0.2690 -2.201
## somewhat likely|very likely.gpa -0.7190 0.4537 -1.585
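Presumably:
anova(m.clm, m.clm.relaxed)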
## Likelihood ratio tests of cumulative link models:
##
## formula: nominal: link:
## m.clm apply ~ pared + public + gpa ~1 logit
## m.clm.relaxed apply ~ 1 ~pared + public + gpa logit
## threshold:
## m.clm flexible
## m.clm.relaxed flexible
##
## no.par AIC logLik LR.stat df Pr(>Chisq)
## m.clm 5 727.02 -358.51
## m.clm.relaxed 8 729.01 -356.51 4.0138 3 0.26
#brant test for proportional odds assumption over polr object
#run the model using polr
m1.plr <- polr(apply ~ pared + public + gpa, data = dat)
#run the brant test
brant::brant(m1.plr)
## --------------------------------------------
## Test for X2 df probability
## --------------------------------------------
## Omnibus 4.34 3 0.23
## paredattend 0.13 1 0.72
## publicpublic 3.44 1 0.06
## gpa 0.18 1 0.67
## --------------------------------------------
##
## H0: Parallel Regression Assumption holds
Once we are done assessing whether the assumptions of our model hold, we can obtain predicted probabilities, which are usually easier to understand than either the coefficients or the odds ratios. For example, we can vary gpa for each level of pared and public and calculate the probability of being in each category of apply. We do this by creating a new dataset of all the values to use for prediction.
#creating new data
newdat <- data.frame(
pared = rep(levels(dat$pared), 200),
public = rep(levels(dat$public), each = 200),
gpa = rep(seq(from = 1.9, to = 4, length.out = 100), 4))
#using function predict
newdat <- cbind(newdat, predict(m.clm, newdat, type = "prob"))
head(newdat)
## pared public gpa fit.unlikely fit.somewhat likely fit.very likely
## 1 not attend private 1.900000 0.7375757 0.2204915 0.04193277
## 2 attend private 1.921212 0.4931707 0.3945956 0.11223368
## 3 not attend private 1.942424 0.7324882 0.2245169 0.04299488
## 4 attend private 1.963636 0.4866429 0.3984942 0.11486295
## 5 not attend private 1.984848 0.7273385 0.2285788 0.04408265
## 6 attend private 2.006061 0.4801195 0.4023348 0.11754565
To plot the predicted probabilities we need to reshape the data to long format.
newdat.long <- newdat %>%
pivot_longer(cols = c(4,5,6), names_to = "apply", values_to = "Probability",
names_pattern ="....(.*)")
head(newdat.long)
## # A tibble: 6 × 5
## pared public gpa apply Probability
## <chr> <chr> <dbl> <chr> <dbl>
## 1 not attend private 1.9 unlikely 0.738
## 2 not attend private 1.9 somewhat likely 0.220
## 3 not attend private 1.9 very likely 0.0419
## 4 attend private 1.92 unlikely 0.493
## 5 attend private 1.92 somewhat likely 0.395
## 6 attend private 1.92 very likely 0.112
#plot of predicted probability vs. gpa separated by parent education and public
ggplot(newdat.long, aes(x = gpa, y = Probability, colour = apply)) +
geom_line() + facet_grid(pared ~ public, labeller="label_both")
The plot above shows the trend of predicted probability of each level of outcome against GPA split by parent education and school type.