Beyond Logistic Regression in R

Office of Advanced Research Computing (OARC), Statistical Methods and Data Analytics

1 Introduction

1.1 Regression

1.2 Categorical random variables

Blood type, gender, race, and zip code are examples of nominal categorical variables.

An ordinal categorical variable has categories with a natural ordering, but the distances between adjacent categories are not quantifiable. For example:

People can respond to a survey with strongly disagree, disagree, agree, and strongly agree. We cannot quantify how much bigger or smaller the distance between strongly agree and agree is compared to the distance between agree and disagree.

Socioeconomic status can be categorized as “low”, “middle”, and “high”.

A categorical variable with only two possible outcomes is a binary variable or Bernoulli variable.

For example:

In flipping a coin, the outcome must be heads or tails.

A medical test has only two possible outcomes: positive or negative.

Smoking is a binary variable with outcome “Yes” or “No”.

1.2.1 Regression models of categorical variables

In binomial, multinomial, and ordinal logistic regression we use the logistic function as the link function. We will discuss the logistic function later.
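As a quick preview, the logistic function maps any real number to a probability in (0, 1); a minimal sketch in R, using the built-in plogis():

#the logistic function maps the real line to (0, 1)
curve(plogis(x), from = -6, to = 6,
      xlab = "linear predictor", ylab = "probability")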

1.3 Working with categorical variables in R: factor objects

We begin by creating a factor variable in R.

Let’s first create a character vector containing values “a”, “b”, “c”, and “d”.

x <- c("a", "b", "c", "d")
x
## [1] "a" "b" "c" "d"
class(x)
## [1] "character"

Function class() determines the class of an object.

We use function factor() to convert x to a factor.

x.f <- factor(x, levels = c("a", "b", "c", "d"), labels = c("a", "b", "c", "d"))
x.f 
## [1] a b c d
## Levels: a b c d
class(x.f)
## [1] "factor"

By default, levels are the unique set of values taken by as.character(x), sorted in ascending order. For each level we can apply labels, which by default are the same as the levels themselves. In the RStudio Environment pane, we can observe the integer codes that represent each level.

The first level in a factor is the reference level. Using function relevel() we can change the reference.

relevel(x.f, ref = "b")
## [1] a b c d
## Levels: b a c d
x.f
## [1] a b c d
## Levels: a b c d

Because we did not assign the output of relevel() to an object, we did not permanently change the values of factor x.f. If we want to permanently change the reference level, we need to assign the result to a new object or the factor x.f itself.

x.f <- relevel(x.f, ref = "b")
x.f
## [1] a b c d
## Levels: b a c d

1.3.1 Optional

If we want to combine two groups into one group, we can simply set two levels of an existing factor variable to the same level inside levels(). Notice the order on the right-hand side corresponds to the ordering of the levels of the original factor x.f:

# Map from 4 different values to three levels (combine c and d into c):
x.f2 <- x.f
levels(x.f2) <- c("b", "a", "c", "c")
x.f2
## [1] a b c c
## Levels: b a c
#or 
x.f2 <- factor(x.f, labels = c("b", "a", "c", "c"))
x.f2
## [1] a b c c
## Levels: b a c

We need to be cautious when we use the assignment operator <-. Above, not only did we change the levels of the factor, but we also changed the values of the vector to just 3 distinct values. This is why we first stored a copy of the original vector.

1.3.2 Ordered factors

Now we are going to create an ordered factor, which stores the ordering of the levels. We can use function factor() with option ordered = TRUE, or equivalently the wrapper ordered().

x.of <- factor(x, ordered = TRUE)
#or
ordered(x)
## [1] a b c d
## Levels: a < b < c < d

To change the order of an ordered factor, we can re-specify the levels in the order we want.

factor(x.of, levels = c("d","b","c", "a"), ordered = TRUE)
## [1] a b c d
## Levels: d < b < c < a

We can also change the labels of ordered factors.

(x.of <- ordered(x.of, 
      labels = c("strong disagree", "disagree", "agree", "strongly agree")))
## [1] strong disagree disagree        agree           strongly agree 
## Levels: strong disagree < disagree < agree < strongly agree
factor(x.of,
      levels= c("strongly agree", "agree", "disagree", "strong disagree" ))
## [1] strong disagree disagree        agree           strongly agree 
## Levels: strongly agree < agree < disagree < strong disagree

The first line above labels “a” with “strong disagree”, “b” with “disagree”, “c” with “agree”, and “d” with “strongly agree”.

The second line above changes the order of the levels, but the values remain the same. Since x.of is already an ordered factor we do not need to use argument ordered = TRUE.
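One convenience of ordered factors is that comparison operators respect the stored ordering of the levels:

#comparisons use the ordering of the levels
x.of < "agree"
## [1]  TRUE  TRUE FALSE FALSE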

1.4 hsb data

Let’s now read the first data set. The data contains academic performance and demographic information about 200 high school students. This data set is hypothetical and was created to demonstrate statistical analyses.

hsb <- read.csv("https://stats.idre.ucla.edu/stat/data/hsbdemo.csv")

names(hsb)
##  [1] "id"      "female"  "ses"     "schtyp"  "prog"    "read"    "write"  
##  [8] "math"    "science" "socst"   "honors"  "awards"  "cid"
str(hsb)
## 'data.frame':    200 obs. of  13 variables:
##  $ id     : int  45 108 15 67 153 51 164 133 2 53 ...
##  $ female : chr  "female" "male" "male" "male" ...
##  $ ses    : chr  "low" "middle" "high" "low" ...
##  $ schtyp : chr  "public" "public" "public" "public" ...
##  $ prog   : chr  "vocation" "general" "vocation" "vocation" ...
##  $ read   : int  34 34 39 37 39 42 31 50 39 34 ...
##  $ write  : int  35 33 39 37 31 36 36 31 41 37 ...
##  $ math   : int  41 41 44 42 40 42 46 40 33 46 ...
##  $ science: int  29 36 26 33 39 31 39 34 42 39 ...
##  $ socst  : int  26 36 42 32 51 39 46 31 41 31 ...
##  $ honors : chr  "not enrolled" "not enrolled" "not enrolled" "not enrolled" ...
##  $ awards : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cid    : int  1 1 1 1 1 1 1 1 1 1 ...

Variables female, schtyp (school type), prog (program), honors (honors program), and ses (socioeconomic status) are categorical variables. Variable ses is an ordinal variable with levels “low”, “middle”, and “high”.

First we transform variable female to a factor and calculate frequencies and proportions of females and males.

#convert variable female to a factor
hsb$female <- factor(hsb$female)
#levels
levels(hsb$female)
## [1] "female" "male"
#frequency table
table(hsb$female)
## 
## female   male 
##    109     91
#proportion in each level
prop.table(table(hsb$female))
## 
## female   male 
##  0.545  0.455

Now we change the reference level from “female” to “male”.

#changing the reference to "male" 
hsb$female <- relevel(hsb$female, ref = "male")
#check to see reference changed
levels(hsb$female)
## [1] "male"   "female"

1- In the hsb data, transform variables schtyp (school type), prog (program), honors (honors program), and ses (socioeconomic status) to unordered factor variables and report the proportions of each category for those variables.

#transform variables schtyp, prog, honors and ses to factor
hsb$schtyp <- factor(hsb$schtyp)
prop.table(table(hsb$schtyp))
## 
## private  public 
##    0.16    0.84
hsb$prog <- factor(hsb$prog)
prop.table(table(hsb$prog))
## 
## academic  general vocation 
##    0.525    0.225    0.250
hsb$honors <- factor(hsb$honors)
prop.table(table(hsb$honors))
## 
##     enrolled not enrolled 
##        0.265        0.735
#ses as factor with levels in the order "low", "middle", "high" (not an ordered factor)
hsb$ses <- factor(hsb$ses, levels = c("low", "middle", "high"))
prop.table(table(hsb$ses))
## 
##    low middle   high 
##  0.235  0.475  0.290

2- Change the reference level of variable honors to “not enrolled”.

hsb$honors <- relevel(hsb$honors, ref = "not enrolled")

Next, we review some important concepts in logistic regression.

1.5 Understanding odds and odds ratios

1.5.1 Odds vs. probability

The odds of an event with probability \(p\) is defined as: \[ \tt{odds(p)} = \dfrac{p}{1 -p} \]

For example, if the probability of enrollment in the honors program is 0.3, then:

\[ \tt{odds(enrolled)} = \dfrac{\Pr(enrolled)}{1 - \Pr(enrolled)} = \dfrac{0.3}{1 - 0.3} = 0.4285714 \]

\[ \tt{odds(not \ enrolled)} = \dfrac{\Pr(not \ enrolled)}{1 - \Pr(not \ enrolled)} = \dfrac{0.7}{1 - 0.7} = 2.333333 \]

Conversely, we can recover the probability from the odds. For example, if the odds equal 2:

\[ p = \dfrac{\tt{odds}}{1 + \tt{odds}} = \dfrac{2}{1 + 2} = \dfrac{2}{3} \approx 0.67 \]
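We can check these conversions in R:

#odds from probability
p <- 0.3
(odds <- p / (1 - p))
## [1] 0.4285714
#probability back from odds
odds / (1 + odds)
## [1] 0.3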

For categorical variables with more than two categories, we generalize the definition of odds to the relative probability, or odds, of one level vs. the reference level (sometimes called relative risk). For example, for the program variable (prog), which has 3 levels, "academic", "general", and "vocation", we can define the odds of being in the general program vs. the academic program.

\[ \tt{odds(general \ vs. \ academic)} = \dfrac{\Pr(general)}{\Pr(academic)} \]

#first we create frequency table and 
#then we use prop.table to calculate probabilities
prog.tab <- table(hsb$prog)
(prog.p <- prop.table(prog.tab))
## 
## academic  general vocation 
##    0.525    0.225    0.250
#odds of general vs. academic
(odds.general <- 0.225  / 0.525) 
## [1] 0.4285714
#odds of vocational vs. academic
(odds.vocational <- 0.250  / 0.525)
## [1] 0.4761905

1.5.2 Calculating probabilities from odds in multinomial (optional)

Since,

\[\begin{eqnarray} \dfrac{1 - \Pr(\tt{academic}) }{\Pr(\tt{academic})} &=& \dfrac{\Pr(\tt{general}) + \Pr(vocational)}{\Pr(\tt{academic})} \\ &=& \tt{odds(general \ vs. \ academic)} + \tt{odds(vocational \ vs. \ academic)} \end{eqnarray}\]

Thus,

\[\begin{eqnarray} \Pr(\tt{academic}) &=& \dfrac{1}{1 +\tt{odds(general \ vs. \ academic)} + \tt{odds(vocational \ vs. \ academic)}}, \\ \Pr(\tt{general}) &=& \Pr(\tt{academic}) \times \tt{odds(general \ vs. \ academic)}, \tt{and} \\ \Pr(\tt{vocational}) &=& \Pr(\tt{academic}) \times \tt{odds(vocational \ vs. \ academic)}.\\ \end{eqnarray}\]
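As a numerical check in R, using odds.general and odds.vocational computed above:

#probability of the reference level (academic) from the odds
(p.academic <- 1 / (1 + odds.general + odds.vocational))
## [1] 0.525
#probabilities of the other levels
p.academic * odds.general
## [1] 0.225
p.academic * odds.vocational
## [1] 0.25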

1.5.3 Odds ratios (binary variable)

An odds ratio compares the odds of an event between two groups. For example, the odds ratio of enrollment in the honors program for females to males is \[\text{OddsRatio(enrolled for female to male)} =\dfrac{\tt{odds(enrolled \ | \ female)}}{\tt{odds(enrolled \ | \ male)}}.\]

#calculating proportion of enrollment for each group of female and male. 
#table gives frequencies 
(tab_honors_female <- table(hsb$honors, hsb$female))
##               
##                male female
##   not enrolled   73     74
##   enrolled       18     35
#prop.table on a frequency table with margin = 2 returns column-wise proportions
prop.table(tab_honors_female, margin = 2)
##               
##                     male    female
##   not enrolled 0.8021978 0.6788991
##   enrolled     0.1978022 0.3211009
(odds.ratio <- (0.3211009 / 0.6788991) / (0.1978022 / 0.8021978) )
## [1] 1.918168

The odds of enrollment in the honors program for a female student are almost twice the odds for a male student. Or, the odds of enrollment for female students are about 92% higher than for male students.

We could also use the frequencies directly to calculate odds and odds ratios since the total number of females and males will cancel in the numerator and denominator.

(35/74) / (18/73)
## [1] 1.918168

Note: Calculating probabilities from an odds ratio is only possible if we also know the odds for at least one group; the odds ratio alone does not determine the probabilities.
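For example, using the odds for males computed from the frequency table, a short sketch recovering the probability for females:

#odds of enrollment for males
odds.male <- 18/73
#odds for females = odds ratio x odds for males
(odds.female <- odds.ratio * odds.male)
## [1] 0.472973
#probability of enrollment for females
odds.female / (1 + odds.female)
## [1] 0.3211009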

1.5.4 Odds ratios (multinomial variable)

For a multinomial variable, the odds ratio compares the generalized odds (each level vs. the reference level) between two groups. For example, the odds ratio comparing females to males for the odds of "general" vs. "academic" (the reference category) is:

\[ \text{OddsRatio(general vs. academic for female to male)} = \dfrac{\left[\dfrac{\Pr (\text{general | female})}{\Pr(\text{academic | female})}\right]}{\left[\dfrac{\Pr (\text{general | male})}{\Pr(\text{academic | male})}\right]}\]

#3 X 2 table of program vs. female
table(hsb$prog, hsb$female)
##           
##            male female
##   academic   47     58
##   general    21     24
##   vocation   23     27
#odds ratios using frequencies
(odds.ratio <- (24/58) / (21/47))
## [1] 0.9261084

We can say that the odds of the general program vs. the academic program for female students are \(0.926\) times the odds of the general program vs. the academic program for male students.

Or, we can say the odds of the general program vs. the academic program for female students are \(7.4\%\) lower than for male students.

1.7 Binary logistic model

In linear regression, we model a continuous outcome directly as a linear function of the predictors:

\[\begin{equation} y | x_1, x_2,\cdots, x_p = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots +\beta_p x_p + \epsilon \tt{\ \ (1.1)} \end{equation}\]

or

\[\mu_{y | x_1, x_2,\cdots, x_p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots +\beta_p x_p \tt{\ \ \ \ \ (1.2)}\]

\(\mu_y\) is the mean of the dependent variable for specific values of the independent variables. \(\epsilon\) is the unexplained error term, or uncertainty of the model, with mean equal to 0.

In binary logistic regression, we instead model the log-odds of the outcome as a linear function of the predictors:

\[ \log(odds | x_1, x_2 , \cdots x_p) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_p x_p \tt{\ \ \ \ \ (1.3)}\]

or,

\[\frac{Pr(y = 1| x_1, x_2 , \cdots x_p)}{Pr(y = 0| x_1, x_2 , \cdots x_p)} = \exp{(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)} \tt{\ \ \ \ \ (1.4)}\]

1.7.1 Calculating predicted probability using odds (Optional)

Solving for \(\Pr(y = 1)\), given that \(\Pr(y = 0) = 1 - \Pr(y = 1)\), we have

\[Pr(y = 1| x_1, x_2 , \cdots x_p) = \frac{\exp{(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)}}{1+\exp{(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)}} \tt{\ \ \ \ \ (1.5)}\]

or,

\[Pr(y = 1| x_1, x_2 , \cdots x_p) = \frac{1} {1 + \exp[{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)}]} \tt{\ \ \ \ \ (1.6)}\] and

\[Pr(y = 0 | x_1, x_2 , \cdots x_p) = \frac{1} {1 + \exp\left[(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)\right]} .\]
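Equation (1.6) is the logistic (inverse logit) function, available in R as plogis(); a quick check with a hypothetical linear predictor value:

#a hypothetical value of the linear predictor
eta <- 0.5
#equation (1.6): the inverse logit, built into R as plogis()
plogis(eta)
## [1] 0.6224593
1 / (1 + exp(-eta))  #same value
## [1] 0.6224593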

1.8 Logistic regression in R

The syntax of glm is as follows:

glm(formula, family = binomial(link = "logit"), data = data, weights, subset, na.action, ...)

The first argument is the model formula; the second, family, specifies the distribution of the outcome and the link function (for logistic regression, binomial with the "logit" link).

1.8.1 Logistic regression example: honors predicted by math and female

For this analysis we use data set hsb to model enrollment in the honors program as the outcome, with math and female as independent variables. Our model assumes the effect of math score is the same for females and males (i.e., no interaction between math and female).

#to estimate coefficient of the model we use glm with  
#the option family = binomial(link = "logit")
logit.m <- glm(honors ~ math + female, family = binomial(link = "logit"), data = hsb)
summary(logit.m)
## 
## Call:
## glm(formula = honors ~ math + female, family = binomial(link = "logit"), 
##     data = hsb)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0188  -0.6676  -0.3110   0.4119   2.5408  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -11.79228    1.71892  -6.860 6.87e-12 ***
## math           0.18254    0.02842   6.423 1.34e-10 ***
## femalefemale   1.12085    0.42403   2.643  0.00821 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 231.29  on 199  degrees of freedom
## Residual deviance: 158.46  on 197  degrees of freedom
## AIC: 164.46
## 
## Number of Fisher Scoring iterations: 5
coef(summary(logit.m))
##                Estimate Std. Error   z value     Pr(>|z|)
## (Intercept)  -11.792281 1.71891936 -6.860288 6.872196e-12
## math           0.182538 0.02842085  6.422678 1.338976e-10
## femalefemale   1.120847 0.42403214  2.643308 8.210041e-03
confint(logit.m)
##                    2.5 %    97.5 %
## (Intercept)  -15.4710305 -8.689254
## math           0.1309423  0.243122
## femalefemale   0.3143782  1.986565

\[ \log(\text{odds of enrolled}) = \log\left[\dfrac{\Pr (\text{enrolled})}{\Pr(\text{not enrolled})}\right]= -11.792281 + 0.182538 \text{ math} + 1.120847 \text{ female}\]

1.8.2 Interpreting coefficients

For a one-unit increase in math score, the log-odds of enrollment are expected to increase by 0.182538, holding the other predictor (female) constant.

The log-odds of enrollment for a female student are 1.120847 greater than the log-odds of enrollment for a male student with the same math score.

If we exponentiate the coefficients with exp(), then we can interpret coefficients in the odds ratio metric.

#exponentiate coefficients
exp(coef(logit.m))
##  (Intercept)         math femalefemale 
## 7.562706e-06 1.200260e+00 3.067452e+00

We can also use exp() with confint() to calculate the confidence intervals of the odds ratio.

#exponentiate coefficients and column-bind with exponentiated CIs

cbind(OR = exp(coef(logit.m)), exp(confint(logit.m)))
##                        OR        2.5 %       97.5 %
## (Intercept)  7.562706e-06 1.909927e-07 0.0001683856
## math         1.200260e+00 1.139902e+00 1.2752241647
## femalefemale 3.067452e+00 1.369407e+00 7.2904468215

The above confidence intervals (CIs) are based on the profile likelihood method. If you want CIs based on standard errors (Wald CIs), you can use confint.default() instead.
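We can also obtain predicted probabilities from the fitted model with predict(). For example, for a hypothetical female student with a math score of 60 (values chosen only for illustration):

#predicted probability of honors enrollment for math = 60, female
newdat <- data.frame(math = 60, female = "female")
predict(logit.m, newdata = newdat, type = "response")
#about 0.57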

2 Multinomial logistic regression

2.1 Multinomial logistic model

Suppose we want to model a regression with a categorical outcome with \(J\) categories, where the \(J\)th level is the reference category.

For each category \(j\) from \(1, 2, \cdots, J-1\), the log-odds of category \(j\) versus the reference category \(J\), given predictors \(x_1, x_2, \cdots, x_p\), is modeled as:

\[ \log(\text{odds}(Y = j \ | \ x_1, x_2, \cdots, x_p)) =\log \left[ \dfrac{\Pr(y = j|x_1, x_2, \cdots, x_p)}{\Pr(y = J|x_1, x_2, \cdots, x_p)} \right] = \beta_{0j} + \beta_{1j} x_1 + \beta_{2j} x_2 + \cdots + \beta_{pj} x_p \tt{\ \ \ \ \ (2.1)}\]

where \(odds_j\) is “generalized” odds and \[Pr(y = J) = 1 - \sum_{j=1}^{J-1} Pr(y = j).\]

We can exponentiate both sides of equation (2.1):

\[ \text{odds}(Y = j \ | \ x_1, x_2, \cdots, x_p) = \dfrac{\Pr(y = j|x_1, x_2, \cdots, x_p)}{\Pr(y = J|x_1, x_2, \cdots, x_p)} = \exp(\beta_{0j} + \beta_{1j} x_1 + \beta_{2j} x_2 + \cdots + \beta_{pj} x_p)\tt{\ \ \ \ \ \ (2.2)}\]

In multinomial logistic regression, for an outcome with 3 levels we need to estimate \((p + 1) \times 2\) coefficients, and for an outcome with 4 levels we estimate \((p + 1) \times 3\) coefficients, where \(p\) counts the columns of the design matrix other than the intercept. (In the example below, \(p = 3\): two dummies for ses plus write, giving \(4 \times 2 = 8\) coefficients.)

Related reading: the FAQ “What is complete or quasi-complete separation in logistic/probit regression and how do we deal with them?” and the FAQ “How Relevant is the Independence of Irrelevant Alternatives?”

2.1.1 Calculating predicted probability using odds (Optional)

We can now calculate the probability of the reference level from the odds by summing both sides of equation (2.2) over \(j = 1, 2, \cdots, J-1\):

\[ \dfrac{1 - \Pr(y = J|x_1, x_2, \cdots, x_p)}{\Pr(y = J|x_1, x_2, \cdots, x_p)} = \sum_{j=1}^{J-1}\exp(\beta_{0j} + \beta_{1j} x_1 + \beta_{2j} x_2 + \cdots + \beta_{pj} x_p) \]

Solving for \(Pr(y = J)\) we get:

\[ \Pr(y = J|x_1, x_2, \cdots, x_p) = \dfrac{1}{1 +\sum_{j=1}^{J-1}\exp(\beta_{0j} + \beta_{1j} x_1 + \beta_{2j} x_2 + \cdots + \beta_{pj} x_p) } \tt{\ \ \ \ \ \ (2.3)} \]

To get the probabilities of the other levels \(j = 1, 2, \cdots, J-1\), we just multiply the odds of each level by the probability of the reference level; thus,

\[ \Pr(y = j|x_1, x_2, \cdots, x_p) = \dfrac{\exp(\beta_{0j} + \beta_{1j} x_1 + \beta_{2j} x_2 + \cdots + \beta_{pj} x_p)}{1 +\sum_{j=1}^{J-1}\exp(\beta_{0j} + \beta_{1j} x_1 + \beta_{2j} x_2 + \cdots + \beta_{pj} x_p) } \tt{\ \ \ \ \ \ (2.4)} \]
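A small helper implementing equations (2.3) and (2.4) could look like the sketch below; here B is a hypothetical \((J-1) \times (p+1)\) coefficient matrix (one row per non-reference level) and x a covariate vector with a leading 1 for the intercept:

#sketch of equations (2.3) and (2.4); B and x are hypothetical inputs
multinom_probs <- function(B, x) {
  odds <- exp(B %*% x)               #odds of each level vs. the reference
  pJ <- 1 / (1 + sum(odds))          #equation (2.3): reference probability
  c(reference = pJ, pJ * odds[, 1])  #equation (2.4): remaining levels
}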

2.2 R example of multinomial logistic regression

The outcome variable prog has 3 levels, "academic", "general", and "vocation", with reference group "academic" (the \(J\)th category).

For help documentation about the multinom() function, see ?multinom. For this workshop we only use the basic arguments of the multinom() function.

#summary of the outcome prog
(tab.prog <- table(hsb$prog))
## 
## academic  general vocation 
##      105       45       50
prop.table(tab.prog)
## 
## academic  general vocation 
##    0.525    0.225    0.250
#running multinomial regression model for prog on socioeconomic status and write
library(nnet)
m.multinom <- multinom(prog ~ ses + write, data = hsb)
## # weights:  15 (8 variable)
## initial  value 219.722458 
## iter  10 value 179.982880
## final  value 179.981726 
## converged
#extracting the result of model using summary
(m.results <- summary(m.multinom, digits = 3))
## Call:
## multinom(formula = prog ~ ses + write, data = hsb)
## 
## Coefficients:
##          (Intercept) sesmiddle seshigh   write
## general         2.85    -0.533  -1.163 -0.0579
## vocation        5.22     0.291  -0.983 -0.1136
## 
## Std. Errors:
##          (Intercept) sesmiddle seshigh  write
## general         1.17     0.444   0.514 0.0214
## vocation        1.16     0.476   0.596 0.0222
## 
## Residual Deviance: 359.9635 
## AIC: 375.9635

The first output line is the model formula.

2.3 Interpreting the coefficients of multinomial logistic

If we substitute the model results into equation 2.1 we have:

\[\begin{eqnarray} \log\left[\dfrac{\Pr(\tt{general})}{\Pr(\tt{academic})}\right] = 2.85 -0.533 \times\text{(ses = middle)} -1.163\times\text{(ses = high)} -0.0579\times \tt{write}, \\ \\ \log\left[\dfrac{\Pr(\tt{vocation})}{\Pr(\tt{academic})}\right] = 5.22 + 0.291 \times\text{(ses = middle)} - 0.983 \times\text{(ses = high)} - 0.1136 \times \tt{write}. \\ \end{eqnarray}\]

Intercepts: 2.85 is the log-odds of general vs. academic for a student with ses = “low” and a writing score of 0; 5.22 is the corresponding log-odds of vocation vs. academic.

For independent variable write: a one-unit increase in writing score is associated with a 0.0579 decrease in the log-odds of general vs. academic and a 0.1136 decrease in the log-odds of vocation vs. academic, holding ses constant.

For ses: relative to ses = “low”, ses = “high” decreases the log-odds of general vs. academic by 1.163 and of vocation vs. academic by 0.983; ses = “middle” decreases the log-odds of general vs. academic by 0.533 and increases the log-odds of vocation vs. academic by 0.291, holding write constant.

#exponentiate coefficients to get odds ratios (relative risk ratios)
exp(coef(m.multinom))
##          (Intercept) sesmiddle   seshigh     write
## general     17.32582 0.5866769 0.3126026 0.9437172
## vocation   184.61262 1.3382809 0.3743123 0.8926116

2.4 Test statistics in Multinomial logistic regression

multinom() does not report z statistics or p-values, so we compute Wald z-statistics and two-tailed p-values manually.

#calculating z scores by dividing each estimated coefficient by its standard error
z <- m.results$coefficients/m.results$standard.errors
#two-tailed p-values from the standard normal distribution
p <- (1 - pnorm(abs(z), 0, 1)) * 2
#print the result of z-scores and p-values
print(z,digits = 4)
##          (Intercept) sesmiddle seshigh  write
## general        2.445   -1.2018  -2.261 -2.706
## vocation       4.485    0.6117  -1.650 -5.113
print(p, digits = 4)
##          (Intercept) sesmiddle seshigh     write
## general    1.448e-02    0.2294 0.02374 6.819e-03
## vocation   7.299e-06    0.5408 0.09895 3.176e-07
#update the model to exclude ses
m.multinom.r <- update(m.multinom, . ~ . -ses)
## # weights:  9 (4 variable)
## initial  value 219.722458 
## final  value 185.510837 
## converged
#anova of the restricted model vs. the full model: likelihood ratio test of the overall effect of ses
anova(m.multinom.r, m.multinom)
## Likelihood ratio tests of Multinomial Models
## 
## Response: prog
##         Model Resid. df Resid. Dev   Test    Df LR stat.    Pr(Chi)
## 1       write       396   371.0217                                 
## 2 ses + write       392   359.9635 1 vs 2     4 11.05822 0.02591741

2.5 Plotting predicted probabilities in multinomial logistic regression

#fitted gives fitted probabilities for observations; head reports the first 6 rows
head(pp <- fitted(m.multinom))
##    academic   general  vocation
## 1 0.1482764 0.3382454 0.5134781
## 2 0.1202017 0.1806283 0.6991700
## 3 0.4186747 0.2368082 0.3445171
## 4 0.1726885 0.3508384 0.4764731
## 5 0.1001231 0.1689374 0.7309395
## 6 0.3533566 0.2377976 0.4088458
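As a check of equation (2.4), we can reproduce the first row by hand. Observation 1 has ses = “low” and write = 35, so the ses dummies drop out of the linear predictors:

#linear predictors for observation 1 (ses = "low", write = 35)
b <- coef(m.multinom)
eta <- b[, "(Intercept)"] + b[, "write"] * 35
#equation (2.4): probability of each level
denom <- 1 + sum(exp(eta))
c(academic = 1/denom, exp(eta)/denom)
#≈ 0.148 (academic), 0.338 (general), 0.513 (vocation), matching row 1 above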

It will be more helpful to examine the changes in predicted probability associated with one of our two predictors. We can create small data sets varying one variable while holding the other constant. We will first do this by holding write at its mean and examining the predicted probabilities for each level of ses.

#new data with all levels of ses and write at the mean of write.
newdata.ses <- data.frame(ses = c("low", "middle", "high"), write = mean(hsb$write))
#column bind the new data with predicted probabilities 
cbind(newdata.ses, probability = predict(m.multinom, newdata = newdata.ses, "probs"))
##      ses  write probability.academic probability.general probability.vocation
## 1    low 52.775            0.4396845           0.3581917            0.2021238
## 2 middle 52.775            0.4777488           0.2283353            0.2939159
## 3   high 52.775            0.7009007           0.1784939            0.1206054

Another way to understand the model using the predicted probabilities is to look at the averaged predicted probabilities for different values of the continuous predictor write, for example from 30 to 70, within each level of ses. (It is a good idea to check the range of the continuous variable first.)

#check the range of write
range(hsb$write)
## [1] 31 67
#in the new data, for write from 30 to 70 we replicate the levels of ses
newdata.write <- data.frame(ses = rep(c("low", "middle", "high"), each = 41), write = rep(c(30:70),
    3))
#predict gives predicted probabilities for each row of newdata
pp.write <- cbind(newdata.write, predict(m.multinom, newdata = newdata.write, type = "probs", se = TRUE))
#first few rows
head(pp.write)
##   ses write   academic   general  vocation
## 1 low    30 0.09843588 0.2999880 0.6015762
## 2 low    31 0.10716868 0.3082195 0.5846118
## 3 low    32 0.11650390 0.3162093 0.5672868
## 4 low    33 0.12645834 0.3239094 0.5496323
## 5 low    34 0.13704576 0.3312711 0.5316831
## 6 low    35 0.14827643 0.3382454 0.5134781
#calculate the mean probabilities within each level of ses
by(pp.write[, 3:5], pp.write$ses, colMeans)
## pp.write$ses: high
##  academic   general  vocation 
## 0.6164315 0.1808037 0.2027648 
## ------------------------------------------------------------ 
## pp.write$ses: low
##  academic   general  vocation 
## 0.3972977 0.3278174 0.2748849 
## ------------------------------------------------------------ 
## pp.write$ses: middle
##  academic   general  vocation 
## 0.4256198 0.2010864 0.3732938

To plot the predicted probabilities, we first need to put the predicted probability data set into long form. We do this because we want a factor Program with 3 levels and all probabilities in one column.

#pivot_longer from package tidyr reshapes the wide format to long format
library(tidyr)
long.pp <- pp.write %>% pivot_longer(cols = c("academic", "general", "vocation"), names_to = "Program", values_to = "Probability")
#first few rows
head(long.pp)
## # A tibble: 6 × 4
##   ses   write Program  Probability
##   <chr> <int> <chr>          <dbl>
## 1 low      30 academic      0.0984
## 2 low      30 general       0.300 
## 3 low      30 vocation      0.602 
## 4 low      31 academic      0.107 
## 5 low      31 general       0.308 
## 6 low      31 vocation      0.585

Now we are ready to plot the predicted probabilities as a function of write, separately by ses and Program.

library(ggplot2)
ggplot(long.pp, aes(x = write, y = Probability, colour = ses)) + 
  geom_line(lwd=1.2) + 
  facet_grid(Program ~., scales = "free")

2.6 Package mlogit

Package mlogit is an alternative for fitting multinomial logit models. It requires the data in long (“choice”) format, with one row per alternative for each observation.

library(mlogit)
#reshape the data into an indexed data frame (dfidx object)
hsb.choice <- mlogit.data(hsb, shape = "wide", choice = "prog")
class(hsb.choice)
## [1] "dfidx_mlogit" "dfidx"        "data.frame"   "mlogit.data"
#each row is extended to 3 rows, one per alternative
head(data.frame(hsb.choice))
##    id female    ses schtyp  prog read write math science socst       honors
## 1  45 female    low public FALSE   34    35   41      29    26 not enrolled
## 2  45 female    low public FALSE   34    35   41      29    26 not enrolled
## 3  45 female    low public  TRUE   34    35   41      29    26 not enrolled
## 4 108   male middle public FALSE   34    33   41      36    36 not enrolled
## 5 108   male middle public  TRUE   34    33   41      36    36 not enrolled
## 6 108   male middle public FALSE   34    33   41      36    36 not enrolled
##   awards cid chid      alt    idx
## 1      0   1    1 academic 1:emic
## 2      0   1    1  general 1:eral
## 3      0   1    1 vocation 1:tion
## 4      0   1    2 academic 2:emic
## 5      0   1    2  general 2:eral
## 6      0   1    2 vocation 2:tion
#running multinomial logistic using mlogit; individual-specific predictors go in the second part of the formula
m.mlogit <- mlogit(prog ~ 0 | ses + write , data = hsb.choice)
#extract the result using summary
summary(m.mlogit)
## 
## Call:
## mlogit(formula = prog ~ 0 | ses + write, data = hsb.choice, method = "nr")
## 
## Frequencies of alternatives:choice
## academic  general vocation 
##    0.525    0.225    0.250 
## 
## nr method
## 4 iterations, 0h:0m:0s 
## g'(-H)^-1g = 7.89E-07 
## gradient close to zero 
## 
## Coefficients :
##                       Estimate Std. Error z-value  Pr(>|z|)    
## (Intercept):general   2.852186   1.166439  2.4452  0.014477 *  
## (Intercept):vocation  5.218200   1.163549  4.4847 7.301e-06 ***
## sesmiddle:general    -0.533291   0.443732 -1.2018  0.229429    
## sesmiddle:vocation    0.291393   0.476374  0.6117  0.540743    
## seshigh:general      -1.162832   0.514219 -2.2614  0.023737 *  
## seshigh:vocation     -0.982670   0.595567 -1.6500  0.098948 .  
## write:general        -0.057928   0.021411 -2.7056  0.006819 ** 
## write:vocation       -0.113603   0.022220 -5.1127 3.177e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log-Likelihood: -179.98
## McFadden R^2:  0.11815 
## Likelihood ratio test : chisq = 48.23 (p.value = 1.063e-08)

2.7 Exercise for multinomial logistic regression:

In the hsb data, first center the math score at its mean and call it c.math. Then run a multinomial logistic regression with ses as a nominal outcome, c.math and female as predictors, and the interaction between c.math and female.

Solution:

#first create variable c.math
hsb$c.math <- scale(hsb$math, center = TRUE, scale = FALSE)
#run the multinomial regression with nnet::multinom()
ex.multinom <- multinom(ses ~ c.math + female + c.math:female, data = hsb)
## # weights:  15 (8 variable)
## initial  value 219.722458 
## iter  10 value 199.706497
## final  value 198.527428 
## converged
(result <- summary(ex.multinom))
## Call:
## multinom(formula = ses ~ c.math + female + c.math:female, data = hsb)
## 
## Coefficients:
##        (Intercept)     c.math femalefemale c.math:femalefemale
## middle   1.3152562 0.07581224   -0.8768092        -0.060864602
## high     0.8092806 0.09105872   -1.0052843         0.007573346
## 
## Std. Errors:
##        (Intercept)     c.math femalefemale c.math:femalefemale
## middle   0.3398332 0.03711554    0.4143891           0.0459392
## high     0.3609246 0.03932942    0.4582664           0.0506627
## 
## Residual Deviance: 397.0549 
## AIC: 413.0549
#using mlogit first we need to reshape the data
hsb.choice <- mlogit.data(hsb, shape = "wide", choice = "ses")
m.mlogit <- mlogit(ses ~ 0 | c.math * female, data = hsb.choice)
#extract the result
summary(m.mlogit)
## 
## Call:
## mlogit(formula = ses ~ 0 | c.math * female, data = hsb.choice, 
##     method = "nr")
## 
## Frequencies of alternatives:choice
##   high    low middle 
##  0.290  0.235  0.475 
## 
## nr method
## 5 iterations, 0h:0m:0s 
## g'(-H)^-1g = 0.00023 
## successive function values within tolerance limits 
## 
## Coefficients :
##                      Estimate Std. Error z-value Pr(>|z|)  
## (Intercept):low     -0.809263   0.360924 -2.2422  0.02495 *
## (Intercept):middle   0.505971   0.240321  2.1054  0.03526 *
## c.math:low          -0.091058   0.039329 -2.3153  0.02060 *
## c.math:middle       -0.015247   0.024738 -0.6163  0.53768  
## femalefemale:low     1.005263   0.458266  2.1936  0.02826 *
## femalefemale:middle  0.128477   0.352987  0.3640  0.71588  
## c.math:low          -0.007574   0.050663 -0.1495  0.88116  
## c.math:middle       -0.068437   0.037904 -1.8055  0.07099 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log-Likelihood: -198.53
## McFadden R^2:  0.057246 
## Likelihood ratio test : chisq = 24.11 (p.value = 0.00049843)

3 Ordinal logistic regression

3.1 Ordinal logistic model

Let \(y\) be an ordinal outcome variable with \(J\) categories, ordered from 1 to \(J\). We define \(\Pr(y \leq j)\) to be the cumulative probability that \(y\) is less than or equal to a specific category \(j\), for \(j = 1, 2, \cdots, J - 1\); thus \(\Pr(y \leq J) = 1\).

The odds of the outcome being less than or equal to a particular category is:

\[odds(y \leq j) = \dfrac{\Pr(y \leq j)}{\Pr(y > j)} \tt{ \ \ \ \ \ \ (3.1)}\] defined for \(j = 1, 2, \cdots, J - 1\). Since \(\Pr(y > J) = 0\), the odds are undefined for the \(J\)th category.

The ordinal logistic regression model can be defined as

\[\text{logit} (\Pr(y \le j|x_1, x_2, \cdots, x_p)) =\log\left(\dfrac{\Pr(y \leq j|x_1, x_2, \cdots, x_p)}{\Pr(y > j|x_1, x_2, \cdots, x_p)}\right)= \beta_{0j} + \beta_{1j}x_1 + \cdots + \beta_{pj} x_p\tt{ \ \ \ \ \ \ (3.2)}\]

With the parallel lines assumption (proportional odds assumption), the intercepts are different for each category but the slopes are assumed to be constant across categories, which simplifies the equation above to

\[\text{logit} (\Pr(y \le j|x_1, x_2, \cdots, x_p)) =\log\left(\dfrac{\Pr(y \leq j|x_1, x_2, \cdots, x_p)}{\Pr(y > j|x_1, x_2, \cdots, x_p)}\right)= \beta_{0j} + \beta_{1}x_1 + \cdots + \beta_{p} x_p \tt{ \ \ \ \ \ \ (3.3)}\]

We will use function clm() from package ordinal to fit this model. The syntax of clm() is as follows:

clm(formula, scale, nominal, data, weights, start, subset, doFit = TRUE, na.action, contrasts, model = TRUE, control=list(), link = c("logit", "probit", "cloglog", "loglog", "cauchit", "Aranda-Ordaz", "log-gamma"), threshold = c("flexible", "symmetric", "symmetric2", "equidistant"), ...)

Note that clm() parameterizes the model as

\[\text{logit} (\Pr(y \le j)) = \beta_{0j} - \eta_{1}x_1 - \cdots - \eta_{p} x_p\]

and will return \(\eta_i = -\beta_i\) relative to equation (3.3), so a positive estimated coefficient means higher predictor values are associated with higher outcome categories.

3.2 R example for ordinal logistic regression

#reading the data into R
dat <- read.csv("https://stats.idre.ucla.edu/stat/data/ologit.csv")
#structure of the data
str(dat)
## 'data.frame':    400 obs. of  4 variables:
##  $ apply : int  2 1 0 1 1 0 1 1 0 1 ...
##  $ pared : int  0 1 1 0 0 0 0 0 0 1 ...
##  $ public: int  0 0 1 0 0 1 0 0 0 0 ...
##  $ gpa   : num  3.26 3.21 3.94 2.81 2.53 ...
#transform apply to ordered factor
dat$apply <- factor(dat$apply, labels = c("unlikely", "somewhat likely", "very likely") ,ordered = TRUE)
str(dat$apply)
##  Ord.factor w/ 3 levels "unlikely"<"somewhat likely"<..: 3 2 1 2 2 1 2 2 1 2 ...
#transform pared to factor
dat$pared <- factor(dat$pared)
#checking levels
levels(dat$pared)
## [1] "0" "1"
#change labels to "not attend" and "attend"
dat$pared <- factor(dat$pared, labels = c("not attend", "attend"))
#structure of pared
str(dat$pared)
##  Factor w/ 2 levels "not attend","attend": 1 2 2 1 1 1 1 1 1 2 ...
#loading library ordinal
library(ordinal)
#running clm for cumulative link(logit) model
ordinal.clm <- clm(apply ~ pared , data = dat)
#extract the result
summary(ordinal.clm)
## formula: apply ~ pared
## data:    dat
## 
##  link  threshold nobs logLik  AIC    niter max.grad cond.H 
##  logit flexible  400  -361.40 728.79 5(0)  1.25e-10 9.3e+00
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## paredattend   1.1275     0.2634    4.28 1.87e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Threshold coefficients:
##                             Estimate Std. Error z value
## unlikely|somewhat likely      0.3768     0.1103   3.415
## somewhat likely|very likely   2.4519     0.1826  13.430

In the output above, we see the estimated coefficient for paredattend, the two threshold (intercept) estimates, and model summary information such as the log-likelihood and AIC.

The estimated model can be written in the form of equation (3.3):

\[\begin{eqnarray} \text{logit} (\hat{P}(Y \le 1)) & = & 0.3768 - 1.1275 \times (\text{pared = attend}) \\ \text{logit} (\hat{P}(Y \le 2)) & = & 2.4519 - 1.1275 \times (\text{pared = attend}) \end{eqnarray}\]
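We can convert these estimates to predicted probabilities by hand; a quick sketch for a student whose parents attended college:

#threshold and coefficient estimates from the output above
theta <- c(0.3768, 2.4519)
eta <- 1.1275
#cumulative probabilities P(Y <= 1) and P(Y <= 2) for pared = "attend"
plogis(theta - eta)
#≈ 0.321 and 0.790
#individual category probabilities (unlikely, somewhat likely, very likely)
diff(c(0, plogis(theta - eta), 1))
#≈ 0.321, 0.469, 0.210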

(ci <- confint(ordinal.clm)) # default method gives profiled CIs
##                 2.5 %   97.5 %
## paredattend 0.6131224 1.647807
confint.default(ordinal.clm) # CIs assuming normality
##                                 2.5 %   97.5 %
## unlikely|somewhat likely    0.1605758 0.593109
## somewhat likely|very likely 2.0940393 2.809673
## paredattend                 0.6111730 1.643809

3.3 Interpreting the result of ordinal logistic regression

The coefficients from the model can be somewhat difficult to interpret because they are scaled in terms of log-odds. To interpret logistic regression models we convert the coefficients into odds ratios. To get the OR and confidence intervals, we just exponentiate the estimates and confidence intervals.

# OR as exp(-coef) and its reciprocal exp(coef); the exponentiated CIs correspond to exp(coef)
exp(cbind(OR=-coef(ordinal.clm)[3], inv.OR = coef(ordinal.clm)[3], ci))
##                    OR   inv.OR    2.5 %   97.5 %
## paredattend 0.3238448 3.087899 1.846187 5.195574

For students whose parents attended college, the odds of being more likely to apply (i.e., very likely versus unlikely or somewhat likely) are \(3.09\) times those of students whose parents did not go to college.

3.4 Using polr in package MASS

#loading library MASS
library(MASS)
#running the model with polr from package MASS
m.plr <- polr(apply ~ pared, data = dat)
summary(m.plr)
## Call:
## polr(formula = apply ~ pared, data = dat)
## 
## Coefficients:
##             Value Std. Error t value
## paredattend 1.127     0.2634    4.28
## 
## Intercepts:
##                             Value   Std. Error t value
## unlikely|somewhat likely     0.3768  0.1103     3.4152
## somewhat likely|very likely  2.4519  0.1826    13.4302
## 
## Residual Deviance: 722.7903 
## AIC: 728.7903

The results are identical to the results from clm(). However, package ordinal has more options; for example, we can also run a multilevel ordinal model using function clmm2().

# store table
(ctable <- coef(summary(m.plr)))
##                                 Value Std. Error   t value
## paredattend                 1.1274909  0.2634324  4.280000
## unlikely|somewhat likely    0.3768429  0.1103421  3.415222
## somewhat likely|very likely 2.4518558  0.1825629 13.430200
# calculate and store p values
p <- pnorm(abs(ctable[, "t value"]), lower.tail = FALSE) * 2

# combined table
(ctable <- cbind(ctable, "p value" = p))
##                                 Value Std. Error   t value      p value
## paredattend                 1.1274909  0.2634324  4.280000 1.868929e-05
## unlikely|somewhat likely    0.3768429  0.1103421  3.415222 6.372997e-04
## somewhat likely|very likely 2.4518558  0.1825629 13.430200 4.023243e-41

3.5 Testing the proportional odds assumption

One of the assumptions underlying ordinal logistic regression, the proportional odds (PO) assumption, is that the relationship between each pair of outcome groups is the same. In other words, ordinal logistic regression assumes that the coefficients that describe the relationship between, say, the lowest versus all higher categories of the response variable are the same as those that describe the relationship between the next lowest category and all higher categories, etc. This is called the proportional odds assumption or the parallel regression assumption.

To run ordinal logistic regression with the proportional odds assumption relaxed, we include the option nominal = ~ pared, so the effects of predictors listed in nominal are allowed to vary across thresholds (i.e., treated as nominal).

Now we use a likelihood ratio test against the more restricted model, i.e., the model with the proportional odds assumption. For this, we use function anova().

#likelihood ratio test for parallel assumption

#ordinal logistic regression with the proportional odds assumption relaxed
ordinal.clm.ur <- clm(apply ~ 1 ,nominal = ~pared , data = dat)
summary(ordinal.clm.ur)
## formula: apply ~ 1
## nominal: ~pared
## data:    dat
## 
##  link  threshold nobs logLik  AIC    niter max.grad cond.H 
##  logit flexible  400  -361.39 730.77 5(0)  7.67e-10 2.0e+01
## 
## Threshold coefficients:
##                                         Estimate Std. Error z value
## unlikely|somewhat likely.(Intercept)      0.3783     0.1109   3.411
## somewhat likely|very likely.(Intercept)   2.4407     0.2007  12.164
## unlikely|somewhat likely.paredattend     -1.1438     0.2925  -3.910
## somewhat likely|very likely.paredattend  -1.0937     0.3704  -2.953
anova(ordinal.clm.ur, ordinal.clm)
## Likelihood ratio tests of cumulative link models:
##  
##                formula:      nominal: link: threshold:
## ordinal.clm    apply ~ pared ~1       logit flexible  
## ordinal.clm.ur apply ~ 1     ~pared   logit flexible  
## 
##                no.par    AIC  logLik LR.stat df Pr(>Chisq)
## ordinal.clm         3 728.79 -361.40                      
## ordinal.clm.ur      4 730.77 -361.39  0.0171  1     0.8961

The likelihood ratio test above is far from significant (p = 0.896), so we have no evidence against the proportional odds assumption here. Package brant is another way to test the PO assumption, using function brant() to conduct the Brant test. To use this function we need to run our model using polr().

#brant test for proportional odds assumption over polr object
library(brant)
brant::brant(m.plr)
## -------------------------------------------- 
## Test for X2  df  probability 
## -------------------------------------------- 
## Omnibus      0.02    1   0.9
## paredattend  0.02    1   0.9
## -------------------------------------------- 
## 
## H0: Parallel Regression Assumption holds

3.6 Exercise: ordinal logistic regression of apply on pared, public, and gpa

Now run the ordinal logistic regression of apply on pared, public, and gpa.

First we convert variable public to a factor.

dat$public <- factor(dat$public, labels = c("private", "public"))
#run the clm model
m.clm <- clm(apply ~ pared + public + gpa , data = dat)
summary(m.clm)
## formula: apply ~ pared + public + gpa
## data:    dat
## 
##  link  threshold nobs logLik  AIC    niter max.grad cond.H 
##  logit flexible  400  -358.51 727.02 5(0)  1.63e-10 1.3e+03
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## paredattend   1.04766    0.26579   3.942 8.09e-05 ***
## publicpublic -0.05868    0.29786  -0.197   0.8438    
## gpa           0.61575    0.26063   2.363   0.0182 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Threshold coefficients:
##                             Estimate Std. Error z value
## unlikely|somewhat likely      2.2033     0.7795   2.826
## somewhat likely|very likely   4.2988     0.8043   5.345

The estimated model can be written in the form of equation (3.3):

\[\begin{eqnarray} \text{logit} (\hat{P}(Y \le 1)) & = & 2.20 - 1.05 \times \text{PARED} - (-0.06) \times \text{PUBLIC} - 0.616 \times \text{GPA} \\ \text{logit} (\hat{P}(Y \le 2)) & = & 4.30 - 1.05 \times \text{PARED} - (-0.06) \times \text{PUBLIC} - 0.616 \times \text{GPA} \end{eqnarray}\]

Exponentiate the coefficients to get odds ratios:

#getting CI
(ci <- confint(m.clm))
##                   2.5 %    97.5 %
## paredattend   0.5281776 1.5721696
## publicpublic -0.6521908 0.5191405
## gpa           0.1076197 1.1309092
# exponentiate to get OR and CI
exp(cbind(OR=-coef(m.clm)[3:5], inv.OR = coef(m.clm)[3:5], ci))
##                     OR    inv.OR     2.5 %   97.5 %
## paredattend  0.3507563 2.8509825 1.6958390 4.817088
## publicpublic 1.0604388 0.9430059 0.5209033 1.680583
## gpa          0.5402378 1.8510366 1.1136241 3.098472

Testing the proportional odds assumption:

#re-run the model with proportional odds assumption relaxed

m.clm.relaxed <- clm(apply ~ 1, nominal = ~ pared + public + gpa , data = dat)
summary(m.clm.relaxed)
## formula: apply ~ 1
## nominal: ~pared + public + gpa
## data:    dat
## 
##  link  threshold nobs logLik  AIC    niter max.grad cond.H 
##  logit flexible  400  -356.51 729.01 5(0)  5.67e-08 2.7e+03
## 
## Threshold coefficients:
##                                          Estimate Std. Error z value
## unlikely|somewhat likely.(Intercept)       2.1139     0.8039   2.630
## somewhat likely|very likely.(Intercept)    4.7213     1.3923   3.391
## unlikely|somewhat likely.paredattend      -1.0831     0.2959  -3.660
## somewhat likely|very likely.paredattend   -0.9947     0.3741  -2.659
## unlikely|somewhat likely.publicpublic      0.2307     0.3063   0.753
## somewhat likely|very likely.publicpublic  -0.5367     0.4293  -1.250
## unlikely|somewhat likely.gpa              -0.5921     0.2690  -2.201
## somewhat likely|very likely.gpa           -0.7190     0.4537  -1.585
#likelihood ratio test with anova
anova(m.clm, m.clm.relaxed)
## Likelihood ratio tests of cumulative link models:
##  
##               formula:                     nominal:              link:
## m.clm         apply ~ pared + public + gpa ~1                    logit
## m.clm.relaxed apply ~ 1                    ~pared + public + gpa logit
##               threshold:
## m.clm         flexible  
## m.clm.relaxed flexible  
## 
##               no.par    AIC  logLik LR.stat df Pr(>Chisq)
## m.clm              5 727.02 -358.51                      
## m.clm.relaxed      8 729.01 -356.51  4.0138  3       0.26
#brant test for proportional odds assumption over polr object
#run the model using polr
m1.plr <- polr(apply ~ pared + public + gpa, data = dat)
#run the brant test
brant::brant(m1.plr)
## -------------------------------------------- 
## Test for X2  df  probability 
## -------------------------------------------- 
## Omnibus      4.34    3   0.23
## paredattend  0.13    1   0.72
## publicpublic 3.44    1   0.06
## gpa      0.18    1   0.67
## -------------------------------------------- 
## 
## H0: Parallel Regression Assumption holds

3.7 Plotting predicted probabilities for ordinal logistic regression

Once we are done assessing whether the assumptions of our model hold, we can obtain predicted probabilities, which are usually easier to understand than either the coefficients or the odds ratios. For example, we can vary gpa for each level of pared and public and calculate the probability of being in each category of apply. We do this by creating a new dataset of all the values to use for prediction.

#creating new data
newdat <- data.frame(
  pared = rep(levels(dat$pared), 200),
  public = rep(levels(dat$public), each = 200),
  gpa = rep(seq(from = 1.9, to = 4, length.out = 100), 4))
#using function predict
newdat <- cbind(newdat, predict(m.clm, newdat, type = "prob"))

head(newdat)
##        pared  public      gpa fit.unlikely fit.somewhat likely fit.very likely
## 1 not attend private 1.900000    0.7375757           0.2204915      0.04193277
## 2     attend private 1.921212    0.4931707           0.3945956      0.11223368
## 3 not attend private 1.942424    0.7324882           0.2245169      0.04299488
## 4     attend private 1.963636    0.4866429           0.3984942      0.11486295
## 5 not attend private 1.984848    0.7273385           0.2285788      0.04408265
## 6     attend private 2.006061    0.4801195           0.4023348      0.11754565

To plot the predicted probabilities we need to reshape the data to long format:

newdat.long <- newdat %>% 
  pivot_longer(cols = c(4,5,6), names_to = "apply", values_to = "Probability",
               names_pattern ="....(.*)")

head(newdat.long)
## # A tibble: 6 × 5
##   pared      public    gpa apply           Probability
##   <chr>      <chr>   <dbl> <chr>                 <dbl>
## 1 not attend private  1.9  unlikely             0.738 
## 2 not attend private  1.9  somewhat likely      0.220 
## 3 not attend private  1.9  very likely          0.0419
## 4 attend     private  1.92 unlikely             0.493 
## 5 attend     private  1.92 somewhat likely      0.395 
## 6 attend     private  1.92 very likely          0.112
#plot of predicted probability vs. gpa separated by parent education and public
ggplot(newdat.long, aes(x = gpa, y = Probability, colour = apply)) +
  geom_line() + facet_grid(pared ~ public, labeller="label_both")

The plot above shows the trend in the predicted probability of each outcome level against GPA, split by parent education and school type.