This workshop introduces the logistic regression model and discusses how to run the model and associated methods in Stata.
In this seminar, we will cover:
Binary variables have 2 levels, typically 0 (false/failure) and 1 (true/success).
Below are 8 observations of binary variable \(y\):
\[y=(0,1,1,1,1,0,1,1)\]
The mean of a 0/1 variable is the proportion of ones:
\[\bar{y}=\frac{\sum_i y_i}{n}=.75\]
The mean is also an estimate of the probability of ones in the population, \(p=Pr(y=1)\):
\[\bar{y}=\hat{p}=.75\]
The \(\hat{}\) symbol denotes a sample estimate of a population parameter.
With \(n\) observations of a binary variable, we can observe between 0 and \(n\) ones (successes).
Over repeated samples of size \(n\), the number of ones we observe in each sample is determined by the probability of success, \(p\).
For samples of size \(n\) with probability of success \(p\), the binomial distribution describes the probability of observing each number of successes, from 0 to \(n\).
The binomial distribution has 2 parameters: the number of trials \(n\) and the probability of success \(p\).
A histogram of the number of successes (ones) in 1000 samples of size \(n=8\) from a population with \(p=.25\) approximates the binomial distribution for \(n=8\), \(p=.25\):
When \(p\) is low, most samples contain more zeroes than ones.
When \(p\) is higher, say \(p=.75\), most samples contain more ones than zeroes.
Let \(y^*\) be the number of successes in a sample of size \(n\) of binary variable \(y\).
The exact probability of observing \(k\) successes in samples of size \(n\) where the probability of success is \(p\) is given by:
\[Pr(y^*=k|n,p) = {n\choose k}p^k(1-p)^{(n-k)}\]
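For example, with \(n=8\) and \(p=.75\), the probability of observing exactly 6 successes is:
\[Pr(y^*=6|n=8,p=.75) = {8\choose 6}(.75)^6(.25)^2 = 28(.178)(.0625) \approx .311\]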
For samples of binomially-distributed \(y\) of size \(n\) with probability of success \(p\), the mean (expected) number of successes is \(np\):
\[E(y|n,p) = np\]
For samples with \(n=8\) and probability \(p=.75\), the mean is \(6\):
\[E(y|n=8, p=.75) = (8)(.75) = 6\]
The variance of \(y\) is:
\[Var(y) = np(1-p)\]
Both mean and variance of \(y\) vary with \(p\), so the binomial distribution is heteroskedastic.
In most data sets, each row will be a single observation of a binary variable (y below):
y | x |
---|---|
1 | 4 |
1 | 4 |
0 | 5 |
1 | 5 |
0 | 7 |
Above, each row is a single observation of a binomial variable y, so \(n=1\)† and \(y\) will be either 0 or 1.
When \(n=1\), the mean of binomially-distributed \(y\) is \(p\) itself:
\[E(y|n=1,p) = np = (1)p = p\]
Regression models the mean outcome, so with \(n=1\) data we are modeling \(p\) directly.
Stata's logit command, discussed extensively later, expects data with \(n=1\) per row.
Aggregated data
Regression assumes that observations with the same predictor values (x above) come from the same population with a common \(p=Pr(y=1)\), so we could aggregate the data like so:
n | y | x |
---|---|---|
2 | 2 | 4 |
2 | 1 | 5 |
1 | 0 | 7 |
Now \(y\) is the count of ones and \(n\) is the sample size.
In either style of data, \(n\) is known so only \(p\) will be modeled.
The 2 datasets above will produce the same logistic regression estimates, but aggregated data must be analyzed with blogit in Stata rather than logit.
†The binomial distribution with \(n=1\) is also known as the Bernoulli distribution.
Regression models estimate the relationship between the mean outcome and a set of predictors.
As seen above, for binary outcomes the mean is \(p\) itself.
We could model the relationships between predictors and \(p\) in a linear regression, but some assumptions would be violated: predicted probabilities could fall outside the \([0,1]\) bounds, and the errors would be heteroskedastic and non-normal.
Logistic regression addresses each of these issues.
Regression models linear relationships between the mean outcome and the predictors, which can be problematic for bounded outcomes:
Transforming the outcome can extend its range.
In logistic regression, two transformations on the mean outcome (probability) change its bounds from \([0,1]\) to \((-\infty, \infty)\).
We first transform probabilities to odds:
\[odds(p) = \frac{p}{1-p}\]
Odds lie within \([0,\infty)\) and have a positive, monotonic relationship with probability \(p\).
Odds are interpreted as the expected number of successes per failure.
For example, if \(p=.75\), then \(odds(p)=\frac{.75}{.25}=3\), which means we expect 3 successes for every failure.
The next transformation extends the left bound to \(-\infty\).
If variable \(x\) is defined over \((0,\infty)\), then \(log(x)\) is defined over \((-\infty, \infty)\).
The log of the odds, or logit, is thus defined over \((-\infty, \infty)\).
\[logit(p) = log(odds(p)) = log(\frac{p}{1-p})\]
The logit has an S-shaped (sigmoidal) relationship with the probability:
The logit function is often referred to as a link function, which transforms the mean outcome to allow a linear relationship with the predictors.
The logistic regression model asserts a linear relationship between the logit (log-odds) of the outcome variable and the predictors \(x_1\) through \(x_k\) for observation \(i\):
\[logit(p_i) = log(\frac{p_i}{1-p_i}) = \beta_0 + \beta_1x_{i1} + ... + \beta_kx_{ik}\]
The linear predictor, \(\beta_0 + \beta_1x_{i1}+...+\beta_kx_{ik}\), expresses the expected outcome in logits (log odds).
The inverse logit function transforms a logit back into a probability:
\[logit^{-1}(x) = \frac{exp(x)}{1+exp(x)}\]
\[logit^{-1}(logit(p_i)) = p_i\]
Thus, the inverse logit of the linear predictor expresses the expected outcome as a probability:
\[p_i = \frac{exp(\beta_0 + \beta_1x_{i1}+...+\beta_kx_{ik})}{1+exp(\beta_0 + \beta_1x_{i1}+...+\beta_kx_{ik})}\]
In the above expression, we see that \(p_i\) has a non-linear relationship with the predictors.
Because of this nonlinear relationship, the amount of change in \(p\) due to a one-unit increase in \(x\) depends on the value of \(x\).
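For example, suppose (purely for illustration) that \(\beta_0=0\) and \(\beta_1=1\). Moving from \(x=0\) to \(x=1\) changes the probability from \(logit^{-1}(0)=.50\) to \(logit^{-1}(1)=.73\), a change of .23, while moving from \(x=2\) to \(x=3\) changes the probability from \(logit^{-1}(2)=.88\) to \(logit^{-1}(3)=.95\), a change of only .07.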
Logistic regression coefficients can be interpreted in at least 2 ways.
The first interpretation expresses additive changes in the logits, or log odds of the outcome.
Imagine a simple logistic regression model with a single predictor \(x\):
\[logit(p_i) = log(\frac{p_i}{1-p_i}) = \beta_0 + \beta_1x_i\]
Setting \(x=0\), the expected outcome is \(\beta_0\):
\[\begin{align} log(\frac{p_{x=0}}{1-p_{x=0}}) &= \beta_0 + \beta_1(0) \\ &= \beta_0 \end{align}\]
If we set \(x=1\), the expected outcome is \(\beta_0 + \beta_1\):
\[\begin{align} log(\frac{p_{x=1}}{1-p_{x=1}}) &= \beta_0 + \beta_1(1) \\ &= \beta_0 + \beta_1 \end{align}\]
Taking the difference between these expectations gives us an interpretation of \(\beta_1\):
\[\begin{align} log(\frac{p_{x=1}}{1-p_{x=1}}) - log(\frac{p_{x=0}}{1-p_{x=0}}) &= (\beta_0 + \beta_1) - \beta_0 \\ &= \beta_1 \end{align}\]
Thus, \(\beta_1\) is the expected change in the logit (log-odds) of the outcome for a one-unit increase in the predictor \(x\).
Interpreting coefficients as change in logits is simple because the relationships are linear, but most audiences will struggle with understanding magnitudes in logits.
A more common way to interpret logistic regression coefficients is to exponentiate them to express odds ratios.
Recall that the exponential function is the inverse of the (natural) logarithm function:
\[exp(log(x)) = x\]
Also recall that the difference of two logs is the log of the ratio:
\[log(a) - log(b) = log(\frac{a}{b})\]
Logistic regression coefficients denote the change in the log of the odds of the outcome for a unit increase in the predictor, which we can express as the log of the ratio of the odds:
\[\begin{align} \beta_1 &= log(\frac{p_{x=1}}{1-p_{x=1}}) - log(\frac{p_{x=0}}{1-p_{x=0}}) \\ &= log(odds_{x=1}) - log(odds_{x=0}) \\ &= log(\frac{odds_{x=1}}{odds_{x=0}}) \\ \end{align}\]
Now, if we exponentiate both sides, we see how the exponentiated coefficient is an odds ratio:
\[\begin{align} exp(\beta_1) &= exp(log(\frac{odds_{x=1}}{odds_{x=0}})) \\ &= \frac{odds_{x=1}}{odds_{x=0}} \\ &= \frac{\frac{p_{x=1}}{1-p_{x=1}}}{\frac{p_{x=0}}{1-p_{x=0}}} \\ \end{align}\]
Odds ratios express the expected multiplicative change in the odds of the outcome for a unit increase in the predictor.
We use a data set of birth weights of 189 infants to demonstrate logistic regression in Stata, where infants weighing less than 2500g are classified as low birth weight (lbw).
Outcome

- low: 1 if the infant weighed less than 2500g, 0 otherwise

Predictors

- age: mother's age in years
- ptl: number of previous premature labors
- ht: history of hypertension (1=yes, 0=no)
- smoke: smoking during pregnancy (Smoker/Nonsmoker)
- race: White, Black, or Other
Logistic regression will estimate linear relationships of each predictor above with the log odds of lbw.
The low birth weight data set comes with Stata and can be loaded with the command:
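/* load the low birth weight data (available as a Stata example dataset) */
webuse lbw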
We will explore the variables with histograms
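For example, a histogram of mother's age could be produced with a command like the following (the bin width here is just one choice):

/* histogram of mother's age */
histogram age, width(2) frequency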
The logit command

Stata's logit command performs logistic regression using the following syntax:

logit depvar indepvarlist

where depvar is the dependent variable (outcome) and indepvarlist is a list of one or more independent variables (predictors).

- depvar is generally coded 0/1, and Stata will model the probability that depvar=1
- precede categorical predictors in indepvarlist with i. to request that Stata enter dummy variables representing each category but the first (by default) into the regression; we will refer to predictors specified with i. as factors
- add or after a , following indepvarlist to produce odds ratios; otherwise coefficients are in the logit metric by default

Stata's logistic command estimates the same model as logit but can only output odds ratios.
Logistic regression of the lbw data

We will estimate the relationships of mother's age, previous premature labors, history of hypertension, smoking, and race with the log odds of having a low birth weight (lbw) baby.

- age and ptl are entered as continuous predictors
- i. precedes each of ht, smoke, and race to request that dummy variables for each category but the first be entered into the model

/* logistic regression with low as outcome,
age and ptl as continuous predictors,
ht, smoke and race as categorical predictors */
logit low age ptl i.ht i.smoke i.race
Iteration 0: Log likelihood = -117.336
Iteration 1: Log likelihood = -105.286
Iteration 2: Log likelihood = -104.90511
Iteration 3: Log likelihood = -104.90441
Iteration 4: Log likelihood = -104.90441
Logistic regression Number of obs = 189
LR chi2(6) = 24.86
Prob > chi2 = 0.0004
Log likelihood = -104.90441 Pseudo R2 = 0.1059
------------------------------------------------------------------------------
low | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
age | -.0469041 .0353279 -1.33 0.184 -.1161454 .0223372
ptl | .7501393 .3298036 2.27 0.023 .1037362 1.396542
1.ht | 1.227762 .6246189 1.97 0.049 .0035312 2.451992
|
smoke |
Smoker | .9728681 .3863386 2.52 0.012 .2156584 1.730078
|
race |
Black | .973844 .5013314 1.94 0.052 -.0087474 1.956435
Other | 1.021987 .4191365 2.44 0.015 .2004942 1.843479
|
_cons | -.9237957 .891843 -1.04 0.300 -2.671776 .8241845
------------------------------------------------------------------------------
Starting towards the top of the output (after the likelihood iterations), we see the overall model statistics: the number of observations, the likelihood ratio \(\chi^2\) test of the whole model with its p-value, the final log likelihood, and the pseudo \(R^2\). Below these, the coefficient table reports an estimate, standard error, z statistic, p-value, and confidence interval for each predictor and for the intercept (_cons).
Iteration 0: Log likelihood = -117.336
Iteration 1: Log likelihood = -105.286
Iteration 2: Log likelihood = -104.90511
Iteration 3: Log likelihood = -104.90441
Iteration 4: Log likelihood = -104.90441
Logistic regression Number of obs = 189
LR chi2(6) = 24.86
Prob > chi2 = 0.0004
Log likelihood = -104.90441 Pseudo R2 = 0.1059
------------------------------------------------------------------------------
low | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
age | -.0469041 .0353279 -1.33 0.184 -.1161454 .0223372
ptl | .7501393 .3298036 2.27 0.023 .1037362 1.396542
1.ht | 1.227762 .6246189 1.97 0.049 .0035312 2.451992
|
smoke |
Smoker | .9728681 .3863386 2.52 0.012 .2156584 1.730078
|
race |
Black | .973844 .5013314 1.94 0.052 -.0087474 1.956435
Other | 1.021987 .4191365 2.44 0.015 .2004942 1.843479
|
_cons | -.9237957 .891843 -1.04 0.300 -2.671776 .8241845
------------------------------------------------------------------------------
The estimated coefficients above are interpreted as change in logits (log odds of being lbw).
Continuous predictors
Categorical predictors
Odds ratios are interpreted in a couple ways (let \(OR\) be some odds ratio):
An odds ratio above one denotes an increase in the odds, below one denotes a decrease in the odds, and equal to one denotes no change in the odds.
We repeat our logistic regression of low, now with odds ratios, by adding the or option:
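/* same model as above, reported as odds ratios (command reconstructed to match the output below) */
logit low age ptl i.ht i.smoke i.race, or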
Iteration 0: Log likelihood = -117.336
Iteration 1: Log likelihood = -105.286
Iteration 2: Log likelihood = -104.90511
Iteration 3: Log likelihood = -104.90441
Iteration 4: Log likelihood = -104.90441
Logistic regression Number of obs = 189
LR chi2(6) = 24.86
Prob > chi2 = 0.0004
Log likelihood = -104.90441 Pseudo R2 = 0.1059
------------------------------------------------------------------------------
low | Odds ratio Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
age | .9541789 .0337091 -1.33 0.184 .8903458 1.022589
ptl | 2.117295 .6982914 2.27 0.023 1.109308 4.041203
1.ht | 3.413581 2.132187 1.97 0.049 1.003537 11.61146
|
smoke |
Smoker | 2.645521 1.022067 2.52 0.012 1.240679 5.641093
|
race |
Black | 2.648104 1.327578 1.94 0.052 .9912907 7.074066
Other | 2.778709 1.164658 2.44 0.015 1.222007 6.318482
|
_cons | .3970093 .3540699 -1.04 0.300 .0691294 2.280021
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
Note: test (z) statistics and p-values are the same whether regression coefficients or odds ratios are reported
Continuous predictors
Categorical predictors
Interactions in regression models allow the effect of one predictor to vary with another predictor.
For example, we might believe that the effect of age is different for smoking vs nonsmoking mothers.
Interaction terms are generated by taking the product of the component variables.
Stata provides the # and ## operators for interactions, where the former requests only the interaction and the latter requests both the lower-order terms and the interaction.
Below, we add the interaction of smoke and age to the model:
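One way to specify this model (the exact factor-variable syntax shown here is our assumption; ## enters both main effects and the interaction):

/* add the age-by-smoking interaction; ## requests the main effects plus the interaction */
logit low c.age##i.smoke ptl i.ht i.race, or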
Iteration 0: Log likelihood = -117.336
Iteration 1: Log likelihood = -105.18462
Iteration 2: Log likelihood = -104.70994
Iteration 3: Log likelihood = -104.7077
Iteration 4: Log likelihood = -104.7077
Logistic regression Number of obs = 189
LR chi2(7) = 25.26
Prob > chi2 = 0.0007
Log likelihood = -104.7077 Pseudo R2 = 0.1076
------------------------------------------------------------------------------
low | Odds ratio Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
age | .9336296 .0471014 -1.36 0.173 .84573 1.030665
|
smoke |
Smoker | .9512806 1.593868 -0.03 0.976 .0356552 25.38017
|
smoke#c.age |
Smoker | 1.045964 .0751283 0.63 0.532 .9086104 1.204082
|
ptl | 2.10686 .694485 2.26 0.024 1.104215 4.019921
1.ht | 3.469858 2.170839 1.99 0.047 1.018067 11.82625
|
race |
Black | 2.48018 1.269646 1.77 0.076 .9093663 6.764375
Other | 2.704271 1.141077 2.36 0.018 1.182722 6.183265
|
_cons | .660881 .8026954 -0.34 0.733 .0611319 7.144616
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
In the logit metric, interaction coefficients are interpreted as additive changes in a predictor’s effects due to a one-unit increase in the interacting predictor.
Exponentiated interaction coefficients are not odds ratios, but instead are the ratio of one odds ratio vs another.
The exponentiated interaction coefficient above, 1.046, can be interpreted as the ratio of the odds ratio for age among smokers to the odds ratio for age among nonsmokers (the row labeled Smoker under smoke in the table above).

To practice logistic regression in Stata, we will use the nhanes2 dataset, which comes from the National Health and Nutrition Examination Survey and can be loaded with:
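/* load the nhanes2 data from the Stata website */
webuse nhanes2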
The model variables are:
- highbp: high blood pressure, 1=high blood pressure, 0=normal
- region: region of US, 1=NE, 2=MW, 3=S, 4=W
- rural: rural residence, 0=urban, 1=rural
- weight: weight in kg
- age: age in years

Run a logistic regression modeling highbp with categorical predictors (factors) region and rural, and linear relationships for continuous predictors weight and age, with coefficients expressed in logits (log odds). Interpret the coefficients.

Understanding the magnitude of predictor effects as changes in logits or as odds ratios can be difficult.
Expressing model effects in probabilities can make magnitudes clearer.
In this section we show how Stata's predict and margins commands predict model-based probabilities of the outcome and how to use these predictions to interpret model results.
Because of the nonlinear relationship between the predictors and the probability of the outcome, the change in probability associated with a one-unit increase in a predictor depends on the values of all of the predictors.
At times, it can also be useful to predict outcomes in logits, for example, to assess assumptions of linear relationships.
predict after logistic regression

Stata's predict command is primarily used to make model-based predictions for the current data.
The basic syntax is:

predict newvar

where newvar is the name of a new variable that will hold the predictions.
By default, after logit, predict will estimate predicted probabilities, but many other types of quantities can be predicted with various options (see logit postestimation).
After running predict, newvar will be added to the data set with the predictions.
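/* predicted probabilities from the model, saved as new variable predp (name matches the listing below) */
predict predp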
(option pr assumed; Pr(low))
Here are the model variables and predicted probabilities from the model for the first 10 observations:
/* display model variables and predictions for first 10 obs */
list low age ptl ht smoke race predp in 1/10
| low age ptl ht smoke race predp |
|-----------------------------------------------------|
1. | 0 19 0 0 Nonsmoker Black .3012971 |
2. | 0 33 0 0 Nonsmoker Other .1900565 |
3. | 0 20 0 0 Smoker White .2913144 |
4. | 0 21 0 0 Smoker White .2817266 |
5. | 0 18 0 0 Smoker White .311053 |
|-----------------------------------------------------|
6. | 0 21 0 0 Nonsmoker Other .2917718 |
7. | 0 22 0 0 Nonsmoker White .1239348 |
8. | 0 17 0 0 Nonsmoker Other .3319944 |
9. | 0 29 0 0 Smoker White .2122952 |
10. | 0 26 0 0 Smoker White .2367767 |
+-----------------------------------------------------+
A model with high predictive power will predict probabilities near zero for observations where low=0, and near one for observations where low=1.
Let's examine the distribution of predicted probabilities by observed outcome low:
/* compare distribution of predp between observations with low=1 vs low=0 */
graph box predp, over(low) b1title("observed low") ytitle("predicted Pr(low=1)")
The distributions are not very separated, indicating that the model has low predictive power.
margins and marginal estimates

Stata's margins command is one of its most powerful and versatile commands, estimating many kinds of model predictions, including predictive margins, adjusted predictions, and marginal effects.
Because of its versatility, margins has an extensive syntax that can be difficult to master (the manual entry for margins is 78 pages long!).
Nevertheless, margins functionality is one of the primary reasons to use Stata.
margins can only be used after a model has been estimated.
Predictive margins are predicted outcomes averaged over the distribution of some (or all) predictors and at fixed levels of the others.
Generally, when specifying margins, whichever predictors are not fixed at specific levels will be averaged over when predicting the outcome.
Thus, simply specifying margins after our logistic regression of lbw will result in a predicted probability of lbw, averaging over the distributions of all predictors in the model:
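/* overall predictive margin (command reconstructed to match the output below) */
margins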
Predictive margins Number of obs = 189
Model VCE: OIM
Expression: Pr(low), predict()
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
_cons | .3121693 .0314715 9.92 0.000 .2504864 .3738523
------------------------------------------------------------------------------
Note: z-statistics and p-values assessing null hypotheses testing predictive margins against zero are generally ignored (we generally are not interested in testing whether a predicted probability is different from zero).
The above predictive margin equals both the mean of the observed outcome low and the mean of the predicted probabilities predp:
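/* compare the predictive margin to the means of low and predp (command reconstructed) */
summarize low predp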
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
low | 189 .3121693 .4646093 0 1
predp | 189 .3121693 .1663442 .0458923 .7944289
We can also estimate predictive margins for specific subpopulations.
The syntax:

margins factorvar

estimates predictive margins for each level of factorvar (a predictor preceded by i.), averaging over the distribution of the other predictors.
Thus, the following margins command estimates predicted probabilities of lbw for smoking and non-smoking mothers, averaging over the distribution of all other predictors:
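/* average probability of lbw by smoking status (command reconstructed to match the output below) */
margins smoke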
Predictive margins Number of obs = 189
Model VCE: OIM
Expression: Pr(low), predict()
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
smoke |
Nonsmoker | .2400186 .0387952 6.19 0.000 .1639814 .3160559
Smoker | .4283874 .0580954 7.37 0.000 .3145226 .5422522
------------------------------------------------------------------------------
The predictive margin for Nonsmoker above can be estimated manually with the following procedure: set smoke=0 for every observation, predict each observation's probability of lbw, and then average those predictions.
By leaving the other predictors at their observed values when predicting and then averaging the predictions, the resulting mean is averaged over the distribution of those predictors.
The predictive margin for Smoker is obtained using the same procedure, except everyone is converted to smoke=1 in the first step.
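This procedure can be sketched in Stata as follows (a sketch of our own; the variable names p_ns and p_s are arbitrary):

/* manually reproduce the smoke predictive margins */
preserve
replace smoke = 0        // set everyone to nonsmoker
predict p_ns, pr         // other predictors stay at their observed values
replace smoke = 1        // set everyone to smoker
predict p_s, pr
summarize p_ns p_s       // the means reproduce the predictive margins above
restore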
Use # to obtain predictive margins at each combination of the factors' levels:
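/* average probability of lbw for each combination of smoking status and race (reconstructed) */
margins smoke#race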
Predictive margins Number of obs = 189
Model VCE: OIM
Expression: Pr(low), predict()
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
smoke#race |
Nonsmoker #|
White | .1551203 .0467924 3.32 0.001 .0634088 .2468317
Nonsmoker #|
Black | .314714 .0899992 3.50 0.000 .1383188 .4911092
Nonsmoker #|
Other | .3245289 .056837 5.71 0.000 .2131304 .4359273
Smoker #|
White | .3145167 .0568889 5.53 0.000 .2030165 .426017
Smoker #|
Black | .5340904 .1121103 4.76 0.000 .3143583 .7538226
Smoker #|
Other | .5454064 .0971575 5.61 0.000 .3549812 .7358315
------------------------------------------------------------------------------
The predictive margins above are the predicted probabilities of lbw for each combination of smoking status and race, averaging over age, premature labors, and history of hypertension.
margins with at()

We can fix the levels of any predictor in margins, including continuous predictors, with the at() option:

- specify at() after a ,
- inside at(), specify the predictor, then =, and then the value at which to fix the predictor
- to specify multiple values after =, put the values inside () using one of the following notations:
  - a/b requests every integer from a to b
  - a(c)b requests all values from a to b in increments of c

Remember that any predictor not specified in margins will be averaged over when predicting outcomes.
First, we estimate the predictive margin for mothers who are age 20:
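/* average probability of lbw for mothers who are age 20 (command reconstructed to match the output below) */
margins, at(age=20)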
Predictive margins Number of obs = 189
Model VCE: OIM
Expression: Pr(low), predict()
At: age = 20
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
_cons | .3365013 .0372488 9.03 0.000 .2634951 .4095075
------------------------------------------------------------------------------
Next are predictive margins for smoking and non-smoking mothers who are aged 20 or 30:
/* average probabilities of lbw for smoking and non-smoking moms age 20 or 30 */
margins smoke, at(age=(20 30))
Predictive margins Number of obs = 189
Model VCE: OIM
Expression: Pr(low), predict()
1._at: age = 20
2._at: age = 30
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
_at#smoke |
1#Nonsmoker | .2612321 .0444322 5.88 0.000 .1741465 .3483176
1#Smoker | .4590947 .0625156 7.34 0.000 .3365665 .581623
2#Nonsmoker | .187449 .0493688 3.80 0.000 .0906879 .28421
2#Smoker | .3577227 .0789483 4.53 0.000 .2029869 .5124585
------------------------------------------------------------------------------
And average predicted probabilities for mothers aged between 20, 25, 30, 35 or 40 and who have had 0, 1, or 2 premature labors:
/* average probabilities of lbw for mothers aged 20 to 40 (in increments of 5) and
either 0 to 2 premature labors */
margins, at(age=(20(5)40) ptl=(0/2))
Predictive margins Number of obs = 189
Model VCE: OIM
Expression: Pr(low), predict()
1._at: age = 20
ptl = 0
2._at: age = 20
ptl = 1
3._at: age = 20
ptl = 2
4._at: age = 25
ptl = 0
5._at: age = 25
ptl = 1
6._at: age = 25
ptl = 2
7._at: age = 30
ptl = 0
8._at: age = 30
ptl = 1
9._at: age = 30
ptl = 2
10._at: age = 35
ptl = 0
11._at: age = 35
ptl = 1
12._at: age = 35
ptl = 2
13._at: age = 40
ptl = 0
14._at: age = 40
ptl = 1
15._at: age = 40
ptl = 2
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
_at |
1 | .303371 .0387435 7.83 0.000 .227435 .3793069
2 | .4632617 .0746991 6.20 0.000 .3168543 .6096692
3 | .6311228 .135043 4.67 0.000 .3664434 .8958023
4 | .2598657 .0371378 7.00 0.000 .1870771 .3326544
5 | .410806 .0679355 6.05 0.000 .2776549 .543957
6 | .5799304 .1349761 4.30 0.000 .3153822 .8444786
7 | .2204237 .052976 4.16 0.000 .1165927 .3242547
8 | .3602354 .079823 4.51 0.000 .2037853 .5166855
9 | .527131 .1434264 3.68 0.000 .2460204 .8082417
10 | .1852631 .0689992 2.69 0.007 .0500271 .3204991
11 | .3124608 .0995509 3.14 0.002 .1173446 .5075769
12 | .473814 .1592928 2.97 0.003 .1616059 .7860222
13 | .1544039 .0806383 1.91 0.056 -.0036444 .3124521
14 | .2681937 .1185453 2.26 0.024 .0358491 .5005383
15 | .4211062 .1789255 2.35 0.019 .0704186 .7717937
------------------------------------------------------------------------------
Above we see that Stata uses a shorthand _at variable to label the estimates, with a key above the margins table.
Graphing margins estimates with marginsplot

Another compelling reason to use margins is marginsplot, which graphs the estimates of the last margins command.
A plot of predicted outcomes across a range of predictor values can be helpful to interpret the predictor’s effects.
marginsplot will plot the predicted outcome at the predictor values specified in the previous margins command and will generally connect them with a line.
Let’s examine a plot of the predictive margins for smoking and non-smoking mothers:
/* estimate marginal means for smoking and non-smoking mothers */
margins smoke
/* plot marginal means */
marginsplot
If more than one predictor is specified in margins, marginsplot will place one predictor on the x-axis and will plot separate lines by the other predictors.
/* estimate marginal means for smoking and non-smoking mothers at ages 20, 30 and 40 */
margins smoke, at(age=(20 30 40))
* plot marginal means
marginsplot
Use the x() option (shorthand for xdimension()) to specify which variable appears on the x-axis:
/* estimate marginal means for smoking and non-smoking mothers at ages 20, 30 and 40 */
margins smoke, at(age=(20 30 40))
/* plot marginal means */
marginsplot, x(smoke)
If many predictors are specified in margins, the subsequent marginsplot may be cluttered with too many lines:
/* margins with many predictors fixed at many levels */
margins smoke#race, at(age=(20 30 40) ptl=(0 1))
/* too many lines in marginsplot */
marginsplot
Specify some of the predictors in a by() option to plot separate graphs by those variables:
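For example (the choice of race as the by() variable here is just one option):

/* separate panels by race reduce the clutter */
marginsplot, by(race)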
If all model predictors are specified in margins, Stata calls the resulting estimated outcomes adjusted predictions.
Adjusted predictions are predictions for a specific type of person rather than some average.
/* adjusted predictions of probability of lbw for smokers and non-smokers of each race
who are age 30 with no previous premature labors and a history of hypertension */
margins smoke#race, at(age=30 ptl=0 ht=1)
Adjusted predictions Number of obs = 189
Model VCE: OIM
Expression: Pr(low), predict()
At: age = 30
ptl = 0
ht = 1
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
smoke#race |
Nonsmoker #|
White | .249149 .1345422 1.85 0.064 -.0145488 .5128469
Nonsmoker #|
Black | .467717 .1883502 2.48 0.013 .0985574 .8368766
Nonsmoker #|
Other | .4797188 .1743215 2.75 0.006 .138055 .8213826
Smoker #|
White | .467474 .1718665 2.72 0.007 .1306219 .8043262
Smoker #|
Black | .6992137 .1651199 4.23 0.000 .3755847 1.022843
Smoker #|
Other | .7092406 .159545 4.45 0.000 .3965383 1.021943
------------------------------------------------------------------------------
Notice how Stata labels the estimates adjusted predictions.
Adjusted predictions are often estimated at the means of the predictors, so Stata provides an atmeans option that fixes at their means all predictors that have not already been fixed at other values.
/* adjusted predictions of probability of lbw for smokers and non-smokers of each race
who have a history of hypertension and are at the mean age and number of premature labors */
margins smoke#race, at(ht=1) atmeans
Adjusted predictions Number of obs = 189
Model VCE: OIM
Expression: Pr(low), predict()
At: age = 23.2381 (mean)
ptl = .1957672 (mean)
ht = 1
0.smoke = .6084656 (mean)
1.smoke = .3915344 (mean)
1.race = .5079365 (mean)
2.race = .1375661 (mean)
3.race = .3544974 (mean)
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
smoke#race |
Nonsmoker #|
White | .3454404 .153753 2.25 0.025 .04409 .6467909
Nonsmoker #|
Black | .5829029 .1710685 3.41 0.001 .2476148 .918191
Nonsmoker #|
Other | .5945589 .1554467 3.82 0.000 .2898889 .8992289
Smoker #|
White | .5826656 .1562253 3.73 0.000 .2764697 .8888615
Smoker #|
Black | .7871062 .1220021 6.45 0.000 .5479866 1.026226
Smoker #|
Other | .795062 .117221 6.78 0.000 .565313 1.024811
------------------------------------------------------------------------------
Note: Except in linear regression models, an outcome predicted at the mean of the predictors will generally not equal the predicted outcome averaged over the distribution of the predictors
In linear regression, a regression coefficient can be directly interpreted as the change in the outcome in its original metric for a one-unit increase in the predictor.
In models where the predictors are linearly related to a transformed outcome via a link function, as in logistic regression, such a direct interpretation is not possible, because the change in the outcome's original metric for a one-unit increase in a predictor depends on the values of all of the predictors.
Marginal effects express predictor effects in the original metric of the outcome, with the other predictors fixed at specific values or averaged over.
In logistic regression, a marginal effect expresses a change in probabilities.
margins estimates marginal effects with the dydx() option.
Marginal effects of categorical predictors are straightforward to interpret: a difference in probabilities (risk difference), at fixed values of other predictors or averaging over them.
For example, the following estimates the difference in probabilities of lbw between mothers with and without a history of hypertension, averaging over the distribution of all other predictors:
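/* average marginal effect (risk difference) of history of hypertension (reconstructed to match the output below) */
margins, dydx(ht)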
Average marginal effects Number of obs = 189
Model VCE: OIM
Expression: Pr(low), predict()
dy/dx wrt: 1.ht
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
1.ht | .2579502 .1344364 1.92 0.055 -.0055403 .5214407
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
On average, mothers with a history of hypertension are expected to have 0.258 higher probability of lbw than mothers with no history.
For most audiences, the above risk difference is easier to understand than odds ratios or changes in logits.
The above risk difference is equal to the difference between the predictive margins for ht:
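/* predictive margins for each level of ht (reconstructed) */
margins ht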
Predictive margins Number of obs = 189
Model VCE: OIM
Expression: Pr(low), predict()
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
ht |
0 | .2941321 .0324756 9.06 0.000 .2304811 .3577831
1 | .5520823 .1304096 4.23 0.000 .2964842 .8076804
------------------------------------------------------------------------------
Instead of averaging over race, we can estimate the marginal effect of hypertension separately by race:
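One way to request this (the exact specification here is our assumption, consistent with the output below):

/* marginal effect of ht at each level of race */
margins race, dydx(ht)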
Average marginal effects Number of obs = 189
Model VCE: OIM
Expression: Pr(low), predict()
dy/dx wrt: 1.ht
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
0.ht | (base outcome)
-------------+----------------------------------------------------------------
1.ht |
race |
White | .2334744 .1364055 1.71 0.087 -.0338755 .5008243
Black | .270582 .1302459 2.08 0.038 .0153047 .5258592
Other | .2702543 .1287184 2.10 0.036 .0179708 .5225377
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
Above we see that the magnitude of the marginal effect of hypertension depends on the value of race.
Note: In models where the outcome has a nonlinear relationship with the predictors, the fact that the marginal effect of one predictor varies across levels of another predictor does not necessarily imply that an interaction term was entered in the model. See VanderWeele and Knol (2014) for a thorough explanation.
Marginal effects of continuous predictors are more difficult to interpret.
The marginal effect of a continuous predictor is the partial derivative of the outcome with respect to that predictor, at fixed values of other predictors or averaging over them.
In the figure below that displays the curved relationship between predictor \(x\) and \(Pr(y=1)\), the relationship is close to linear at \(x=0\), but not so linear at \(x=2\).
The estimated marginal effect of \(x\) will approximate the change in \(Pr(y=1)\) for a one-unit increase in \(x\) if most observations have \(x\) values near 0, but not if most observations have values near 2.
Below, we estimate the marginal effect of age, averaging over the other predictors:
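/* average marginal effect of age (reconstructed to match the output below) */
margins, dydx(age)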
Average marginal effects Number of obs = 189
Model VCE: OIM
Expression: Pr(low), predict()
dy/dx wrt: age
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
age | -.0087802 .0065292 -1.34 0.179 -.0215772 .0040167
------------------------------------------------------------------------------
For the age range of mothers in this data set, 14-45, the relationship between age and probability of lbw is close to linear:
/* predictive margins for ages 14-45 */
margins, at(age=(14/45))
/* plot margins without confidence intervals */
marginsplot, noci
Thus, the marginal effect of age is approximately the expected change in the probability of lbw for each additional year of age.
We have now seen that the predictor effects in logistic regression models can be expressed as odds ratios or marginal effects, so which to report?
Odds ratios

- Bad: most audiences struggle to understand effect magnitudes expressed as odds ratios

Marginal effects

- Good: expressed as probabilities, which most audiences find easier to understand
- Bad: their magnitude depends on the values of the other predictors
If space allows, reporting both types of effect estimates might be preferable.
Generally, it is advisable to report some sort of baseline probability of the outcome so that the audience has an idea of the magnitudes of effects, especially if only odds ratios are reported.
The journal Health Services Research has encouraged the reporting of marginal effects over odds ratios (Norton et al., 2024), so consult the target journal’s instructions for authors for guidance.
Use the NHANES data and the previously run logistic regression to:

- run margins, at(region=4 urban=0 weight=60 age=30) and interpret the estimate

Stata provides two \(\chi^2\)-based tests of the fit of the data to the model with the command estat gof.
Both assess the null hypothesis that the model-predicted frequency of ones matches the observed frequency of ones.
Rejection of the null hypothesis suggests bad fit.
Pearson’s goodness of fit test
Specifying only estat gof requests Pearson's \(\chi^2\) goodness-of-fit test:
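/* Pearson goodness-of-fit test */
estat gof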
Goodness-of-fit test after logistic model
Variable: low
Number of observations = 189
Number of covariate patterns = 108
Pearson chi2(101) = 104.23
Prob > chi2 = 0.3929
Pearson's \(\chi^2\) goodness-of-fit test groups observations by their covariate patterns, compares the observed and model-expected numbers of ones within each group, and sums the discrepancies into the statistic:
\[\chi^2 = \sum_k\frac{(O_k - E_k)^2}{E_k}\]
where \(k\) indexes a group defined by predictor patterns, and \(O_k\) and \(E_k\) are the observed and model-expected number of ones in that group, respectively.
If model predictions do not match the observed data well, \(\chi^2\) will be large and the p-value will be small.
Pearson’s statistic has an approximate \(\chi^2\) distribution when the number of observations per group is large enough (>=5).
With 108 covariate-pattern groups and 189 total observations, most groups will contain only one or two observations, so Pearson's statistic may not be \(\chi^2\)-distributed.
Hosmer and Lemeshow goodness of fit test
Hosmer and Lemeshow (1980) developed a similar \(\chi^2\) statistic based on fewer groups with more observations per group.
Adding the group() option with the number of groups to estat gof requests the Hosmer and Lemeshow goodness of fit test:
/* Hosmer and Lemeshow gof test with groups based on predicted probability deciles */
estat gof, group(10)
note: obs collapsed on 10 quantiles of estimated probabilities.
Goodness-of-fit test after logistic model
Variable: low
Number of observations = 189
Number of groups = 10
Hosmer–Lemeshow chi2(8) = 6.92
Prob > chi2 = 0.5456
Both goodness of fit tests suggest that the model has adequate fit to the data.
However, Allison (2014) has pointed out some problems with the Hosmer and Lemeshow test.
Nevertheless, Hosmer and Lemeshow's test is commonly reported.
Allison (2014) discusses some newer goodness of fit tests with potentially better properties, but they would need to be calculated manually.
Often, we are interested in comparing different models’ fit to the data.
Information Criteria
Akaike’s Information Criterion (AIC) and Bayesian Information Criterion (BIC) assess a model’s fit to the data but penalize for model complexity (number of parameters estimated).
Use estat ic after running a regression model to request both AIC and BIC:
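/* AIC and BIC for the model just fit */
estat ic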
Akaike's information criterion and Bayesian information criterion
-----------------------------------------------------------------------------
Model | N ll(null) ll(model) df AIC BIC
-------------+---------------------------------------------------------------
. | 189 -117.336 -104.9044 7 223.8088 246.501
-----------------------------------------------------------------------------
Note: BIC uses N = number of observations. See [R] IC note.
Lower values on both criteria indicate better fit to the data, accounting for complexity.
We compare our model to a model without the continuous predictors age and ptl, and save this reduced model's estimates with estimates store, which we use below.
/* model without continuous predictors age and ptl*/
logit low i.ht i.smoke i.race, or
/* store model results as m_reduced */
estimates store m_reduced
/* AIC and BIC */
estat ic
Iteration 0: Log likelihood = -117.336
Iteration 1: Log likelihood = -108.34857
Iteration 2: Log likelihood = -108.17535
Iteration 3: Log likelihood = -108.17506
Iteration 4: Log likelihood = -108.17506
Logistic regression Number of obs = 189
LR chi2(4) = 18.32
Prob > chi2 = 0.0011
Log likelihood = -108.17506 Pseudo R2 = 0.0781
------------------------------------------------------------------------------
low | Odds ratio Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
1.ht | 3.252896 2.02843 1.89 0.059 .9582541 11.0423
|
smoke |
Smoker | 3.094613 1.155269 3.03 0.002 1.488809 6.432408
|
race |
Black | 2.803324 1.393428 2.07 0.038 1.058211 7.426326
Other | 3.06541 1.240317 2.77 0.006 1.387004 6.774843
|
_cons | .1449379 .0525803 -5.32 0.000 .0711844 .2951068
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
Akaike's information criterion and Bayesian information criterion
-----------------------------------------------------------------------------
Model | N ll(null) ll(model) df AIC BIC
-------------+---------------------------------------------------------------
m_reduced | 189 -117.336 -108.1751 5 226.3501 242.5588
-----------------------------------------------------------------------------
Note: BIC uses N = number of observations. See [R] IC note.
The model without age and ptl has a higher AIC but a lower BIC than the full model, suggesting that the full model fits the current data better but may be a bit overfit.
Likelihood ratio test
Likelihood ratio tests compare the fit of nested models and test the null hypothesis that the reduced model does not fit the data significantly worse.
We can compare two models saved with estimates store via a likelihood ratio test using the lrtest command followed by the names of the two saved models.
Below we compare the full and reduced model via likelihood ratio test.
/* refit full model */
logit low age ptl i.ht i.smoke i.race, or
/* store model results */
estimates store m_full
/* likelihood ratio test comparing full and reduced models */
lrtest m_full m_reduced
Iteration 0: Log likelihood = -117.336
Iteration 1: Log likelihood = -105.286
Iteration 2: Log likelihood = -104.90511
Iteration 3: Log likelihood = -104.90441
Iteration 4: Log likelihood = -104.90441
Logistic regression Number of obs = 189
LR chi2(6) = 24.86
Prob > chi2 = 0.0004
Log likelihood = -104.90441 Pseudo R2 = 0.1059
------------------------------------------------------------------------------
low | Odds ratio Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
age | .9541789 .0337091 -1.33 0.184 .8903458 1.022589
ptl | 2.117295 .6982914 2.27 0.023 1.109308 4.041203
1.ht | 3.413581 2.132187 1.97 0.049 1.003537 11.61146
|
smoke |
Smoker | 2.645521 1.022067 2.52 0.012 1.240679 5.641093
|
race |
Black | 2.648104 1.327578 1.94 0.052 .9912907 7.074066
Other | 2.778709 1.164658 2.44 0.015 1.222007 6.318482
|
_cons | .3970093 .3540699 -1.04 0.300 .0691294 2.280021
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
Likelihood-ratio test
Assumption: m_reduced nested within m_full
LR chi2(2) = 6.54
Prob > chi2 = 0.0380
A low p-value suggests the null should be rejected because the reduced model fits the data significantly worse.
Once we have estimated predicted probabilities, we can classify observations as 0 or 1 based on some cutoff, for example 0.5.
Predictive power measures assess the agreement between the observed zeroes and ones and the model-classified zeroes and ones.
Sensitivity and Specificity
Below, we create a variable that classifies observations with predicted probabilities above 0.5 as one and below 0.5 as zero, and then check its agreement with the observed outcome:
/* classify observations based on cutoff 0.5 */
gen pred_low = 0
replace pred_low = 1 if predp > 0.5
/* check agreement of observed and predicted outcomes, add row percentages */
tab low pred_low, row
(29 real changes made)
+----------------+
| Key |
|----------------|
| frequency |
| row percentage |
+----------------+
Birthweigh | pred_low
t<2500g | 0 1 | Total
-----------+----------------------+----------
0 | 117 13 | 130
| 90.00 10.00 | 100.00
-----------+----------------------+----------
1 | 43 16 | 59
| 72.88 27.12 | 100.00
-----------+----------------------+----------
Total | 160 29 | 189
| 84.66 15.34 | 100.00
In the table above:
In the table above, the observed outcomes (low) are the rows and the predicted classifications (pred_low) are the columns.

Two quantities we can calculate from the table above are sensitivity and specificity: sensitivity is the proportion of observed ones classified as ones, and specificity is the proportion of observed zeroes classified as zeroes.
The model has good specificity at .9 but bad sensitivity at .27, meaning that it tends to classify observed zeroes correctly but tends to misclassify observed ones.
If correctly identifying ones is more important than identifying zeroes, we might want to lower our cutoff to be more permissive in classifying ones.
Below we change the cutoff probability to 0.25 and recalculate the observed vs predicted table:
/* classify observations based on cutoff 0.25 */
gen pred_low_25 = 0
replace pred_low_25 = 1 if predp > 0.25
/* check agreement of observed and predicted outcomes with new cutoff */
tab low pred_low_25, row
(121 real changes made)
+----------------+
| Key |
|----------------|
| frequency |
| row percentage |
+----------------+
Birthweigh | pred_low_25
t<2500g | 0 1 | Total
-----------+----------------------+----------
0 | 57 73 | 130
| 43.85 56.15 | 100.00
-----------+----------------------+----------
1 | 11 48 | 59
| 18.64 81.36 | 100.00
-----------+----------------------+----------
Total | 68 121 | 189
| 35.98 64.02 | 100.00
Above, we see the push-pull of sensitivity and specificity.
When we lower the cutoff to increase sensitivity to .81 we also decrease specificity to .44.
The best cutoff choice depends on the relative costs of false positives and false negatives.
Receiver Operator Characteristic (ROC) curves
The most widely reported measure of logistic regression predictive power is the area under the curve (AUC) of a receiver operator characteristic (ROC) curve.
A graph of the ROC curve plots sensitivity against 1-specificity, or the true positive rate vs the false positive rate, across a range of different cutoffs.
The area under the ROC curve is an overall measure of its classification accuracy across different cutoffs.
Below is a figure displaying ROC curves for three different models run on the same data, but with different numbers of predictors:
In the graph above:
Stata's lroc command plots the ROC curve for a logit model and calculates the area under the curve.
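/* ROC curve and area under the curve for the current model */
lroc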
Logistic model for low
Number of observations = 189
Area under ROC curve = 0.7077
file lroc.svg saved as SVG format
An AUC of .708 suggests our model is better than random guessing, but will still make a substantial number of classification errors.
The AUC equals another commonly reported measure of predictive power, the concordance or \(c\) statistic, which has a useful interpretation: it is the probability that a randomly chosen observation with \(y=1\) will have a higher predicted probability than a randomly chosen observation with \(y=0\).
Validity of the logistic regression model estimates and subsequent inference depends on key assumptions:
Independence of observations
To assess the independence assumption, consider the study design:
If nonindependence is suspected, the following methods can be used to account for the nonindependence:
- random-effects (multilevel) logistic regression (melogit and xtlogit, re)
- fixed-effects or conditional logistic regression (xtlogit, fe or clogit)
- generalized estimating equations (xtgee, family(binomial))

Linearity in the logits: Pregibon's link test
Logistic regression assumes that the predictors have a linear relationship with the logit of the outcome.
Practically, this means we have modeled the correct functional form of the relationship.
Violation of linearity suggests that additional non-linear terms should be added (e.g. polynomial terms, splines, interactions).
One method to assess linearity is to use Pregibon’s link test (Pregibon, 1980):
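/* Pregibon's link test of the model specification */
linktest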
Iteration 0: Log likelihood = -117.336
Iteration 1: Log likelihood = -105.30024
Iteration 2: Log likelihood = -104.60155
Iteration 3: Log likelihood = -104.5903
Iteration 4: Log likelihood = -104.5903
Logistic regression Number of obs = 189
LR chi2(2) = 25.49
Prob > chi2 = 0.0000
Log likelihood = -104.5903 Pseudo R2 = 0.1086
------------------------------------------------------------------------------
low | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
_hat | .7966206 .3323518 2.40 0.017 .1452231 1.448018
_hatsq | -.1584433 .2026237 -0.78 0.434 -.5555784 .2386918
_cons | .0237742 .2273172 0.10 0.917 -.4217593 .4693077
------------------------------------------------------------------------------
In the output above, _hatsq represents the squared predicted outcome, and its large p-value of .434 suggests no gross violation of linearity.
Linearity in the logits: graphical assessment
We can also use a graphical method to assess linearity for a continuous predictor: bin the predictor into categories, enter the binned variable into the model as a factor, and then plot the predicted logit for each category; margins can predict logits (with option predict(xb)) and marginsplot can plot them.

/* create a categorical age variable based on age quintiles */
egen age_cat = cut(age), group(5)
/* enter categorical age in logit model as factor */
logit low i.age_cat ptl i.ht i.smoke i.race, or
/* predicted logits across age categories, predict(xb) requests logits */
margins age_cat, predict(xb)
/* graph predicted logits across age categories */
marginsplot
The above graph suggests some non-linearity, but the form does not seem biologically plausible so may be due to sampling variability.
It is important to assess the sensitivity of the model’s estimates to the inclusion or exclusion of a few influential observations.
A large change in model estimates after excluding some observations casts doubt on the robustness of the results.
Stata's predict can calculate 3 different influence statistics after logistic regression.
Each influence statistic is calculated for each observation and measures how much some model statistic changes after omitting that observation, and all others that share the same predictor values, from the regression.
Use the following predict options to request the influence statistics:

- dbeta: measures change in the regression coefficients
- dx2: measures change in Pearson's \(\chi^2\) statistic
- ddeviance: measures change in deviance, which quantifies the difference in the fit of the current model vs a model that perfectly predicts the outcome

Below, we assess the model's sensitivity to influential observations with the following procedure:

- calculate the delta-beta (dbeta) influence statistic with predict
- sort the data by dbeta in descending order and list the largest values
- rerun the model excluding observations with the largest dbeta values and compare the estimates
/* calculate delta-beta influence statistic and save as variable dbeta */
predict dbeta, dbeta
/* sort data by dbeta, descending */
gsort -dbeta
/* list 10 observations with largest dbeta */
li dbeta in 1/10
| dbeta |
|----------|
1. | 1.007646 |
2. | 1.007646 |
3. | .6772961 |
4. | .577143 |
5. | .577143 |
|----------|
6. | .577143 |
7. | .577143 |
8. | .577143 |
9. | .577143 |
10. | .4954283 |
+----------+
Above we see 2 observations have dbeta values above 1, and then a significant drop in dbeta for the next observations.
We will rerun the logit model without these 2 observations:
/* rerun model without 2 observations with dbeta > 1 */
logit low age ptl i.ht i.smoke i.race if dbeta < 1, or
Iteration 0: Log likelihood = -116.58273
Iteration 1: Log likelihood = -103.08216
Iteration 2: Log likelihood = -102.68944
Iteration 3: Log likelihood = -102.68783
Iteration 4: Log likelihood = -102.68783
Logistic regression Number of obs = 187
LR chi2(6) = 27.79
Prob > chi2 = 0.0001
Log likelihood = -102.68783 Pseudo R2 = 0.1192
------------------------------------------------------------------------------
low | Odds ratio Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
age | .9466164 .0340673 -1.52 0.127 .8821462 1.015798
ptl | 2.058488 .6825838 2.18 0.029 1.074718 3.942778
1.ht | 6.088608 4.463934 2.46 0.014 1.446937 25.62043
|
smoke |
Smoker | 2.870167 1.120249 2.70 0.007 1.3356 6.16791
|
race |
Black | 2.365727 1.203828 1.69 0.091 .8726108 6.4137
Other | 2.624106 1.106879 2.29 0.022 1.147981 5.998299
|
_cons | .4795685 .4324742 -0.81 0.415 .0818914 2.808427
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
The estimates and inferences have changed slightly (more so for the ht coefficient), but overall the model's estimates are not too sensitive to the inclusion of these 2 observations.
Unlike in linear regression, there is no unexplained error term \(\epsilon_i\) whose variance must be separately estimated.
The variance of binomially-distributed variables is determined by the probability parameter \(p\) (\(n\) is fixed in the data).
If there is unmodeled heterogeneity in \(p\), the variance of the data may be larger than expected by the model, known as overdispersion.
For example, important predictors omitted from the model can leave unmodeled heterogeneity in \(p\).
Several methods can be used to address overdispersion in logistic regression models.
Assess the linearity assumption of the logistic regression model run on the NHANES data using linktest.
Allison, P. D. (1999). Comparing Logit and Probit Coefficients Across Groups. Sociological Methods & Research, 28(2), 186-208. https://doi.org/10.1177/0049124199028002003
Allison, P. D. (2014, March). Measures of fit for logistic regression. In Proceedings of the SAS global forum 2014 conference (pp. 1-13). SAS Institute Inc. Cary, NC, USA.
Hosmer, D. W., & Lemeshow, S. (1980). A goodness-of-fit test for the multiple logistic regression model. Communications in Statistics.
Hosmer Jr, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression. John Wiley & Sons.
Norton, E. C., Dowd, B. E., Garrido, M. M., & Maciejewski, M. L. (2024). Requiem for odds ratios. Health services research, 59(4), e14337.
Pregibon, D. (1980). Goodness of link tests for generalized linear models. Journal of the Royal Statistical Society Series C: Applied Statistics, 29(1), 15-24.
VanderWeele, T. J., & Knol, M. J. (2014). A tutorial on interaction. Epidemiologic methods, 3(1), 33-72.
Williams, R. (2017). Using Stata’s margins command to estimate and interpret adjusted predictions and marginal effects. The Stata Journal, 308-331