About this presentation
- This presentation gives a broad overview of methods for interpreting interactions in logistic regression.
- The presentation is not about Stata. It just uses Stata. You gotta use something.
- The methods shown are somewhat stat package independent. However, they can be easier or more difficult to implement depending on the stat package.
- Each of the models used in the examples will have two research variables that are interacted and one continuous covariate (cv1) that is not part of the interaction.
Some definitions
Odds
Showing that odds are ratios.
odds = p/(1 - p)
Log odds
Natural log of the odds, also known as a logit.
log odds = logit = log(p/(1 - p))
Odds ratio
Showing that odds ratios are actually ratios of ratios.
odds_ratio = odds1/odds2 = [p1/(1 - p1)] / [p2/(1 - p2)]
Computing odds ratio from logistic regression coefficient
odds_ratio = exp(b)
Computing probability from logistic regression coefficients
probability = exp(Xb)/(1 + exp(Xb))
Where Xb is the linear predictor.
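The definitions above can be written as a few small functions. This is a minimal Python sketch. The presentation uses Stata, but these formulas do not depend on any package. The function names are illustrative and do not come from any library.

```python
import math


def odds(p: float) -> float:
    """Odds: p / (1 - p)."""
    return p / (1.0 - p)


def log_odds(p: float) -> float:
    """Log odds (logit): log(p / (1 - p))."""
    return math.log(odds(p))


def odds_ratio(p1: float, p2: float) -> float:
    """Odds ratio: the ratio of two odds, odds(p1) / odds(p2)."""
    return odds(p1) / odds(p2)


def probability(xb: float) -> float:
    """Probability from a linear predictor Xb: exp(Xb) / (1 + exp(Xb))."""
    return math.exp(xb) / (1.0 + math.exp(xb))


# p = .8 gives odds of 4 (.8 to .2) and log odds of log(4)
print(odds(0.8), log_odds(0.8))
# Comparing p1 = .8 with p2 = .5: odds of 4 versus odds of 1
print(odds_ratio(0.8, 0.5))
# A coefficient b in the log odds metric becomes an odds ratio via exp(b)
b = math.log(4.0)
print(math.exp(b))
# probability() undoes log_odds()
print(probability(log_odds(0.8)))
```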
About logistic regression
Logistic regression fits a maximum likelihood logit model. The model estimates conditional means in terms of logits (log odds). The logit model is a linear model in the log odds metric. Logistic regression results can be displayed as odds ratios or as probabilities. Probabilities are a nonlinear transformation of the log odds results.
In general, linear models have a number of advantages over nonlinear models and are easier to work with. For example, in linear models the slopes and/or differences in means do not change for differing values of a covariate. This is not necessarily the case for nonlinear models. The problem in logistic regression is that, even though the model is linear in log odds, many researchers feel that log odds are not a natural metric and are not easily interpreted.
Probability is a much more natural metric. However, the logit model is not linear in the probability metric. As a result, differences in predicted probabilities (and slopes) change as the value of a covariate changes. In fact, the estimated probabilities depend on all variables in the model, not just the variables in the interaction.
So what is a linear model? A linear model is linear in the betas (coefficients). By extension, a nonlinear model is nonlinear in the betas. Below are three examples of linear and nonlinear models.
First is an example of a linear model and its graph.
Next we have an example of a nonlinear model and its graph. In this case it's an exponential growth model.
Lastly we have another nonlinear model. This one shows the nonlinear transformation of log odds to probabilities.
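The graphs are not reproduced here. A short Python sketch shows the same point numerically: equal one-unit steps in log odds produce unequal changes in probability. The grid of log odds values is an arbitrary illustrative choice.

```python
import math


def probability(xb: float) -> float:
    """Nonlinear transformation of log odds to probability."""
    return math.exp(xb) / (1.0 + math.exp(xb))


# Log odds in equal steps of 1
log_odds_grid = [-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
probs = [probability(x) for x in log_odds_grid]

# The same one-unit increase in log odds produces different changes in probability
steps = [hi - lo for lo, hi in zip(probs, probs[1:])]
for x, step in zip(log_odds_grid, steps):
    print(f"log odds {x:+.0f} to {x + 1:+.0f}: probability rises by {step:.4f}")
```

The largest changes occur near a probability of .5 (log odds of 0), and the changes shrink toward the tails. This is why a fixed effect in log odds corresponds to different changes in probability, depending on where the other variables place an observation.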
Logistic regression transformations
This is an attempt to show the different types of transformations that can occur with logistic regression models.
                              probability
                             /
                            /
                           /
                          /
                         /
                        /
odds ratios ----- log odds ------- odds
Logistic interactions are a complex concept
Common wisdom suggests that interactions involve exploring differences in differences: if the differences are not different, then there is no interaction. But in logistic regression, interaction is a more complex concept. Researchers need to decide how to conceptualize the interaction. Is the interaction to be conceptualized in terms of log odds (logits), odds ratios, or probability? This decision can make a big difference. An interaction that is significant in log odds may not be significant in terms of the difference in differences for probability, or vice versa.
Model 1: Categorical by categorical interaction
Log odds metric — categorical by categorical interaction
Variables f and h are binary predictors, while cv1 is a continuous covariate.
use https://stats.idre.ucla.edu/stat/data/concon2, clear
logit y01 f##h cv1, nolog

Logistic regression                             Number of obs   =        200
                                                LR chi2(4)      =     106.10
                                                Prob > chi2     =     0.0000
Log likelihood = -78.74193                      Pseudo R2       =     0.4025

------------------------------------------------------------------------------
         y01 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         1.f |   2.996118   .7521524     3.98   0.000     1.521926    4.470309
         1.h |   2.390911   .6608498     3.62   0.000      1.09567    3.686153
             |
         f#h |
         1 1 |  -2.047755   .8807989    -2.32   0.020    -3.774089   -.3214213
             |
         cv1 |    .196476   .0328518     5.98   0.000     .1320876    .2608644
       _cons |  -11.86075   1.895828    -6.26   0.000     -15.5765   -8.144991
------------------------------------------------------------------------------
The interaction term is significant (p = 0.020). We could manually compute the expected logits for each of the four cells in the model, with cv1 held at 0.
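f h
cell 0 0   b[_cons]                                  = -11.86075
cell 0 1   b[_cons] + b[1.h]                         = -11.86075 + 2.390911 = -9.469835
cell 1 0   b[_cons] + b[1.f]                         = -11.86075 + 2.996118 = -8.864629
cell 1 1   b[_cons] + b[1.f] + b[1.h] + b[1.f#1.h]   = -11.86075 + 2.996118 + 2.390911 - 2.047755 = -8.521473
(These values match the cell-means estimates below. Hand arithmetic on the rounded coefficients displayed here can differ in the sixth decimal place.)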
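The same four cell logits can be computed in a few lines. This Python sketch copies the coefficients from the output above as displayed (rounded), so the results agree with Stata only to about five decimal places. The dictionary keys simply mirror Stata's coefficient names and are otherwise arbitrary.

```python
# Coefficients from the logit y01 f##h cv1 output above, rounded as displayed
b = {
    "_cons": -11.86075,
    "1.f": 2.996118,
    "1.h": 2.390911,
    "1.f#1.h": -2.047755,
    "cv1": 0.196476,
}


def cell_logit(f: int, h: int, cv1: float = 0.0) -> float:
    """Expected log odds (linear predictor) for one f-by-h cell at a given cv1."""
    return (b["_cons"]
            + f * b["1.f"]
            + h * b["1.h"]
            + f * h * b["1.f#1.h"]  # interaction enters only when f = 1 and h = 1
            + cv1 * b["cv1"])


cells = {(f, h): cell_logit(f, h) for f in (0, 1) for h in (0, 1)}
for (f, h), xb in cells.items():
    print(f"f={f} h={h}: {xb:.6f}")
```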
We can also use a cell-means model to obtain the expected logits for each cell when cv1=0.
logit y01 bn.f#bn.h cv1, nocons

Logistic regression                             Number of obs   =        200
                                                Wald chi2(5)    =      50.48
Log likelihood = -78.74193                      Prob > chi2     =     0.0000

------------------------------------------------------------------------------
         y01 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         f#h |
         0 0 |  -11.86075   1.895828    -6.26   0.000     -15.5765   -8.144991
         0 1 |  -9.469835   1.714828    -5.52   0.000    -12.83084   -6.108835
         1 0 |  -8.864629   1.530269    -5.79   0.000     -11.8639   -5.865356
         1 1 |  -8.521473   1.640705    -5.19   0.000    -11.73719    -5.30575
             |
         cv1 |    .196476   .0328518     5.98   0.000     .1320876    .2608644
------------------------------------------------------------------------------
And here is what the expected logits look like in a 2×2 table.
          h=0          h=1
f=0   -11.86075    -9.469835
f=1    -8.864629   -8.521473
We will look at the differences between h0 and h1 at each level of f (simple main effects) and also at the difference in differences.
/* difference 1 at f = 0 */
lincom 0.f#0.h - 0.f#1.h

 ( 1)  [y01]0bn.f#0bn.h - [y01]0bn.f#1.h = 0

------------------------------------------------------------------------------
         y01 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |  -2.390911   .6608498    -3.62   0.000    -3.686153    -1.09567
------------------------------------------------------------------------------

/* difference 2 at f = 1 */
lincom 1.f#0.h - 1.f#1.h

 ( 1)  [y01]1.f#0bn.h - [y01]1.f#1.h = 0

------------------------------------------------------------------------------
         y01 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |  -.3431562   .5507722    -0.62   0.533     -1.42265    .7363375
------------------------------------------------------------------------------
Difference 1 suggests that h0 is significantly different from h1 at f = 0, while difference 2 does not show a significant difference at f = 1. These are tests of simple main effects, just like the ones we would do in OLS regression. We will finish this section by looking at the difference in differences.
/* difference in differences */
lincom (0.f#0.h - 0.f#1.h)-(1.f#0.h - 1.f#1.h)

 ( 1)  [y01]0bn.f#0bn.h - [y01]0bn.f#1.h - [y01]1.f#0bn.h + [y01]1.f#1.h = 0

------------------------------------------------------------------------------
         y01 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |  -2.047755   .8807989    -2.32   0.020    -3.774089   -.3214213
------------------------------------------------------------------------------
The difference in differences is, of course, just another name for the interaction. For the log odds model the differences and the difference in differences are the same regardless of the value of the covariate. This constancy across different values of the covariate is one of the properties of linear models.
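The claim that these log odds differences do not depend on the covariate is easy to check numerically. This Python sketch reuses the rounded coefficients from the output above. It evaluates the two simple main effects (h0 minus h1, as in the lincom commands) and their difference at several arbitrarily chosen values of cv1.

```python
# Coefficients from the logit y01 f##h cv1 output above, rounded as displayed
b = {"_cons": -11.86075, "1.f": 2.996118, "1.h": 2.390911,
     "1.f#1.h": -2.047755, "cv1": 0.196476}


def cell_logit(f: int, h: int, cv1: float) -> float:
    """Expected log odds for one f-by-h cell at a given cv1."""
    return (b["_cons"] + f * b["1.f"] + h * b["1.h"]
            + f * h * b["1.f#1.h"] + cv1 * b["cv1"])


def h_difference(f: int, cv1: float) -> float:
    """Simple main effect in log odds: cell (f, h=0) minus cell (f, h=1)."""
    return cell_logit(f, 0, cv1) - cell_logit(f, 1, cv1)


results = {}
for cv1 in (0, 30, 50, 70):
    d0 = h_difference(0, cv1)  # difference 1, at f = 0
    d1 = h_difference(1, cv1)  # difference 2, at f = 1
    results[cv1] = (d0, d1, d0 - d1)  # last entry: difference in differences
    print(cv1, [round(v, 6) for v in results[cv1]])
```

The cv1 term cancels in every difference. This is why the log odds results are identical in every row.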
Odds ratio metric — categorical by categorical interaction
Let’s look at a table of logistic regression coefficients along with the exponentiated coefficients, which some people call odds ratios.
----------------------------------------------------------
 source | coefficient    exp(coef)   type of exp(coef)
--------+-------------------------------------------------
      f |    2.996118    20.007716   odds ratio
      h |    2.390911     10.92345   odds ratio
    f#h |   -2.047755    0.1290242   ratio of odds ratios
    cv1 |    0.196476     1.217106   odds ratio
  _cons |   -11.86075    7.062e-06   baseline odds
----------------------------------------------------------
Many people call all exponentiated logistic coefficients odds ratios. But as you can see from the table above, the exponentiated interaction coefficient is a ratio of odds ratios, and the exponentiated constant is the baseline odds.
And here’s how we get the odds ratio results.
logit y01 f##h cv1, or nolog

Logistic regression                             Number of obs   =        200
                                                LR chi2(4)      =     106.10
                                                Prob > chi2     =     0.0000
Log likelihood = -78.74193                      Pseudo R2       =     0.4025

------------------------------------------------------------------------------
         y01 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         1.f |   20.00771   15.04885     3.98   0.000      4.58104    87.38374
         1.h |   10.92345   7.218757     3.62   0.000     2.991185     39.8911
             |
         f#h |
         1 1 |   .1290242   .1136444    -2.32   0.020      .022958    .7251177
             |
         cv1 |   1.217106   .0399841     5.98   0.000     1.141208    1.298052
       _cons |   7.06e-06   .0000134    -6.26   0.000     1.72e-07    .0002902
------------------------------------------------------------------------------
We can compute the odds ratios manually for each of the two levels of f from the values in the table above.
odds ratio h1/h0 for f=0: exp(b[1.h]) = 10.92345
odds ratio h1/h0 for f=1: exp(b[1.h])*exp(b[1.f#1.h]) = 10.92345*.1290242 = 1.4093894
Please note that the computation of the odds ratio for f = 1 multiplies the exponentiated coefficients from the odds ratio model above (exp(a)*exp(b) = exp(a + b)). This shows that odds ratio models are multiplicative rather than additive.
The baseline odds when cv1 = 0 are very small (7.06e-06), so for the remainder of the computations we will estimate the odds while holding cv1 at 50.
margins f#h, at(cv1=50) expression(exp(xb())) noatlegend

Predictive margins                              Number of obs     =        200
Model VCE    : OIM

Expression   : exp(xb())
over         : f h

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         f h |
         0 0 |   .1304264   .0734908     1.77   0.076    -.0136129    .2744657
         0 1 |   1.424706    .515989     2.76   0.006     .4133857    2.436025
         1 0 |   2.609533   1.136545     2.30   0.022     .3819457    4.837121
         1 1 |   3.677847   1.311463     2.80   0.005     1.107427    6.248267
------------------------------------------------------------------------------
The option expression(exp(xb())) ensures that the results are in the odds metric (exponentiated linear predictions) rather than in probabilities. The baseline odds are now .1304264, which is reasonable. Next we compute the odds ratio of h1 to h0 at each level of f.
odds ratio 1 at f=0: 1.424706/.1304264 = 10.923446
odds ratio 2 at f=1: 3.677847/2.609533 = 1.4093889
So when f = 0, the odds of the outcome being one are 10.92 times greater for h1 than for h0. For f = 1, the ratio of the two odds is only 1.41. These odds ratios are the same as the ones we computed manually earlier.
We can also compute the ratio of odds ratios and show that it reproduces the estimate for the interaction.
ratio of odds ratios: (3.677847/2.609533)/(1.424706/.1304264) = .1290242
The one nice thing we can say about working in the odds ratio metric is that the odds ratios remain the same regardless of the value at which we hold the covariate constant.
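Both properties, the multiplicative structure and the invariance to cv1, can be checked with a short Python sketch. It uses the rounded coefficients from the log odds output above, so the results match Stata to about four or five decimal places. The cv1 values 30, 50 and 70 are arbitrary.

```python
import math

# Coefficients from the logit y01 f##h cv1 output above, rounded as displayed
b = {"_cons": -11.86075, "1.f": 2.996118, "1.h": 2.390911,
     "1.f#1.h": -2.047755, "cv1": 0.196476}


def cell_odds(f: int, h: int, cv1: float) -> float:
    """Odds for one cell: exp of the linear predictor."""
    xb = (b["_cons"] + f * b["1.f"] + h * b["1.h"]
          + f * h * b["1.f#1.h"] + cv1 * b["cv1"])
    return math.exp(xb)


ratios = {}
for cv1 in (30, 50, 70):
    or_f0 = cell_odds(0, 1, cv1) / cell_odds(0, 0, cv1)  # h1 vs h0 at f = 0
    or_f1 = cell_odds(1, 1, cv1) / cell_odds(1, 0, cv1)  # h1 vs h0 at f = 1
    ratios[cv1] = (or_f0, or_f1, or_f1 / or_f0)          # last: ratio of odds ratios
    print(cv1, [round(v, 6) for v in ratios[cv1]])

# Multiplicative structure: exp(b_h) * exp(b_fh) equals exp(b_h + b_fh)
print(math.exp(b["1.h"]) * math.exp(b["1.f#1.h"]),
      math.exp(b["1.h"] + b["1.f#1.h"]))
```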
Probability metric — categorical by categorical interaction
We will begin by rerunning our logistic regression model to refresh our memories on the coefficients.
logit y01 f##h cv1, nolog

Logistic regression                             Number of obs   =        200
                                                LR chi2(4)      =     106.10
                                                Prob > chi2     =     0.0000
Log likelihood = -78.74193                      Pseudo R2       =     0.4025

------------------------------------------------------------------------------
         y01 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         1.f |   2.996118   .7521524     3.98   0.000     1.521926    4.470309
         1.h |   2.390911   .6608498     3.62   0.000      1.09567    3.686153
             |
         f#h |
         1 1 |  -2.047755   .8807989    -2.32   0.020    -3.774089   -.3214213
             |
         cv1 |    .196476   .0328518     5.98   0.000     .1320876    .2608644
       _cons |  -11.86075   1.895828    -6.26   0.000     -15.5765   -8.144991
------------------------------------------------------------------------------
Let’s manually compute the probability of the outcome being one for the f = 0, h = 0 cell when cv1 is held at 50.
Xb = b[_cons] + 0*b[1.f] + 0*b[1.h] + 0*b[1.f#1.h] + 50*b[cv1]
   = -11.86075 + 0*2.996118 + 0*2.390911 + 0*-2.047755 + 50*.196476
   = -2.03695
probability = exp(Xb)/(1 + exp(Xb)) = exp(-2.03695)/(1 + exp(-2.03695)) = .11537767
We could repeat this for each of the other three cells, but instead we will use the margins command to obtain the expected probabilities for each cell while holding the covariate at 50.
margins f#h, at(cv1=50)

Adjusted predictions                            Number of obs     =        200
Model VCE    : OIM

Expression   : Pr(y01), predict()
at           : cv1             =          50

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         f#h |
         0 0 |    .115378   .0575106     2.01   0.045     .0026592    .2280968
         0 1 |   .5875788   .0877652     6.69   0.000     .4155621    .7595955
         1 0 |   .7229559   .0872338     8.29   0.000     .5519808    .8939309
         1 1 |   .7862264   .0599327    13.12   0.000     .6687605    .9036924
------------------------------------------------------------------------------
Here are the same results displayed as a table.
          h=0         h=1
f=0    .115378    .5875788
f=1    .7229559   .7862264
We would like to look at the differences in h for each level of f.
h1 - h0 at f = 0: .5875788 - .115378 = .4722008
h1 - h0 at f = 1: .7862264 - .7229559 = .0632706
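The cell probabilities and their differences follow from the coefficients alone. This Python sketch reproduces the point estimates at cv1 = 50 from the rounded coefficients, so expect agreement to about four or five decimal places. It gives no standard errors. Those require the delta method, which Stata's margins handles.

```python
import math

# Coefficients from the logit y01 f##h cv1 output above, rounded as displayed
b = {"_cons": -11.86075, "1.f": 2.996118, "1.h": 2.390911,
     "1.f#1.h": -2.047755, "cv1": 0.196476}


def cell_prob(f: int, h: int, cv1: float) -> float:
    """Predicted probability for one f-by-h cell: exp(Xb) / (1 + exp(Xb))."""
    xb = (b["_cons"] + f * b["1.f"] + h * b["1.h"]
          + f * h * b["1.f#1.h"] + cv1 * b["cv1"])
    return math.exp(xb) / (1.0 + math.exp(xb))


p = {(f, h): cell_prob(f, h, 50) for f in (0, 1) for h in (0, 1)}
diff_f0 = p[(0, 1)] - p[(0, 0)]  # h1 - h0 at f = 0
diff_f1 = p[(1, 1)] - p[(1, 0)]  # h1 - h0 at f = 1
did = diff_f0 - diff_f1          # difference in differences at cv1 = 50
print({k: round(v, 6) for k, v in p.items()})
print(round(diff_f0, 6), round(diff_f1, 6), round(did, 6))
```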
We can also do this with a slight variation of the margins command and get estimates of the differences in probability along with standard errors and confidence intervals.
margins f, dydx(h) at(cv1=50) post

Conditional marginal effects                    Number of obs     =        200
Model VCE    : OIM

Expression   : Pr(y01), predict()
dy/dx w.r.t. : 0.h 1.h
at           : cv1             =          50

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.h          |
           f |
          0  |   .4722008   .1035128     4.56   0.000     .2693195     .675082
          1  |   .0632706   .1036697     0.61   0.542    -.1399183    .2664595
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
These two differences are the probability analogs to the simple main effects from the log odds model. So, when the covariate is held at 50 there is a significant difference in h at f = 0 but not at f = 1.
Next, we will use lincom to compute the difference in differences when cv1 is held at 50.
lincom [1.h]0.f-[1.h]1.f

 ( 1)  [1.h]0bn.f - [1.h]1.f = 0

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   .4089302   .1482533     2.76   0.006      .118359    .6995014
------------------------------------------------------------------------------
The p-value here is different from the p-value for the interaction in the original logit model. In the probability metric, the difference in differences depends on the value at which the covariate is held (here cv1 = 50).
If we repeat the above process for values of cv1 from 20 to 70, we can produce a table of simple main effects and a graph of the difference in differences.
* Refit the model first: margins, post above replaced the logit estimates
quietly logit y01 f##h cv1, nolog

* Table of Simple Main Effects for h at Two Levels of f for Various Values of cv1
margins f, dydx(h) at(cv1=(20(10)70)) noatlegend

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       cv1 f |
        20 0 |   .0035507   .0038256     0.93   0.353    -.0039472    .0110487
        20 1 |    .002893   .0057719     0.50   0.616    -.0084197    .0142058
        30 0 |   .0246805   .0188412     1.31   0.190    -.0122475    .0616086
        30 1 |   .0186252   .0331697     0.56   0.574    -.0463863    .0836367
        40 0 |   .1485222   .0656193     2.26   0.024     .0199107    .2771337
        40 1 |   .0723494   .1167547     0.62   0.535    -.1564856    .3011843
        50 0 |   .4722008   .1035128     4.56   0.000     .2693195     .675082
        50 1 |   .0632706   .1036697     0.61   0.542    -.1399183    .2664595
        60 0 |   .4284548    .137549     3.11   0.002     .1588636    .6980459
        60 1 |   .0142654   .0255894     0.56   0.577    -.0358888    .0644197
        70 0 |   .1173445    .076704     1.53   0.126    -.0329926    .2676816
        70 1 |   .0021597   .0042758     0.51   0.613    -.0062207    .0105402
------------------------------------------------------------------------------
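* Table of Difference in Differences for Various Values of cv1
margins r.f#r.h, at(cv1=(20(10)70)) noatlegend

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         cv1 |
          20 |  -.0006577   .0047463    -0.14   0.890    -.0099603    .0086449
          30 |  -.0060553   .0306291    -0.20   0.843    -.0660872    .0539766
          40 |  -.0761728   .1233778    -0.62   0.537    -.3179889    .1656432
          50 |  -.4089302   .1482533    -2.76   0.006    -.6995014    -.118359
          60 |  -.4141893   .1388141    -2.98   0.003      -.68626   -.1421186
          70 |  -.1151848   .0753487    -1.53   0.126    -.2628654    .0324959
------------------------------------------------------------------------------
The contrast r.f#r.h is (h1 - h0 at f = 1) - (h1 - h0 at f = 0). It is therefore the negative of the lincom estimate above, and its z statistics and confidence limits carry the same sign as the estimates.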
marginsplot, x(cv1) recast(line) recastci(rarea) yline(0)
Clearly, the value of the covariate makes a huge difference in whether or not the simple main effects or the interactions are statistically significant when working in the probability metric.
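The dependence on cv1 shows up even in the point estimates. This Python sketch uses the rounded coefficients from the model above to compute the probability difference in differences with the same orientation as r.f#r.h, (h1 - h0 at f = 1) - (h1 - h0 at f = 0), for cv1 from 20 to 70. It reproduces only the estimates, not the standard errors.

```python
import math

# Coefficients from the logit y01 f##h cv1 output above, rounded as displayed
b = {"_cons": -11.86075, "1.f": 2.996118, "1.h": 2.390911,
     "1.f#1.h": -2.047755, "cv1": 0.196476}


def cell_prob(f: int, h: int, cv1: float) -> float:
    """Predicted probability for one f-by-h cell at a given cv1."""
    xb = (b["_cons"] + f * b["1.f"] + h * b["1.h"]
          + f * h * b["1.f#1.h"] + cv1 * b["cv1"])
    return math.exp(xb) / (1.0 + math.exp(xb))


def prob_did(cv1: float) -> float:
    """(h1 - h0 at f = 1) - (h1 - h0 at f = 0) in the probability metric."""
    return ((cell_prob(1, 1, cv1) - cell_prob(1, 0, cv1))
            - (cell_prob(0, 1, cv1) - cell_prob(0, 0, cv1)))


did_by_cv1 = {cv1: prob_did(cv1) for cv1 in range(20, 71, 10)}
for cv1, did in did_by_cv1.items():
    print(f"cv1={cv1}: {did:+.6f}")
```

In log odds, the same contrast equals the interaction coefficient (-2.047755) at every value of cv1. In probabilities, it ranges from nearly zero to about -.41.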
Model 1a: Categorical by categorical interaction?
But wait, what if the model does not contain an interaction term? Consider the following model.
logit y01 i.f i.h cv1

Logistic regression                             Number of obs   =        200
                                                LR chi2(3)      =     100.26
                                                Prob > chi2     =     0.0000
Log likelihood =  -81.6618                      Pseudo R2       =     0.3804

------------------------------------------------------------------------------
         y01 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         1.f |    1.65172   .4229992     3.90   0.000     .8226566    2.480783
         1.h |   1.256555   .4009757     3.13   0.002     .4706575    2.042453
         cv1 |   .1806214   .0304036     5.94   0.000     .1210314    .2402113
       _cons |  -10.26943   1.622842    -6.33   0.000    -13.45015   -7.088723
------------------------------------------------------------------------------
We will manually compute the expected log odds for each of the four cells of the model.
f h
cell 0 0   b[_cons]                     = -10.26943
cell 0 1   b[_cons] + b[1.h]            = -10.26943 + 1.256555 = -9.012875
cell 1 0   b[_cons] + b[1.f]            = -10.26943 + 1.65172 = -8.61771
cell 1 1   b[_cons] + b[1.f] + b[1.h]   = -10.26943 + 1.65172 + 1.256555 = -7.361155
Next we will compute the differences (h0 - h1) for f=0 and f=1.
difference 1 at f = 0: -10.26943 - (-9.012875) = -1.256555
difference 2 at f = 1: -8.61771 - (-7.361155) = -1.256555
They are identical (both equal -b[1.h]), showing that there is no interaction effect in the log odds model.
Next we will compute the expected probabilities for cv1 held at 50 along with the difference in differences.
margins, over(f h) at(cv1=50) post

Predictive margins                              Number of obs     =        200

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         f#h |
         0 0 |   .2247204   .0670438     3.35   0.001     .0933171    .3561238
         0 1 |   .5045471   .0798579     6.32   0.000     .3480285    .6610657
         1 0 |   .6018917   .0866773     6.94   0.000     .4320073    .7717761
         1 1 |   .8415636   .0455686    18.47   0.000     .7522509    .9308764
------------------------------------------------------------------------------

lincom (_b[0.f#1.h]-_b[0.f#0.h])-(_b[1.f#1.h]-_b[1.f#0.h])

 ( 1)  - 0bn.f#0bn.h + 0bn.f#1.h + 1.f#0bn.h - 1.f#1.h = 0

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   .0401547   .0364121     1.10   0.270    -.0312117     .111521
------------------------------------------------------------------------------
The difference in differences is not very large. Let's try it again, this time holding cv1 at 60.
* Refit the model first: margins, post above replaced the logit estimates
quietly logit y01 i.f i.h cv1

margins, over(f h) at(cv1=60) post

Predictive margins                              Number of obs     =        200

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         f#h |
         0 0 |   .6382663   .1046912     6.10   0.000     .4330753    .8434572
         0 1 |   .8610935   .0455552    18.90   0.000     .7718069    .9503802
         1 0 |   .9019929   .0470231    19.18   0.000     .8098294    .9941565
         1 1 |   .9700007   .0146765    66.09   0.000     .9412353     .998766
------------------------------------------------------------------------------

lincom (_b[0.f#1.h]-_b[0.f#0.h])-(_b[1.f#1.h]-_b[1.f#0.h])

 ( 1)  - 0bn.f#0bn.h + 0bn.f#1.h + 1.f#0bn.h - 1.f#1.h = 0

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   .1548195   .0634635     2.44   0.015     .0304334    .2792057
------------------------------------------------------------------------------
This time the difference in differences is much larger. Let’s make a graph similar to the one we did for the model with the interaction included.
We see that, even without an interaction term in the model, the difference in differences (an interaction?) can vary widely, from negative to positive, depending on the value of the covariate.
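The contrast between the two metrics can be reproduced directly from the no-interaction coefficients. This Python sketch uses the same orientation as the lincom commands above, (h1 - h0 at f = 0) - (h1 - h0 at f = 1). It evaluates that contrast in log odds and in probability for several cv1 values, again from rounded coefficients and as point estimates only.

```python
import math

# Coefficients from the logit y01 i.f i.h cv1 output above (no interaction term)
b = {"_cons": -10.26943, "1.f": 1.65172, "1.h": 1.256555, "cv1": 0.1806214}


def cell_logit(f: int, h: int, cv1: float) -> float:
    """Expected log odds for one cell of the additive (no-interaction) model."""
    return b["_cons"] + f * b["1.f"] + h * b["1.h"] + cv1 * b["cv1"]


def cell_prob(f: int, h: int, cv1: float) -> float:
    """Predicted probability for one cell: exp(Xb) / (1 + exp(Xb))."""
    xb = cell_logit(f, h, cv1)
    return math.exp(xb) / (1.0 + math.exp(xb))


def did(cell, cv1: float) -> float:
    """(h1 - h0 at f = 0) - (h1 - h0 at f = 1) for a given cell function."""
    return ((cell(0, 1, cv1) - cell(0, 0, cv1))
            - (cell(1, 1, cv1) - cell(1, 0, cv1)))


for cv1 in range(30, 71, 10):
    print(f"cv1={cv1}: log odds {did(cell_logit, cv1):+.6f}   "
          f"probability {did(cell_prob, cv1):+.6f}")
```

The log odds column is zero (up to floating-point error) in every row. The probability column changes sign as cv1 moves from 40 to 50.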
This leads us to the “Quote of the Day”.
Quote of the day
Departures from additivity imply the presence of interaction types, but additivity does not imply the absence of interaction types.
Greenland & Rothman, 1998
Model 2: Categorical by continuous interaction
Log Odds Metric — categorical by continuous interaction
The dataset for the categorical by continuous interaction has one binary predictor (f), one continuous predictor (s) and a continuous covariate (cv1). Let’s take a look at the logistic regression model.
use https://stats.idre.ucla.edu/stat/data/logitcatcon, clear
logit y f##c.s cv1

Logistic regression                             Number of obs   =        200
                                                LR chi2(4)      =     114.41
                                                Prob > chi2     =     0.0000
Log likelihood = -74.587842                     Pseudo R2       =     0.4340

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         1.f |   9.983662    3.05269     3.27   0.001       4.0005    15.96682
           s |   .1750686   .0470033     3.72   0.000     .0829438    .2671933
             |
       f#c.s |
           1 |  -.1595233   .0570352    -2.80   0.005    -.2713103   -.0477363
             |
         cv1 |   .1877164   .0347888     5.40   0.000     .1195316    .2559013
       _cons |  -19.00557   3.371064    -5.64   0.000    -25.61273   -12.39841
------------------------------------------------------------------------------
The interaction term is significant, indicating that the slopes of y on s (in the log odds metric) differ significantly between the two levels of f. We can compute the slopes and intercepts manually as shown below.
slope for f=0:     b[s] = .1750686
slope for f=1:     b[s] + b[1.f#c.s] = .1750686 - .1595233 = .0155453
intercept for f=0: b[_cons] = -19.00557
intercept for f=1: b[_cons] + b[1.f] = -19.00557 + 9.983662 = -9.021909
Here are our two logistic regression equations in the log odds metric.
f=0: -19.00557 + .1750686*s + .1877164*0   (cv1 held at 0)
f=1:  -9.021909 + .0155453*s + .1877164*0   (cv1 held at 0)
Now we can graph these two regression lines to get an idea of what is going on.
Because the logistic regression model is linear in log odds, the slopes do not change with differing values of the covariate. Changing cv1 only shifts both lines up or down by the same amount.
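A small Python sketch makes this concrete. It rebuilds each line's intercept and slope from the rounded coefficients above at a few arbitrarily chosen cv1 values. The intercepts move with cv1, but the slopes never do.

```python
# Coefficients from the logit y f##c.s cv1 output above, rounded as displayed
b = {"_cons": -19.00557, "1.f": 9.983662, "s": 0.1750686,
     "1.f#c.s": -0.1595233, "cv1": 0.1877164}


def logit_line(f: int, cv1: float = 0.0):
    """Return (intercept, slope in s) of the log odds line for level f at a given cv1."""
    intercept = b["_cons"] + f * b["1.f"] + cv1 * b["cv1"]
    slope = b["s"] + f * b["1.f#c.s"]  # cv1 never enters the slope
    return intercept, slope


for f in (0, 1):
    for cv1 in (0, 40, 60):
        intercept, slope = logit_line(f, cv1)
        print(f"f={f} cv1={cv1}: intercept {intercept:.6f}, slope {slope:.7f}")
```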
Probability metric — categorical by continuous interaction
We’ll begin by rerunning the logistic regression model.
logit y f##c.s cv1

Logistic regression                             Number of obs   =        200
                                                LR chi2(4)      =     114.41
                                                Prob > chi2     =     0.0000
Log likelihood = -74.587842                     Pseudo R2       =     0.4340

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         1.f |   9.983662    3.05269     3.27   0.001       4.0005    15.96682
           s |   .1750686   .0470033     3.72   0.000     .0829438    .2671933
             |
       f#c.s |
           1 |  -.1595233   .0570352    -2.80   0.005    -.2713103   -.0477363
             |
         cv1 |   .1877164   .0347888     5.40   0.000     .1195316    .2559013
       _cons |  -19.00557   3.371064    -5.64   0.000    -25.61273   -12.39841
------------------------------------------------------------------------------
If we were so inclined we could compute all of the probabilities of interest using the basic probability formula.
Prob = exp(Xb)/(1+exp(Xb))
Here's an example of computing the probability when f = 0, s = 60, f#s = 0, and cv1 = 40.
Xb0 = -19.00557 + 0*9.983662 + 60*.1750686 + 0*-.1595233 + 40*.1877164 = -.992798
exp(Xb0)/(1+exp(Xb0)) = exp(-.992798)/(1+exp(-.992798)) = .27035977
Now we will use f = 1, s = 60, f#s = 60, and cv1 = 40.
Xb1 = -19.00557 + 1*9.983662 + 60*.1750686 + 60*-.1595233 + 40*.1877164 = -.580534
exp(Xb1)/(1+exp(Xb1)) = exp(-.580534)/(1+exp(-.580534)) = .35880973
We can also compute the difference in probabilities.
exp(Xb1)/(1+exp(Xb1)) - exp(Xb0)/(1+exp(Xb0)) = exp(-.580534)/(1+exp(-.580534)) - exp(-.992798)/(1+exp(-.992798)) = .08844995
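The same three numbers can be reproduced with a short Python sketch. The coefficients are copied from the output above. The interaction variable is entered as the product f*s, so it equals 60 when f = 1 and 0 when f = 0, exactly as in the hand computation.

```python
import math

# Coefficients from the logit y f##c.s cv1 output above, rounded as displayed
b = {"_cons": -19.00557, "1.f": 9.983662, "s": 0.1750686,
     "1.f#c.s": -0.1595233, "cv1": 0.1877164}


def prob(f: int, s: float, cv1: float) -> float:
    """Predicted probability from the f##c.s model; f*s is the interaction variable."""
    xb = (b["_cons"] + f * b["1.f"] + s * b["s"]
          + (f * s) * b["1.f#c.s"] + cv1 * b["cv1"])
    return math.exp(xb) / (1.0 + math.exp(xb))


p0 = prob(0, 60, 40)  # f = 0, s = 60, cv1 = 40
p1 = prob(1, 60, 40)  # f = 1, s = 60, cv1 = 40
print(round(p0, 8), round(p1, 8), round(p1 - p0, 8))
```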
If we use something like Stata’s margins command, we can get predicted probabilities along with standard errors and confidence intervals. Here is an example predicting the probability when s = 20 and cv1 = 40.
margins f, at(s=20 cv1=40)

Adjusted predictions                            Number of obs     =        200
Model VCE    : OIM

Expression   : Pr(y), predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           f |
          0  |   .0003368   .0005779     0.58   0.560    -.0007958    .0014695
          1  |   .2310582   .1500289     1.54   0.124    -.0629931    .5251095
------------------------------------------------------------------------------
Now we can repeat this for values of s running from 20 to 70 in steps of 5, producing the table below.
* Table of Predicted Probabilities of f for Various Values of s Holding cv1 at 40
margins f, at(s=(20(5)70) cv1=40)

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         s f |
        20 0 |   .0003368   .0005779     0.58   0.560    -.0007958    .0014695
        20 1 |   .2310582   .1500289     1.54   0.124    -.0629931    .5251095
        25 0 |    .000808   .0012067     0.67   0.503    -.0015571     .003173
        25 1 |   .2451555   .1320954     1.86   0.063    -.0137469    .5040578
        30 0 |   .0019367   .0024706     0.78   0.433    -.0029056    .0067789
        30 1 |   .2598222   .1136085     2.29   0.022     .0371536    .4824908
        35 0 |   .0046348   .0049337     0.94   0.348     -.005035    .0143047
        35 1 |   .2750467   .0959104     2.87   0.004     .0870657    .4630276
        40 0 |   .0110505   .0095531     1.16   0.247    -.0076733    .0297743
        40 1 |   .2908127    .081642     3.56   0.000     .1307973    .4508282
        45 0 |   .0261139   .0178944     1.46   0.144    -.0089585    .0611863
        45 1 |   .3070997   .0752299     4.08   0.000     .1596518    .4545475
        50 0 |   .0604557   .0329478     1.83   0.067    -.0041208    .1250322
        50 1 |   .3238822   .0808248     4.01   0.000     .1654685    .4822959
        55 0 |   .1337569   .0622149     2.15   0.032     .0118178    .2556959
        55 1 |   .3411303   .0980782     3.48   0.001     .1489005    .5333601
        60 0 |   .2703596   .1168105     2.31   0.021     .0414151     .499304
        60 1 |   .3588096   .1233704     2.91   0.004      .117008    .6006111
        65 0 |   .4706697    .180248     2.61   0.009       .11739    .8239493
        65 1 |   .3768809   .1535731     2.45   0.014     .0758831    .6778787
        70 0 |   .6808947   .1951477     3.49   0.000     .2984123    1.063377
        70 1 |   .3953013   .1867987     2.12   0.034     .0291827    .7614199
------------------------------------------------------------------------------
We will repeat this holding cv1 at 50 and then 60. We will then plot the probabilities for each of the three values of cv1.
Instead of looking at separate values for f0 and f1, we could compute the difference in probabilities. Here is an example using margins with the dydx option.
margins, dydx(f) at(s=20 cv1=40)

Conditional marginal effects                    Number of obs     =        200
Model VCE    : OIM

Expression   : Pr(y), predict()
dy/dx w.r.t. : 1.f
at           : s               =          20
               cv1             =          40

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         1.f |   .2307214    .150045     1.54   0.124    -.0633615    .5248042
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
Okay, let’s repeat this for different values of s, producing the table below.
* Table of Differences in Probability for Various Values of s Holding cv1 at 40
margins, dydx(f) at(s=(20(5)70) cv1=40)

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           s |
          20 |   .2307214    .150045     1.54   0.124    -.0633615    .5248042
          25 |   .2443475   .1321009     1.85   0.064    -.0145655    .5032605
          30 |   .2578855   .1135271     2.27   0.023     .0353765    .4803946
          35 |   .2704118   .0954463     2.83   0.005     .0833405    .4574832
          40 |   .2797622   .0798258     3.50   0.000     .1233066    .4362179
          45 |   .2809858   .0696338     4.04   0.000     .1445061    .4174655
          50 |   .2634265   .0682395     3.86   0.000     .1296795    .3971735
          55 |   .2073734   .0822883     2.52   0.012     .0460913    .3686556
          60 |     .08845   .1291224     0.69   0.493    -.1646253    .3415252
          65 |  -.0937888   .2006804    -0.47   0.640    -.4871151    .2995376
          70 |  -.2855934   .2436296    -1.17   0.241    -.7630986    .1919118
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
Next, we need to repeat the process while holding cv1 at 50 and then 60. Then we can plot the differences in probabilities for the three values of cv1 on a single graph.
marginsplot, x(s) recast(line) noci legend(order(2 "cv1=40" 4 "cv1=50" 6 "cv1=60"))
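The point estimates behind these lines can be generated in one pass. This Python sketch computes the f1 - f0 difference in probability for s = 20 to 70 at cv1 = 40, 50 and 60 from the rounded coefficients above. Confidence intervals are not computed here. That is what margins and marginsplot provide.

```python
import math

# Coefficients from the logit y f##c.s cv1 output above, rounded as displayed
b = {"_cons": -19.00557, "1.f": 9.983662, "s": 0.1750686,
     "1.f#c.s": -0.1595233, "cv1": 0.1877164}


def prob(f: int, s: float, cv1: float) -> float:
    """Predicted probability from the f##c.s model."""
    xb = (b["_cons"] + f * b["1.f"] + s * b["s"]
          + (f * s) * b["1.f#c.s"] + cv1 * b["cv1"])
    return math.exp(xb) / (1.0 + math.exp(xb))


s_values = list(range(20, 71, 5))
# One row of f1 - f0 differences per value of cv1
diffs = {
    cv1: [prob(1, s, cv1) - prob(0, s, cv1) for s in s_values]
    for cv1 in (40, 50, 60)
}
for cv1, row in diffs.items():
    print(f"cv1={cv1}:", " ".join(f"{d:+.3f}" for d in row))
```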
The Stata FAQ page "How can I understand a categorical by continuous interaction in logistic regression?" shows an alternative method for graphing these difference-in-probability lines that includes confidence intervals. Here are the graphs from that FAQ page.
Model 3: Continuous by continuous interaction
Log odds metric — continuous by continuous interaction
This time we have a dataset that has two continuous predictors (r & m) and a continuous covariate (cv1).
use https://stats.idre.ucla.edu/stat/data/logitconcon, clear
logit y c.r##c.m cv1, nolog

Logistic regression                             Number of obs   =        200
                                                LR chi2(4)      =      66.80
                                                Prob > chi2     =     0.0000
Log likelihood = -77.953857                     Pseudo R2       =     0.3000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           r |   .4342063   .1961642     2.21   0.027     .0497316    .8186809
           m |   .5104617   .2011856     2.54   0.011     .1161452    .9047782
             |
     c.r#c.m |  -.0068144   .0033337    -2.04   0.041    -.0133483   -.0002805
             |
         cv1 |   .0309685   .0271748     1.14   0.254    -.0222931      .08423
       _cons |  -34.09122   11.73402    -2.91   0.004    -57.08947   -11.09297
------------------------------------------------------------------------------
The trick to interpreting continuous by continuous interactions is to fix one predictor at a given value and to vary the other predictor. Once again, since the log odds model is a linear model it really doesn’t matter what value the covariate is held at; the slopes do not change. For convenience we will just hold cv1 at zero.
Here is an example manual computation of the slope of r holding m at 30.
slope = b[r] + 30*b[r#m] = .43420626 + 30*(-.00681441) = .22977396
Here is the same computation using Stata.
margins, dydx(r) at(m=30 cv1=0) predict(xb)

Average marginal effects                        Number of obs     =        200
Model VCE    : OIM

Expression   : Linear prediction, predict(xb)
dy/dx w.r.t. : r
at           : m               =          30
               cv1             =           0

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           r |   .2297741   .0982943     2.34   0.019     .0371207    .4224274
------------------------------------------------------------------------------
The table below shows the slope for r for various values of m running from 30 to 70. Since this is a linear model we do not have to hold cv1 at any particular value.
* Table of Slopes for r for Various Values of m
margins, dydx(r) at(m=(30(10)70)) predict(xb)

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           m |
          30 |   .2297741   .0982943     2.34   0.019     .0371207    .4224274
          40 |     .16163   .0670895     2.41   0.016     .0301369    .2931231
          50 |   .0934859   .0395342     2.36   0.018     .0160004    .1709715
          60 |   .0253419   .0291137     0.87   0.384    -.0317199    .0824037
          70 |  -.0428022   .0485281    -0.88   0.378    -.1379156    .0523112
------------------------------------------------------------------------------
We arbitrarily chose to vary m and look at the slope of r, but we could just as easily have reversed the roles of the two variables. Ideally, the theory behind the model, together with substantive knowledge, will suggest which variable to manipulate.
Below is a graph of the slopes from the table above.
This time we move directly to the probability interpretation, bypassing the odds ratio metric.
Probability metric — continuous by continuous interaction
We will rerun our model.
logit y c.r##c.m cv1, nolog

Logistic regression                             Number of obs     =        200
                                                LR chi2(4)        =      66.80
                                                Prob > chi2       =     0.0000
Log likelihood = -77.953857                     Pseudo R2         =     0.3000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           r |   .4342063   .1961642     2.21   0.027     .0497316    .8186809
           m |   .5104617   .2011856     2.54   0.011     .1161452    .9047782
             |
     c.r#c.m |  -.0068144   .0033337    -2.04   0.041    -.0133483   -.0002805
             |
         cv1 |   .0309685   .0271748     1.14   0.254    -.0222931      .08423
       _cons |  -34.09122   11.73402    -2.91   0.004    -57.08947   -11.09297
------------------------------------------------------------------------------
Next, we calculate three values of the covariate cv1: the mean minus one standard deviation, the mean, and the mean plus one standard deviation.
summarize cv1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         cv1 |       200      52.405    10.73579         26         71

mean cv1 - 1sd = 41.669207
mean cv1       = 52.405
mean cv1 + 1sd = 63.140793
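The three covariate values are simple arithmetic on the summarize output. Below is a small Python sketch that uses the mean and standard deviation as displayed above. The displayed standard deviation is rounded, so the sixth decimal can differ slightly from the values listed.

```python
# Mean and standard deviation of cv1 copied from the summarize output above.
CV1_MEAN = 52.405
CV1_SD = 10.73579  # rounded as displayed

cv1_values = {
    "mean - 1 sd": CV1_MEAN - CV1_SD,
    "mean": CV1_MEAN,
    "mean + 1 sd": CV1_MEAN + CV1_SD,
}

for label, value in cv1_values.items():
    print(f"{label:>11}: {value:.5f}")
```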
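Here is an example computation of the slope of r in the probability metric for m = 30, holding cv1 at its mean minus 1 sd (41.669207). Because r itself is not fixed in at(), margins evaluates the derivative for each observation at its observed value of r and then averages the results; this is why the output is labeled "Average marginal effects".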
margins, dydx(r) at(m=30 cv1=41.669207)

Average marginal effects                        Number of obs     =        200
Model VCE    : OIM

Expression   : Pr(y), predict()
dy/dx w.r.t. : r
at           : m               =          30
               cv1             =    41.66921

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           r |   .0061133   .0065712     0.93   0.352     -.006766    .0189926
------------------------------------------------------------------------------
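The quantity margins computes here can be written out directly. For one observation, the derivative of the predicted probability p = exp(Xb)/(1 + exp(Xb)) with respect to r is p(1 - p)(b[r] + b[r#m]*m). This depends on r, m, and cv1 through p. The average marginal effect is the mean of that derivative over the observed r values, with m and cv1 fixed.

The Python sketch below is only an illustration. It assumes the rounded coefficients above and uses a short list of hypothetical r values in place of the data, so it will not reproduce .0061133. The function names are our own.

```python
import math
from statistics import mean

# Rounded coefficients from the logit output above.
B = {"r": 0.4342063, "m": 0.5104617, "rm": -0.0068144,
     "cv1": 0.0309685, "cons": -34.09122}


def prob_y(r: float, m: float, cv1: float) -> float:
    """Predicted Pr(y = 1): a numerically stable inverse logit of Xb."""
    xb = (B["cons"] + B["r"] * r + B["m"] * m
          + B["rm"] * r * m + B["cv1"] * cv1)
    if xb >= 0:
        return 1.0 / (1.0 + math.exp(-xb))
    e = math.exp(xb)
    return e / (1.0 + e)


def prob_slope_r(r: float, m: float, cv1: float) -> float:
    """dPr(y)/dr for one observation: p*(1 - p)*(b_r + b_rm*m)."""
    p = prob_y(r, m, cv1)
    return p * (1.0 - p) * (B["r"] + B["rm"] * m)


def average_prob_slope_r(r_values, m: float, cv1: float) -> float:
    """Average marginal effect of r: each observation keeps its own r,
    while m and cv1 are fixed at the requested values."""
    return mean(prob_slope_r(r, m, cv1) for r in r_values)


if __name__ == "__main__":
    # HYPOTHETICAL r values; substitute the observed r column from the data.
    r_observed = [35.0, 45.0, 50.0, 55.0, 65.0]
    print(average_prob_slope_r(r_observed, m=30, cv1=41.669207))
```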
We now compute the slopes of r for m from 30 to 70, in steps of 5, at each of the three values of cv1.
Table for Slope of r for Various Values of m holding cv1 at mean minus 1 sd

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           m |
          30 |   .0061133   .0065712     0.93   0.352     -.006766    .0189926
          35 |    .006587   .0061377     1.07   0.283    -.0054427    .0186167
          40 |   .0071815   .0056839     1.26   0.206    -.0039586    .0183217
          45 |   .0078851   .0052656     1.50   0.134    -.0024354    .0182055
          50 |   .0085235    .004981     1.71   0.087    -.0012391    .0182861
          55 |   .0083341   .0049614     1.68   0.093    -.0013901    .0180583
          60 |   .0052692   .0059747     0.88   0.378    -.0064411    .0169795
          65 |   -.002175   .0090427    -0.24   0.810    -.0198984    .0155484
          70 |  -.0091967   .0089699    -1.03   0.305    -.0267774    .0083839
------------------------------------------------------------------------------

Table for Slope of r for Various Values of m holding cv1 at the mean

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           m |
          30 |   .0074917   .0069416     1.08   0.280    -.0061135    .0210969
          35 |   .0081075   .0063953     1.27   0.205     -.004427    .0206421
          40 |   .0088605   .0057648     1.54   0.124    -.0024384    .0201593
          45 |    .009721   .0051157     1.90   0.057    -.0003056    .0197476
          50 |   .0104242   .0046175     2.26   0.024     .0013739    .0194744
          55 |     .00992   .0046688     2.12   0.034     .0007692    .0190708
          60 |   .0058498    .006339     0.92   0.356    -.0065745    .0182741
          65 |  -.0021432   .0088189    -0.24   0.808     -.019428    .0151416
          70 |  -.0081533   .0075364    -1.08   0.279    -.0229243    .0066177
------------------------------------------------------------------------------

Table for Slope of r for Various Values of m holding cv1 at mean plus 1 sd

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           m |
          30 |   .0090189   .0073769     1.22   0.221    -.0054396    .0234774
          35 |   .0097902   .0067546     1.45   0.147    -.0034485    .0230289
          40 |   .0107094   .0060155     1.78   0.075    -.0010807    .0224994
          45 |   .0117184   .0052384     2.24   0.025     .0014513    .0219854
          50 |   .0124196   .0046088     2.69   0.007     .0033864    .0214527
          55 |   .0114027    .004686     2.43   0.015     .0022182    .0205871
          60 |    .006181   .0067253     0.92   0.358    -.0070003    .0193622
          65 |  -.0020011   .0080879    -0.25   0.805    -.0178531    .0138509
          70 |  -.0069432   .0060361    -1.15   0.250    -.0187739    .0048874
------------------------------------------------------------------------------
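Two features of these tables follow directly from the model. In the probability metric, each observation's slope of r is p(1 - p)(b[r] + b[r#m]*m). Since p(1 - p) is always positive, the sign of the slope comes from b[r] + b[r#m]*m alone. That term changes sign near m = 63.7 whatever value cv1 takes. This matches the switch from positive to negative slopes between m = 60 and m = 65 in all three tables. The magnitude, however, depends on p, and therefore on cv1.

The Python sketch below illustrates both points. It assumes the rounded coefficients and uses a single hypothetical value, r = 50, instead of averaging over the observed r values. It therefore will not reproduce the tabled numbers.

```python
import math

# Rounded coefficients from the logit output above.
B_R, B_M, B_RM = 0.4342063, 0.5104617, -0.0068144
B_CV1, B_CONS = 0.0309685, -34.09122
CV1_LEVELS = (41.669207, 52.405, 63.140793)  # mean - 1 sd, mean, mean + 1 sd


def inv_logit(xb: float) -> float:
    """Numerically stable inverse logit exp(xb) / (1 + exp(xb))."""
    if xb >= 0:
        return 1.0 / (1.0 + math.exp(-xb))
    e = math.exp(xb)
    return e / (1.0 + e)


def prob_slope_r(r: float, m: float, cv1: float) -> float:
    """dPr(y)/dr at one covariate pattern: p*(1 - p)*(b_r + b_rm*m)."""
    p = inv_logit(B_CONS + B_R * r + B_M * m + B_RM * r * m + B_CV1 * cv1)
    return p * (1.0 - p) * (B_R + B_RM * m)


if __name__ == "__main__":
    r = 50.0  # HYPOTHETICAL single value of r, for illustration only
    for cv1 in CV1_LEVELS:
        slopes = [prob_slope_r(r, m, cv1) for m in range(30, 71, 5)]
        print(f"cv1 = {cv1:9.5f}:", " ".join(f"{s:+.5f}" for s in slopes))
```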
We will graph each of the three tables above.
The bottom line
- A significant interaction term in the log odds model does not guarantee that the probability difference in differences will be significant at the covariate values of interest.
- Paradoxically, even if the interaction term is not significant in the log odds model, the probability difference in differences may be significant for some values of the covariate.
- In the probability metric the values of all the variables in the model matter.
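These points can be made concrete with the cross-partial derivative of the predicted probability with respect to r and m. Ai and Norton (2003) treat this as the interaction effect in nonlinear models. For the logit model it equals:

b[r#m]*p(1 - p) + (b[r] + b[r#m]*m)(b[m] + b[r#m]*r)*p(1 - p)(1 - 2p)

This expression involves p, and therefore every variable in the model, including cv1. The Python sketch below uses the rounded coefficients and hypothetical covariate patterns. It shows that this quantity can be positive at one pattern and negative at another, even though b[r#m] is negative. It computes point values only and does not attempt the standard errors discussed by Norton, Wang, and Ai (2004).

```python
import math

# Rounded coefficients from the logit output above.
B_R, B_M, B_RM = 0.4342063, 0.5104617, -0.0068144
B_CV1, B_CONS = 0.0309685, -34.09122


def prob_y(r: float, m: float, cv1: float) -> float:
    """Predicted Pr(y = 1), using a numerically stable inverse logit."""
    xb = B_CONS + B_R * r + B_M * m + B_RM * r * m + B_CV1 * cv1
    if xb >= 0:
        return 1.0 / (1.0 + math.exp(-xb))
    e = math.exp(xb)
    return e / (1.0 + e)


def prob_interaction(r: float, m: float, cv1: float) -> float:
    """Cross-partial d2 Pr(y) / (dr dm) at one covariate pattern.

    Uses Lambda' = p(1 - p) and Lambda'' = p(1 - p)(1 - 2p), the first and
    second derivatives of the logistic cdf.
    """
    p = prob_y(r, m, cv1)
    d1 = p * (1.0 - p)
    d2 = d1 * (1.0 - 2.0 * p)
    return B_RM * d1 + (B_R + B_RM * m) * (B_M + B_RM * r) * d2


if __name__ == "__main__":
    # HYPOTHETICAL covariate patterns, chosen only to illustrate sign changes.
    for r, m in [(50, 50), (50, 70)]:
        for cv1 in (41.669207, 52.405, 63.140793):
            print(r, m, cv1, f"{prob_interaction(r, m, cv1):+.6f}")
```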
References
Ai, C.R. and Norton, E.C. 2003. Interaction terms in logit and probit models. Economics Letters 80(1): 123-129.
Greenland, S. and Rothman, K.J. 1998. Modern Epidemiology, 2nd ed. Philadelphia: Lippincott Williams and Wilkins.
Mitchell, M.N. and Chen, X. 2005. Visualizing main effects and interactions for binary logit models. Stata Journal 5(1): 64-82.
Norton, E.C., Wang, H., and Ai, C. 2004. Computing interaction effects and standard errors in logit and probit models. Stata Journal 4(2): 154-167.
Comma-separated data files
Categorical by categorical: https://stats.idre.ucla.edu/wp-content/uploads/2016/02/concon2.csv
Categorical by continuous: https://stats.idre.ucla.edu/wp-content/uploads/2016/02/logitcatcon.csv
Continuous by continuous: https://stats.idre.ucla.edu/wp-content/uploads/2016/02/logitconcon.csv