The margins command, new in Stata 11, can be a very useful tool in understanding and interpreting interactions. We will illustrate the command for a logistic regression model with two categorical by continuous interactions. We begin by loading the dataset mlogcatcon.
use https://stats.idre.ucla.edu/stat/data/mlogcatcon, clear
In this dataset y is the binary response variable and m and s are continuous predictors. The variable f, which stands for female, is a binary predictor. We will interact f with both m and s. Here is the logistic regression model.
logit y f##c.m f##c.s

Iteration 0:   log likelihood = -109.04953
...
Iteration 5:   log likelihood = -69.533946

Logistic regression                               Number of obs   =        200
                                                  LR chi2(5)      =      79.03
                                                  Prob > chi2     =     0.0000
Log likelihood = -69.533946                       Pseudo R2       =     0.3624

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         1.f |  -11.64885    5.31706    -2.19   0.028     -22.0701   -1.227609
           m |   .0907151   .0348589     2.60   0.009      .022393    .1590372
             |
       f#c.m |
          1  |   .0426602   .0577991     0.74   0.460     -.070624    .1559445
             |
           s |    .074839   .0330348     2.27   0.023      .010092    .1395859
             |
       f#c.s |
          1  |   .1471401   .0725369     2.03   0.043     .0049704    .2893098
             |
       _cons |   -10.1233   2.323433    -4.36   0.000    -14.67714   -5.569451
------------------------------------------------------------------------------
You will note that the f by s interaction is statistically significant while the f by m interaction is not. Since this is a nonlinear model, we will have to take the values of all of the covariates into account when interpreting what is going on in the model.
We will start with a margins command that looks at the discrete difference in probability between males and females for five different levels of s while holding m at its mean value. We get the discrete difference in probability using the dydx option with the binary predictor. The variable m will be held at its mean value using the atmeans option.
margins, dydx(f) at(s=(30(10)70)) atmeans noatlegend

Conditional marginal effects                      Number of obs   =        200
Model VCE    : OIM

Expression   : Pr(y), predict()
dy/dx w.r.t. : 1.f

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.f          |
         _at |
          1  |   -.042701   .0367882    -1.16   0.246    -.1148046    .0294026
          2  |  -.0839342   .0472826    -1.78   0.076    -.1766064    .0087379
          3  |  -.1419013   .0533704    -2.66   0.008    -.2465053   -.0372973
          4  |  -.1051027   .0980072    -1.07   0.284    -.2971932    .0869878
          5  |   .2145781   .2161073     0.99   0.321    -.2089845    .6381407
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
While the results of the margins command above are perfectly correct, they reflect the discrete change in probability for only a single value of m. If we remove the atmeans option we get the average marginal effect, i.e., the discrete change in probability for each of the values of s averaged across the observed values of m. Here is how the margins command looks now.
margins, dydx(f) at(s=(30(10)70)) noatlegend

Average marginal effects                          Number of obs   =        200
Model VCE    : OIM

Expression   : Pr(y), predict()
dy/dx w.r.t. : 1.f

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.f          |
         _at |
          1  |  -.0575153   .0497762    -1.16   0.248    -.1550748    .0400441
          2  |  -.1048622   .0581838    -1.80   0.072    -.2189004     .009176
          3  |   -.148558   .0594204    -2.50   0.012    -.2650197    -.0320962
          4  |  -.0726804   .0766543    -0.95   0.343    -.2229201    .0775593
          5  |   .1663325   .1894277     0.88   0.380     -.204939     .537604
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
Let’s go ahead and graph these results including the 95% confidence intervals. We will begin by placing the necessary values into a matrix using techniques shown in Stata FAQ: How can I graph the results of the margins command?. The matrix commands will be followed by a twoway line graph.
matrix b=r(b)'
matrix b=b[6...,1]
matrix m=r(at)
matrix m=m[1...,4]
matrix v=r(V)
matrix v=v[6...,6...]
matrix se=vecdiag(cholesky(diag(vecdiag(v))))'
matrix ll=b-1.96*se
matrix ul=b+1.96*se
matrix m=m,b,ll,ul
matrix list m

m[5,4]
             s          r1          r1          r1
r1          30  -.05751532  -.15507658   .04004594
r2          40  -.10486222  -.21890253   .00917809
r3          50  -.14855797  -.26502187  -.03209408
r4          60  -.07268039  -.22292282   .07756203
r5          70   .16633254  -.20494579   .53761086

svmat m

twoway line m2 m3 m4 m1, scheme(lean1) legend(off) yline(0) ///
    ytitle(discrete change in probability) xtitle(variable s) ///
    title(average marginal effect) name(ame, replace)

drop m1-m4   /* drop these variables -- they are no longer needed */
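If you have access to Stata 12 or later, the marginsplot command (introduced in Stata 12, so not available in Stata 11) can build essentially the same graph directly from the stored margins results, skipping the matrix steps. A sketch:

```stata
* Sketch, requires Stata 12+: run immediately after the margins command above
margins, dydx(f) at(s=(30(10)70)) noatlegend
marginsplot, yline(0) ytitle(discrete change in probability) ///
    xtitle(variable s) title(average marginal effect) name(ame2, replace)
```

By default marginsplot also draws the 95% confidence intervals, so no hand computation of the limits is needed.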
The margins command and the graph above give us a pretty good idea of how the discrete change in probability varies across different values of s but we still don’t know how this changes with differing values of m. Let’s try the margins command once more, this time varying both s and m.
margins, dydx(f) at(s=(30(10)70) m=(30(10)70)) vsquish

Conditional marginal effects                      Number of obs   =        200
Model VCE    : OIM

Expression   : Pr(y), predict()
dy/dx w.r.t. : 1.f
1._at        : m  =  30
               s  =  30
2._at        : m  =  30
               s  =  40
3._at        : m  =  30
               s  =  50
4._at        : m  =  30
               s  =  60
5._at        : m  =  30
               s  =  70
6._at        : m  =  40
               s  =  30
7._at        : m  =  40
               s  =  40
8._at        : m  =  40
               s  =  50
9._at        : m  =  40
               s  =  60
10._at       : m  =  40
               s  =  70
11._at       : m  =  50
               s  =  30
12._at       : m  =  50
               s  =  40
13._at       : m  =  50
               s  =  50
14._at       : m  =  50
               s  =  60
15._at       : m  =  50
               s  =  70
16._at       : m  =  60
               s  =  30
17._at       : m  =  60
               s  =  40
18._at       : m  =  60
               s  =  50
19._at       : m  =  60
               s  =  60
20._at       : m  =  60
               s  =  70
21._at       : m  =  70
               s  =  30
22._at       : m  =  70
               s  =  40
23._at       : m  =  70
               s  =  50
24._at       : m  =  70
               s  =  60
25._at       : m  =  70
               s  =  70

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.f          |
         _at |
          1  |  -.0057129   .0063763    -0.90   0.370    -.0182102    .0067844
          2  |  -.0118921   .0116175    -1.02   0.306    -.0346619    .0108777
          3  |  -.0238252   .0229579    -1.04   0.299    -.0688218    .0211714
          4  |  -.0400685   .0516074    -0.78   0.438    -.1412171    .0610802
          5  |  -.0062295   .1691796    -0.04   0.971    -.3378154    .3253565
          6  |  -.0140135   .0131539    -1.07   0.287    -.0397946    .0117675
          7  |  -.0287583   .0208235    -1.38   0.167    -.0695717     .012055
          8  |  -.0551503   .0356445    -1.55   0.122    -.1250121    .0147116
          9  |  -.0763916   .0804218    -0.95   0.342    -.2340155    .0812323
         10  |   .0676689   .2718875     0.25   0.803    -.4652207    .6005586
         11  |  -.0339307   .0292713    -1.16   0.246    -.0913013    .0234399
         12  |  -.0675498   .0389036    -1.74   0.083    -.1437994    .0086999
         13  |  -.1184831   .0480101    -2.47   0.014    -.2125811     -.024385
         14  |  -.1065329    .097911    -1.09   0.277     -.298435    .0853692
         15  |   .1934334   .2441036     0.79   0.428    -.2850009    .6718678
         16  |    -.07971   .0710032    -1.12   0.262    -.2188738    .0594538
         17  |  -.1487312   .0866721    -1.72   0.086    -.3186054     .021143
         18  |  -.2164891   .0908541    -2.38   0.017    -.3945598    -.0384185
         19  |  -.0634788    .110638    -0.57   0.566    -.2803254    .1533677
         20  |   .2182539   .1473218     1.48   0.138    -.0704916    .5069993
         21  |  -.1751863   .1667445    -1.05   0.293    -.5019995    .1516269
         22  |  -.2866478   .1890498    -1.52   0.129    -.6571787     .083883
         23  |  -.2841612   .2168233    -1.31   0.190     -.709127    .1408047
         24  |   .0354487   .1731142     0.20   0.838    -.3038489    .3747462
         25  |   .1446316   .1002235     1.44   0.149    -.0518029    .3410661
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
The first five rows give the discrete change for the five values of s while holding m at 30. The next five hold m at 40. And so on. One of the more interesting features is how few of the discrete changes are statistically significant even though the overall f by s interaction was significant.
Now we can collect the necessary values into a matrix in preparation for graphing.
matrix m=r(at)
matrix m=m[1...,3..4]
matrix b=r(b)'
matrix b=b[26...,1]
matrix m = m,b
matrix colnames m = mvar svar coef
matrix list m

m[25,3]
       mvar   svar         coef
 r1      30     30   -.00042464
 r2      30     40   -.00139674
 r3      30     50   -.00455224
 r4      30     60   -.01440282
 r5      30     70   -.04155818
 r6      40     30    -.0012454
 r7      40     40   -.00406479
 r8      40     50   -.01291961
 r9      40     60    -.0377978
r10      40     70   -.08770375
r11      50     30   -.00362874
r12      50     40     -.011581
r13      50     50   -.03430757
r14      50     60   -.08224427
r15      50     70    -.1233095
r16      60     30   -.01037455
r17      60     40   -.03108193
r18      60     50   -.07678675
r19      60     60   -.12194728
r20      60     70   -.10037149
r21      70     30   -.02811225
r22      70     40   -.07139767
r23      70     50   -.11982559
r24      70     60   -.10522467
r25      70     70   -.05170914

svmat m, names(col)
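If you would also like 95% limits for these 25 estimates, the same vecdiag/cholesky trick shown earlier can be applied to r(V). A sketch (run these lines before the svmat command, while the margins results are still in memory):

```stata
* Sketch: append 95% limits, mirroring the earlier five-point example
matrix v=r(V)
matrix v=v[26...,26...]
matrix se=vecdiag(cholesky(diag(vecdiag(v))))'
matrix ll=b-1.96*se
matrix ul=b+1.96*se
matrix m=m,ll,ul
matrix colnames m = mvar svar coef ll ul
```

The ll and ul columns could then be added to the twoway graphs below, though with ten lines per panel the display gets crowded.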
Let’s begin by graphing the effect of different values of s with separate lines for each value of m.
twoway (line coef svar if mvar==30)(line coef svar if mvar==40)(line coef svar if mvar==50) ///
    (line coef svar if mvar==60)(line coef svar if mvar==70), scheme(lean1)                 ///
    legend(order ( 1 "m30" 2 "m40" 3 "m50" 4 "m60" 5 "m70"))                                ///
    name(sbym, replace) xtitle(variable s) ytitle(discrete change in probability)
Although there were not many significant values in the margins table above, the lines for each of the values of m look rather different from one another. While the line for m equal to 30 is rather flat, the line for m equal to 70 displays a lot more variability, first dropping and then climbing steeply around s equal to 50.
Now that we know what differences in s for values of m look like, we can reverse the variables in the graphics command (twoway line) to see what differences in m for values of s look like.
twoway (line coef mvar if svar==30)(line coef mvar if svar==40)(line coef mvar if svar==50) ///
    (line coef mvar if svar==60)(line coef mvar if svar==70), scheme(lean1)                 ///
    legend(order ( 1 "s30" 2 "s40" 3 "s50" 4 "s60" 5 "s70"))                                ///
    name(mbys, replace) xtitle(variable m) ytitle(discrete change in probability)
Of course, we are looking at the same 25 values as the previous graph, just organized differently. This time the line for s equal 70 is the one that stands out from the others.
If your model is more complex than this one, you will have to decide what to do with each of the covariates. You can hold them constant at one or more values or you can average over them. Whatever choice you make, keep in mind that the values of all of the covariates matter in nonlinear models.
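For instance, if the model included a hypothetical additional covariate x, the main options would look like this:

```stata
* Hypothetical covariate x -- three ways to handle it in margins
margins, dydx(f) at(s=(30(10)70)) atmeans         // hold x (and m) at their means
margins, dydx(f) at(s=(30(10)70))                 // average over observed x and m
margins, dydx(f) at(s=(30(10)70) x=(0 1))         // evaluate at chosen values of x
```

The three commands answer different questions, and in a nonlinear model they will generally give different numbers, so the choice should be driven by which population quantity you actually want to report.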