The margins command, introduced in Stata 11, is a very popular post-estimation command. However, it can be tricky to use in conjunction with multiple imputation and survey data.
Let’s begin by looking at the data.
    use https://stats.idre.ucla.edu/stat/data/hsbmar, clear
    sum honors female prog read math science socst, sep(0)

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
          honors |       200        .265    .4424407          0          1
          female |       185    .5459459    .4992356          0          1
            prog |       200       2.025    .6904772          1          3
            read |       185    51.61622    10.19104         28         76
            math |       190    52.17895    9.246168         33         75
         science |       193    51.57513     9.86396         26         74
           socst |       188    51.59043    10.44862         26         71
As you can see from the table above, all of the variables except for honors and prog have missing values.
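One way to confirm the missing-data pattern before imputing is Stata's misstable command. The lines below are a minimal sketch that assumes the hsbmar data are already in memory and have not yet been mi set; the output is not shown here.

```stata
* Count the missing values in each analysis variable
misstable summarize honors female prog read math science socst

* Show which variables tend to be missing together
misstable patterns female read math science socst
```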
honors is the binary response variable, while female (two-level categorical) and prog (three-level categorical) are the research variables of interest, with read, math, science and socst serving as control variables. Our primary interest is in the female-by-prog interaction. We will want to compute the predicted probabilities for each of the six cells of the 2-by-3 interaction.
So, what’s the big deal?
Why not just impute the data and then run the margins command? Well, we can impute the data, but we need a way to run both svy logit and margins on each imputed dataset and then combine the margins results into a single output. The issue is that margins does not work with mi estimate.
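For reference, the combining step that mi estimate performs follows Rubin's rules. As a sketch, let \hat{Q}_m be the estimate of interest (here, a predicted probability) and U_m its variance estimate in imputed dataset m, for m = 1, ..., M. These symbols are notation introduced for this illustration, not taken from the Stata output.

```latex
\bar{Q} = \frac{1}{M}\sum_{m=1}^{M}\hat{Q}_m, \qquad
\bar{U} = \frac{1}{M}\sum_{m=1}^{M} U_m, \qquad
B = \frac{1}{M-1}\sum_{m=1}^{M}\left(\hat{Q}_m - \bar{Q}\right)^2, \qquad
T = \bar{U} + \left(1 + \frac{1}{M}\right) B
```

The pooled point estimate is \bar{Q} and its standard error is \sqrt{T}, which adds the between-imputation variance B to the average within-imputation variance \bar{U}.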
We can accomplish this by writing a wrapper program called mimargins and saving it in a file called mimargins.ado. It contains both the svy logit and margins commands. By setting the option properties to mi, mimargins can be used with mi estimate. We also need to declare mimargins to be an eclass program.
Here is what the mimargins program looks like.
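    program mimargins, eclass properties(mi)
        version 12
        // fit the survey logit model on the current imputed dataset
        svy: logit honors i.female##i.prog read math science socst
        // predicted probabilities for the six female-by-prog cells;
        // post stores them in e(b) and e(V), where mi estimate collects them
        margins female#prog, atmeans asbalanced post
    end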
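Stata has to be able to find mimargins.ado before the program can be called. The following is a minimal sketch of how one might check this, assuming the file was saved in the current working directory or in another folder on the ado-path.

```stata
* List the directories Stata searches for ado-files
adopath

* Report where Stata finds mimargins.ado (returns an error if it cannot be found)
which mimargins

* After editing mimargins.ado, drop the copy held in memory so the new version is loaded
capture program drop mimargins
```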
Here is how you use mimargins in the calling program.
mi estimate, cmdok: mimargins 1
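The cmdok option is needed because mimargins is not one of the estimation commands that mi estimate officially supports. Note that mimargins, as written, makes no use of the trailing argument 1.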
Next, we need to note that our data are not truly survey data. We are going to fake this by declaring that the values of write are the pweights and that ses is the stratification variable. Since this is part of a multiple imputation we need to run the survey set command as mi svyset. Here is the code for performing the multiple imputation using chained equations creating 10 imputed datasets. Note, the value 10 for the number of imputed datasets was selected for demonstration purposes and does not represent a recommendation.
    set seed 1234543
    mi set mlong
    mi register imputed female math read science socst
    mi svyset [pw=write], strata(ses)
    mi impute chain (logit) female (regress) math read science socst = ///
        write awards, add(10)

    Conditional models:
               science: regress science math socst i.female read write awards
                  math: regress math science socst i.female read write awards
                 socst: regress socst science math i.female read write awards
                female: logit female science math socst read write awards
                  read: regress read science math socst i.female write awards

    Performing chained iterations ...

    Multivariate imputation                     Imputations =       10
    Chained equations                                 added =       10
    Imputed: m=1 through m=10                       updated =        0

    Initialization: monotone                     Iterations =      100
                                                    burn-in =       10

                female: logistic regression
                  math: linear regression
                  read: linear regression
               science: linear regression
                 socst: linear regression

    ------------------------------------------------------------------
                       |               Observations per m
                       |----------------------------------------------
              Variable |   Complete   Incomplete   Imputed |     Total
    -------------------+-----------------------------------+----------
                female |        185           15        15 |       200
                  math |        190           10        10 |       200
                  read |        185           15        15 |       200
               science |        193            7         7 |       200
                 socst |        188           12        12 |       200
    ------------------------------------------------------------------
    (complete + incomplete = total; imputed is the minimum across m
     of the number of filled-in observations.)
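Before fitting the model, it can be worth a quick look at what the imputation produced. This is a minimal sketch that assumes the mi impute step above has just been run, so imputations 1 through 10 exist; the output is not shown here.

```stata
* Summarize the mi settings: style, number of imputations, registered variables
mi describe

* Compare the original data (m=0) with the first and last imputations
mi xeq 0 1 10: summarize female math read science socst
```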
Next, we can run our survey logit model and check the interaction. Please note the order of the prefixes: mi estimate: comes first, followed by svy:, which is in turn followed by the logit command itself.
    mi estimate: svy: logit honors i.female##i.prog read math science socst

    Multiple-imputation estimates     Imputations     =         10
    Survey: Logistic regression       Number of obs   =        190

    Number of strata =         3      Population size =      9,998
    Number of PSUs   =       190
                                      Average RVI     =     0.0660
                                      Largest FMI     =     0.2469
                                      Complete DF     =        187
    DF adjustment:   Small sample     DF:     min     =      75.62
                                              avg     =     156.92
                                              max     =     181.78
    Model F test:       Equal FMI     F(   9,  182.6) =       5.06
    Within VCE type:   Linearized     Prob > F        =     0.0000

    ----------------------------------------------------------------------------------
              honors |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -----------------+----------------------------------------------------------------
              female |
             female  |   1.669564    1.06815     1.56   0.120     -.438678    3.777806
                     |
                prog |
           academic  |    .706834   1.040896     0.68   0.498    -1.347074    2.760742
           vocation  |  -.6572194   1.126282    -0.58   0.560    -2.879486    1.565048
                     |
         female#prog |
    female#academic  |  -.5020129   1.200932    -0.42   0.676     -2.87197    1.867944
    female#vocation  |   1.264679    1.36103     0.93   0.354    -1.421087    3.950444
                     |
                read |   .0579493   .0365918     1.58   0.117    -.0149354    .1308341
                math |   .1131006   .0383768     2.95   0.004     .0372635    .1889377
             science |   .0709565   .0405595     1.75   0.082    -.0092108    .1511239
               socst |  -.0009834   .0323599    -0.03   0.976    -.0649752    .0630084
               _cons |  -15.40424   2.485827    -6.20   0.000    -20.31064   -10.49784
    ----------------------------------------------------------------------------------

    mi test 1.female#2.prog 1.female#3.prog

    note: assuming equal fractions of missing information

     ( 1)  [honors]1.female#2.prog = 0
     ( 2)  [honors]1.female#3.prog = 0

           F(  2, 183.4) =    1.26
                Prob > F =    0.2850
Unfortunately, the interaction is not statistically significant (F(2, 183.4) = 1.26, p = 0.2850). However, we will push ahead and compute the predicted cell probabilities for the 2×3 interaction just to show how it can be done.
    mi estimate, cmdok: mimargins 1

    Multiple-imputation estimates     Imputations     =         10
    Adjusted predictions              Number of obs   =        190

    Number of strata =         3
                                      Average RVI     =     0.0279
                                      Largest FMI     =     0.0586
                                      Complete DF     =        187
    DF adjustment:   Small sample     DF:     min     =     164.05
                                              avg     =     176.22
    Within VCE type: Delta-method             max     =     183.42

    ----------------------------------------------------------------------------------
                     |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -----------------+----------------------------------------------------------------
         female#prog |
       male#general  |   .0716598   .0630814     1.14   0.257    -.0528264     .196146
      male#academic  |   .1348606   .0586423     2.30   0.023     .0190696    .2506515
      male#vocation  |   .0384081   .0288355     1.33   0.185    -.0184993    .0953155
     female#general  |   .2891648   .0954564     3.03   0.003     .1007761    .4775536
    female#academic  |   .3328262   .0879882     3.78   0.000     .1592084    .5064441
    female#vocation  |    .427272   .1585705     2.69   0.008     .1144153    .7401288
    ----------------------------------------------------------------------------------
And that is how you can compute adjusted predictions for multiply imputed survey data. This approach will generalize to other estimation commands as well as other margins commands.
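As a sketch of that generalization, the wrapper below keeps the same model but swaps in a different margins specification: the average marginal effect of math rather than the cell probabilities. The program name mimargins_ame is hypothetical, the code has not been run against these data, and it assumes the same mi set and mi svyset data as above.

```stata
* Hypothetical variant of mimargins; save as mimargins_ame.ado
program mimargins_ame, eclass properties(mi)
    version 12
    // same survey logit model as in mimargins
    svy: logit honors i.female##i.prog read math science socst
    // average marginal effect of math; post leaves the results in e(b) and e(V)
    margins, dydx(math) post
end
```

It would be called in the same way as the original wrapper, that is, mi estimate, cmdok: mimargins_ame 1.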