How can I get margins for a multiply imputed survey logit model?

The margins command, introduced in Stata 11, is a very popular postestimation command. However, it can be tricky to use with multiply imputed survey data.

Let’s begin by looking at the data.

use https://stats.idre.ucla.edu/stat/data/hsbmar, clear

sum honors female prog read math science socst, sep(0)

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      honors |       200        .265    .4424407          0          1
      female |       185    .5459459    .4992356          0          1
        prog |       200       2.025    .6904772          1          3
        read |       185    51.61622    10.19104         28         76
        math |       190    52.17895    9.246168         33         75
     science |       193    51.57513     9.86396         26         74
       socst |       188    51.59043    10.44862         26         71

As you can see from the table above, all of the variables except for honors and prog have missing values.
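As a quick complement to the summary table, the built-in misstable command reports the missing values directly. This is only a sketch of one way to inspect the missingness; it is not part of the original analysis.

```stata
* count the missing values in each analysis variable
misstable summarize honors female prog read math science socst

* show which combinations of variables tend to be missing together
misstable patterns female read math science socst
```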

honors is the binary response variable, while female (two-level categorical) and prog (three-level categorical) are the research variables of interest, with read, math, science, and socst serving as control variables. Our primary interest is in the female-by-prog interaction. We want to compute the predicted probability for each of the six cells of the 2-by-3 interaction.

So, what’s the big deal?

Why not just impute the data and then run the margins command? Well, we can impute the data, but we need a way to run both svy: logit and margins on each imputed dataset and then combine the margins results into a single output. The issue is that margins does not work with mi estimate.

We can accomplish this by writing a wrapper program called mimargins and saving it in a file called mimargins.ado. The program contains both the svy: logit and margins commands. Declaring it with properties(mi) marks it as usable with mi estimate. We also need to declare mimargins to be an eclass program, because mi estimate combines the results that the program leaves in e(), which is where the post option of margins puts the predictions.

Here is what the mimargins program looks like.


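program mimargins, eclass properties(mi)
	version 12
	* fit the survey logit model on the current imputed dataset
	svy: logit honors i.female##i.prog read math science socst
	* post the six cell predictions to e() so that mi estimate can combine them
	margins female#prog, atmeans asbalanced post
end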
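Before calling the program, it can help to confirm that Stata actually finds the ado-file. The lines below are a small sketch that assumes mimargins.ado has been saved somewhere on the adopath, such as the current working directory.

```stata
* report where Stata finds mimargins.ado (an error here means it is not on the adopath)
which mimargins

* after editing mimargins.ado, drop the cached copy so that the new version is loaded
* (note that discard also clears stored estimation results)
discard
```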

Here is how you use mimargins in the calling program.

mi estimate, cmdok: mimargins 1

The cmdok option is needed because mimargins is not one of the estimation commands that mi estimate officially supports; cmdok tells mi estimate to run it anyway.
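For reference, mi estimate pools the per-imputation results with Rubin's combining rules. The notation below is introduced here for illustration and does not come from the Stata output: M is the number of imputations, q_m is the vector of six predictions from imputed dataset m, and U_m is its delta-method variance matrix.

```latex
\bar{q} = \frac{1}{M}\sum_{m=1}^{M} \hat{q}_m, \qquad
\bar{U} = \frac{1}{M}\sum_{m=1}^{M} \hat{U}_m, \qquad
B = \frac{1}{M-1}\sum_{m=1}^{M} (\hat{q}_m - \bar{q})(\hat{q}_m - \bar{q})', \qquad
T = \bar{U} + \left(1 + \frac{1}{M}\right) B
```

The pooled estimate is the average of the per-imputation estimates, and the total variance T adds the between-imputation variance B, inflated by 1 + 1/M, to the average within-imputation variance.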

Next, note that our data are not truly survey data. We are going to fake it by declaring that the values of write are the pweights and that ses is the stratification variable. Because the data are mi set, we need to run the survey set command as mi svyset rather than svyset. Here is the code for performing the multiple imputation using chained equations, creating 10 imputed datasets. Note that the value of 10 imputed datasets was selected for demonstration purposes and does not represent a recommendation.

set seed 1234543

mi set mlong

mi register imputed female math read science socst 

mi svyset [pw=write], strata(ses)

mi impute chain (logit) female (regress) math read science socst = ///
              write awards, add(10)

Conditional models:
           science: regress science math socst i.female read write awards
              math: regress math science socst i.female read write awards
             socst: regress socst science math i.female read write awards
            female: logit female science math socst read write awards
              read: regress read science math socst i.female write awards

Performing chained iterations ...

Multivariate imputation                     Imputations =       10
Chained equations                                 added =       10
Imputed: m=1 through m=10                       updated =        0

Initialization: monotone                     Iterations =      100
                                                burn-in =       10

            female: logistic regression
              math: linear regression
              read: linear regression
           science: linear regression
             socst: linear regression

------------------------------------------------------------------
                   |               Observations per m             
                   |----------------------------------------------
          Variable |   Complete   Incomplete   Imputed |     Total
-------------------+-----------------------------------+----------
            female |        185           15        15 |       200
              math |        190           10        10 |       200
              read |        185           15        15 |       200
           science |        193            7         7 |       200
             socst |        188           12        12 |       200
------------------------------------------------------------------
(complete + incomplete = total; imputed is the minimum across m
 of the number of filled-in observations.)
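Before fitting any model, it is worth checking that the mi settings and the imputed values look sensible. The commands below are a sketch of such a check and are not part of the original walkthrough.

```stata
* show the mi style, the number of imputations, and the registered variables
mi describe

* display the survey design settings that were declared with mi svyset
mi svyset

* compare the original data (m=0) with the first imputed dataset (m=1)
mi xeq 0 1: summarize female math read science socst
```

In m=0 the imputed variables should still show fewer than 200 observations, while in m=1 they should all show 200.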

Next, we can run our survey logit model and check the interaction. Please note the order of the prefixes: mi estimate: comes first, followed by svy:, which in turn is followed by the logit command itself.

mi estimate: svy: logit honors i.female##i.prog read math science socst



Multiple-imputation estimates                   Imputations       =         10
Survey: Logistic regression                     Number of obs     =        190

Number of strata  =         3                   Population size   =      9,998
Number of PSUs    =       190
                                                Average RVI       =     0.0660
                                                Largest FMI       =     0.2469
                                                Complete DF       =        187
DF adjustment:   Small sample                   DF:     min       =      75.62
                                                        avg       =     156.92
                                                        max       =     181.78
Model F test:       Equal FMI                   F(   9,  182.6)   =       5.06
Within VCE type:   Linearized                   Prob > F          =     0.0000

----------------------------------------------------------------------------------
          honors |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
          female |
         female  |   1.669564    1.06815     1.56   0.120     -.438678    3.777806
                 |
            prog |
       academic  |    .706834   1.040896     0.68   0.498    -1.347074    2.760742
       vocation  |  -.6572194   1.126282    -0.58   0.560    -2.879486    1.565048
                 |
     female#prog |
female#academic  |  -.5020129   1.200932    -0.42   0.676     -2.87197    1.867944
female#vocation  |   1.264679    1.36103     0.93   0.354    -1.421087    3.950444
                 |
            read |   .0579493   .0365918     1.58   0.117    -.0149354    .1308341
            math |   .1131006   .0383768     2.95   0.004     .0372635    .1889377
         science |   .0709565   .0405595     1.75   0.082    -.0092108    .1511239
           socst |  -.0009834   .0323599    -0.03   0.976    -.0649752    .0630084
           _cons |  -15.40424   2.485827    -6.20   0.000    -20.31064   -10.49784
----------------------------------------------------------------------------------

mi test 1.female#2.prog 1.female#3.prog
note: assuming equal fractions of missing information

 ( 1)  [honors]1.female#2.prog = 0
 ( 2)  [honors]1.female#3.prog = 0

       F(  2, 183.4) =    1.26
            Prob > F =    0.2850

Unfortunately, the interaction was not statistically significant. However, we will push ahead and compute the predicted cell probabilities for the 2×3 interaction just to show how it can be done.

mi estimate, cmdok: mimargins 1 

Multiple-imputation estimates                   Imputations       =         10
Adjusted predictions                            Number of obs     =        190

Number of strata  =         3
                                                Average RVI       =     0.0279
                                                Largest FMI       =     0.0586
                                                Complete DF       =        187
DF adjustment:   Small sample                   DF:     min       =     164.05
                                                        avg       =     176.22
Within VCE type: Delta-method                           max       =     183.42

----------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
     female#prog |
   male#general  |   .0716598   .0630814     1.14   0.257    -.0528264     .196146
  male#academic  |   .1348606   .0586423     2.30   0.023     .0190696    .2506515
  male#vocation  |   .0384081   .0288355     1.33   0.185    -.0184993    .0953155
 female#general  |   .2891648   .0954564     3.03   0.003     .1007761    .4775536
female#academic  |   .3328262   .0879882     3.78   0.000     .1592084    .5064441
female#vocation  |    .427272   .1585705     2.69   0.008     .1144153    .7401288
----------------------------------------------------------------------------------

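The values in the Coef. column are the predicted probabilities of honors for each female-by-prog cell, with the covariates held at their means. And that is how you can compute adjusted predictions for multiply imputed survey data. This approach will generalize to other estimation commands as well as other margins commands.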
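As one illustration of that generalization, here is a minimal sketch of a second wrapper that posts the average marginal effect of math instead of the cell predictions. The program name mimargins_dydx is hypothetical, and the call simply mirrors the mimargins call used above, including the trailing 1. It has not been run against these data, so treat it as a starting point.

```stata
* hypothetical wrapper: same model, different margins command
capture program drop mimargins_dydx
program mimargins_dydx, eclass properties(mi)
	version 12
	* fit the survey logit model on the current imputed dataset
	svy: logit honors i.female##i.prog read math science socst
	* post the average marginal effect of math to e()
	margins, dydx(math) post
end

* combine the marginal effect across the imputed datasets
mi estimate, cmdok: mimargins_dydx 1
```

Only the margins line changes. The requirement stays the same: the wrapper must finish with a command that posts its results to e().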