How can I get Monte Carlo standard errors for indirect effects?

This page shows how to obtain Monte Carlo standard errors and confidence intervals for indirect effects in a mediation analysis. We will illustrate the process using the hsbdemo dataset. For our example, science will act as the dependent variable, math as the independent variable and read as the mediator variable. We make no claims that this a good example of a mediation model. We only use it to illustrate the steps involved. We will begin by reading in the data and using the sureg command to generate the values we will use to compute the indirect effect as the product of coefficients.

use https://stats.idre.ucla.edu/stat/data/hsbdemo, clear

(highschool and beyond (200 cases))

sureg (read math)(science read math)

Seemingly unrelated regression
----------------------------------------------------------------------
Equation          Obs  Parms        RMSE    "R-sq"       chi2        P
----------------------------------------------------------------------
read              200      1    7.662848    0.4386     156.26   0.0000
science           200      2    7.133989    0.4782     183.30   0.0000
----------------------------------------------------------------------

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
read         |
        math |    .724807   .0579824    12.50   0.000     .6111636    .8384504
       _cons |   14.07254   3.100201     4.54   0.000     7.996255    20.14882
-------------+----------------------------------------------------------------
science      |
        read |   .3654205   .0658305     5.55   0.000     .2363951    .4944459
        math |   .4017207   .0720457     5.58   0.000     .2605138    .5429276
       _cons |    11.6155   3.031268     3.83   0.000     5.674324    17.55668
------------------------------------------------------------------------------

One method of computing the indirect effect, standard error and confidence interval is to use the nlcom command.

nlcom _b[read:math]*_b[science:read]

       _nl_1:  _b[read:math]*_b[science:read]

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _nl_1 |   .2648593   .0522072     5.07   0.000     .1625351    .3671836
------------------------------------------------------------------------------

In the nlcom output, the Coef is the estimate of the indirect effect. nlcom uses the delta method to estimate the standard error. The delta method is based on some pretty strong normal theory assumptions. Additionally, the confidence is also computed with normality assumptions. Many researchers feel that these normal theory assumption are unlikely to hold for an estimate that is the product of two normally distributed coefficients. They might prefer a bootstrap approach such as shown on the How can I perform Sobel-Goodman mediation tests? page. Or, they might prefer a Monte Carlo approach, which we will cover on this page.

Basically, we will generate random normal values of our two coefficient with the parameter estimates as the mean and the standard error as the standard deviation. If the correlation of the two coefficients is zero (or close to zero), we can generate these value as two independent random normal distributions. If the correlation between the coefficient estimates is not zero we will have to generate the values as a random bivariate normal distribution.

So, let’s check the correlation between the coefficients. The information we need can be found in the ereturn list matrix V which we will convert to a correlation matrix.

mat V = e(V)
mat C = corr(V)
mat lis C


symmetric C[5,5]
                     read:       read:    science:    science:    science:
                     math       _cons        read        math       _cons
    read:math           1
   read:_cons  -.98460796           1
 science:read  -2.143e-31   2.177e-31           1
 science:math   9.632e-16  -9.483e-16  -.66228013           1
science:_cons  -1.205e-15   1.224e-15   -.3056154  -.50002438           1

The correlation coefficient for read predicted by math and the coefficient for science predicted by read is -4.975e-30. This value is close enough to zero that I will choose to ignore it.

Now we can go ahead and generate our random values. We begin by resetting the sample size to 50,000. Obviously if your sample size is already 50,000 or greater, you don’t need to reset it. By the way, there is nothing magical about the 50,000 number. It just seemed like a nice large number. Next, we set the random number seed so that we can replicate the results. The next commands generate two random normal variates using the rnormal function with the original coefficient values as the mean and the standard error as the standard error. These commands are followed by generating the product of the random coefficients. This product is the random indirect effect, i.e., the mediation effect.

set obs 50000
set seed 987654
generate a1 = rnormal(_b[read:math],_se[read:math])
generate b1 = rnormal(_b[science:read],_se[science:read])
generate prod1 = a1*b1

Let’s look at prod1 to see how things went.

kdensity prod1



summarize prod1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       prod1 |     50,000    .2652127    .0520892   .0681946   .5289696

_pctile prod1, p(2.5 97.5)
return list

scalars:
                 r(r1) =  .1667018160223961
                 r(r2) =  .3709507137537003

The kernel density plot shows a normal distribution of the random products, the indirect effect. The mean random indirect effect is .2652127 with a standard deviation of .0520892. The standard deviation of a sampling distribution is a standard error. So, .0520892 is the Monte Carlo standard error for our indirect effect. Finally, the Monte Carlo 95% confidence interval is (.167 .371).

But what if the coefficients in the two sureg equations are correlated?

When coefficients are correlated you will only need to change a few lines of code for the Monte Carlo analysis. The primary change is that the two rnormal function calls are replaced with the drawnorm command. To use the drawnorm command we will create a vector of means (m), a vector of standard deviations (s), and a correlation matrix (c). Alternatively, you can use a covariance matrix instead of the standard deviation vector and correlation matrix. Starting from the set seed command here is what the code looks like.

set seed 6574839
mat  m = (_b[read:math], _b[science:read])
mat sd = (_se[read:math], _se[science:read])
mat  c = (1, .1 \ .1, 1) 
drawnorm a2 b2, means(m) sds(sd) corr(c) 
generate prod2 = a2*b2

kdensity prod2



summarize prod2


    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       prod2 |     50,000    .2649602    .0541438   .0734409   .5078928

_pctile prod2, p(2.5 97.5)
return list

scalars:

                 r(r1) =  .1641761288046837
                 r(r2) =  .37570521235466

Once again the kernel density plot appears to be normal. This time the mean random indirect effect is .2649602 with a standard deviation of .0541438. The standard deviation of a sampling distribution is a standard error. So, .0541438 is the Monte Carlo standard error for our indirect effect. Finally, the Monte Carlo 95% confidence interval is (.164 .376).

Here is a comparison of the standard errors using the three methods from this page.

Delta method using nlcom                 .0522072
Monte Carlo, coefficients uncorrelated   .0520892 
Monte Carlo, coefficients correlated .1  .0541438

As you can see, all of these standard errors fairly close to one another.