This page shows how to obtain Monte Carlo standard errors and confidence intervals for
indirect effects in a mediation analysis. We will illustrate the process using the
**hsbdemo** dataset. For our example, **science** will act as
the dependent variable, **math** as the independent variable and
**read** as the mediator variable. We make no claim that this is a good
example of a mediation model; we use it only to illustrate the steps involved. We will
begin by reading in the data and using the **sureg** command to generate
the values we will use to compute the indirect effect as the product of coefficients.

```
use https://stats.idre.ucla.edu/stat/data/hsbdemo, clear
(highschool and beyond (200 cases))

sureg (read math)(science read math)

Seemingly unrelated regression
----------------------------------------------------------------------
Equation          Obs   Parms        RMSE    "R-sq"       chi2        P
----------------------------------------------------------------------
read              200       1    7.662848    0.4386     156.26   0.0000
science           200       2    7.133989    0.4782     183.30   0.0000
----------------------------------------------------------------------

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
read         |
        math |    .724807   .0579824    12.50   0.000     .6111636    .8384504
       _cons |   14.07254   3.100201     4.54   0.000     7.996255    20.14882
-------------+----------------------------------------------------------------
science      |
        read |   .3654205   .0658305     5.55   0.000     .2363951    .4944459
        math |   .4017207   .0720457     5.58   0.000     .2605138    .5429276
       _cons |    11.6155   3.031268     3.83   0.000     5.674324    17.55668
------------------------------------------------------------------------------
```

One method of computing the indirect effect, standard error and confidence interval
is to use the **nlcom** command.

```
nlcom _b[read:math]*_b[science:read]

       _nl_1: _b[read:math]*_b[science:read]

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _nl_1 |   .2648593   .0522072     5.07   0.000     .1625351    .3671836
------------------------------------------------------------------------------
```

In the **nlcom** output, Coef. is the estimate of the indirect effect.
**nlcom** uses the delta method to estimate the standard error. The delta method rests on
fairly strong normal theory assumptions, and the confidence interval is likewise computed
under a normality assumption. Many researchers feel that these normal theory assumptions
are unlikely to hold for an estimate that is the product of two normally distributed
coefficients. They might prefer a bootstrap approach, such as the one shown on the
How can I perform
Sobel-Goodman mediation tests? page. Or, they might prefer a Monte Carlo approach,
which we will cover on this page.
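For a product of two coefficients a and b, the delta-method standard error that **nlcom** reports is essentially sqrt(a²·se_b² + b²·se_a²) (plus a covariance term, which is effectively zero here). As a quick check, here is that formula in Python, plugging in the estimates from the **sureg** output above:

```python
import math

# Estimates and standard errors from the sureg output above
a, se_a = 0.724807, 0.0579824    # read <- math
b, se_b = 0.3654205, 0.0658305   # science <- read

# Delta-method (Sobel) standard error of the product a*b,
# ignoring the (near-zero) covariance between a and b
se_ab = math.sqrt(a**2 * se_b**2 + b**2 * se_a**2)
print(round(se_ab, 4))  # agrees with nlcom's .0522072 to four decimals
```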

Basically, we will generate random normal values of our two coefficients with the parameter estimates as the means and the standard errors as the standard deviations. If the correlation of the two coefficients is zero (or close to zero), we can generate these values as two independent random normal distributions. If the correlation between the coefficient estimates is not zero, we will have to generate the values from a random bivariate normal distribution.

So, let’s check the correlation between the coefficients. The information we need can be
found in the coefficient covariance matrix **e(V)** (see **ereturn list**), which we will
convert to a correlation matrix.

```
mat V = e(V)
mat C = corr(V)
mat lis C

symmetric C[5,5]
                     read:       read:    science:    science:    science:
                      math       _cons        read        math       _cons
    read:math            1
   read:_cons   -.98460796           1
 science:read   -2.143e-31   2.177e-31           1
 science:math    9.632e-16  -9.483e-16  -.66228013           1
science:_cons   -1.205e-15   1.224e-15   -.3056154  -.50002438           1
```

The correlation between the coefficient for **read** predicted by **math**
and the coefficient for **science** predicted by **read**
is -2.143e-31. This value is close enough to zero that I will choose to ignore it.
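The conversion that Stata's corr() matrix function performs is just C_ij = V_ij / (sd_i · sd_j), where the sds are the square roots of the diagonal of V. A minimal sketch in Python/NumPy, using a hypothetical 2×2 covariance block for the two coefficients of interest (the variances are the squared **sureg** standard errors; the off-diagonal covariance is a made-up value for illustration):

```python
import numpy as np

# Hypothetical 2x2 covariance block: variances are the squared sureg
# standard errors; the off-diagonal covariance is illustrative only.
V = np.array([[0.0579824**2, 1e-5],
              [1e-5, 0.0658305**2]])

# Correlation matrix: C_ij = V_ij / (sd_i * sd_j)
sd = np.sqrt(np.diag(V))
C = V / np.outer(sd, sd)
print(C)
```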

Now we can go ahead and generate our random values. We begin by resetting the sample size
to 50,000. Obviously if your sample size is already 50,000 or greater, you don’t need to
reset it. By the way, there is nothing magical about the 50,000 number. It just seemed like
a nice large number. Next, we set the random number seed so that we can replicate the results.
The next commands generate two random normal variates using the **rnormal**
function, with the original coefficient values as the means and the standard errors as the
standard deviations. These commands are followed by generating the product of the random
coefficients. This product is the random indirect effect, i.e., the mediation effect.

```
set obs 50000
set seed 987654
generate a1 = rnormal(_b[read:math],_se[read:math])
generate b1 = rnormal(_b[science:read],_se[science:read])
generate prod1 = a1*b1
```

Let’s look at **prod1** to see how things went.

```
kdensity prod1
summarize prod1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       prod1 |     50,000    .2652127    .0520892   .0681946   .5289696

_pctile prod1, p(2.5 97.5)
return list

scalars:
              r(r1) =  .1667018160223961
              r(r2) =  .3709507137537003
```

The kernel density plot shows an approximately normal distribution of the random products, the indirect effects. The mean random indirect effect is .2652127 with a standard deviation of .0520892. The standard deviation of a sampling distribution is a standard error, so .0520892 is the Monte Carlo standard error for our indirect effect. Finally, the Monte Carlo 95% confidence interval is (.167, .371).
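The same Monte Carlo procedure can be sketched outside Stata. Here is a rough Python/NumPy equivalent, plugging in the coefficient estimates and standard errors from the **sureg** output above (the seed is arbitrary, so the simulated values will differ slightly from Stata's):

```python
import numpy as np

rng = np.random.default_rng(987654)  # arbitrary seed
n = 50_000

# Coefficients (means) and standard errors (sds) from the sureg output
a1 = rng.normal(0.724807, 0.0579824, n)   # read <- math
b1 = rng.normal(0.3654205, 0.0658305, n)  # science <- read
prod1 = a1 * b1                           # random indirect effects

se = prod1.std(ddof=1)                    # Monte Carlo standard error
lo, hi = np.percentile(prod1, [2.5, 97.5])  # Monte Carlo 95% CI
print(prod1.mean(), se, lo, hi)
```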

## But what if the coefficients in the two sureg equations are correlated?

When the coefficients are correlated you will only need to change a few lines of code for
the Monte Carlo analysis. The
primary change is that the two **rnormal** function calls are replaced
with the **drawnorm** command. To use the **drawnorm**
command we will create a vector of means (**m**), a vector of standard
deviations (**sd**), and a correlation matrix (**c**).
Alternatively, you can use a covariance matrix instead of the standard deviation
vector and correlation matrix. For illustration, we will assume a correlation of .1
between the two coefficients. Starting from the **set seed** command,
here is what the code looks like.

```
set seed 6574839
mat m = (_b[read:math], _b[science:read])
mat sd = (_se[read:math], _se[science:read])
mat c = (1, .1 \ .1, 1)
drawnorm a2 b2, means(m) sds(sd) corr(c)
generate prod2 = a2*b2
```

```
kdensity prod2
summarize prod2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       prod2 |     50,000    .2649602    .0541438   .0734409   .5078928

_pctile prod2, p(2.5 97.5)
return list

scalars:
              r(r1) =  .1641761288046837
              r(r2) =  .37570521235466
```

Once again the kernel density plot appears approximately normal. This time the mean random indirect effect is .2649602 with a standard deviation of .0541438. The standard deviation of a sampling distribution is a standard error, so .0541438 is the Monte Carlo standard error for our indirect effect. Finally, the Monte Carlo 95% confidence interval is (.164, .376).
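In the Python sketch, the role of **drawnorm** is played by the generator's multivariate_normal method, which takes a mean vector and a covariance matrix; the covariance matrix is built from the standard errors and the assumed correlation of .1 (estimates again from the **sureg** output above):

```python
import numpy as np

rng = np.random.default_rng(6574839)  # arbitrary seed

mean = [0.724807, 0.3654205]            # coefficient estimates
sd = np.array([0.0579824, 0.0658305])   # standard errors
corr = np.array([[1.0, 0.1],
                 [0.1, 1.0]])           # assumed correlation of .1
cov = corr * np.outer(sd, sd)           # cov_ij = corr_ij * sd_i * sd_j

draws = rng.multivariate_normal(mean, cov, size=50_000)
prod2 = draws[:, 0] * draws[:, 1]       # random indirect effects

se = prod2.std(ddof=1)                  # Monte Carlo standard error
lo, hi = np.percentile(prod2, [2.5, 97.5])  # Monte Carlo 95% CI
print(prod2.mean(), se, lo, hi)
```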

Here is a comparison of the standard errors using the three methods from this page.

| Method                                      | Std. Err. |
|---------------------------------------------|-----------|
| Delta method using **nlcom**                | .0522072  |
| Monte Carlo, coefficients uncorrelated      | .0520892  |
| Monte Carlo, coefficients correlated (.1)   | .0541438  |

As you can see, all of these standard errors are fairly close to one another.