The sem command, introduced in Stata 12, makes the analysis of mediation models much easier, as long as both the dependent variable and the mediator variable are continuous.
We will illustrate the sem command using the hsbdemo dataset. None of the examples demonstrates full mediation, in which the effect of the independent variable goes from significant to nonsignificant once the mediator is added. Rather, the examples show partial mediation: the direct effect of the independent variable remains significant but is smaller than its total effect.
A note about covariates
If your model contains control variables, i.e., covariates, you must include these in each of the sem equations. Thus, your sem model will look something like this:
sem (MV <- IV CV1 CV2)(DV <- MV IV CV1 CV2)
where DV stands for the dependent variable, IV stands for the independent variable, MV stands for the mediator variable, and CV1 and CV2 stand for the covariates.
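As a concrete sketch, here is the simple mediation model introduced in the next section with one covariate added. Using female from hsbdemo as the covariate is our own choice for illustration and is not part of the original examples; note that it appears in both equations.

```stata
* Sketch: simple mediation model with one covariate (female) in both equations.
* The choice of female as the covariate is illustrative only.
use https://stats.idre.ucla.edu/stat/data/hsbdemo, clear

* read is the mediator, math the independent variable, science the outcome
sem (read <- math female)(science <- read math female)

* Direct, indirect and total effects, now adjusted for female
estat teffects
```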
Simple mediation model
The simplest mediation model has one IV, one MV, and one DV. Here is the symbolic version of the model.
sem (MV <- IV)(DV <- MV IV)
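Written out as two linear regression equations, the model looks like the block below. The path labels a, b and c' and the intercepts i_1 and i_2 are our notation, not names that sem uses. In linear models of this kind, the total effect of IV on DV is the direct effect plus the product of the two paths that run through the mediator.

```latex
$$
\begin{aligned}
MV &= i_1 + a\,IV + e_1 \\
DV &= i_2 + c'\,IV + b\,MV + e_2 \\
\text{indirect effect} &= ab, \qquad \text{direct effect} = c', \qquad \text{total effect} = c' + ab
\end{aligned}
$$
```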
In our simple mediation example the independent variable is math, the mediator variable is read and the dependent variable is science.
```
use https://stats.idre.ucla.edu/stat/data/hsbdemo, clear

sem (read <- math)(science <- read math)

Endogenous variables

Observed:  read science

Exogenous variables

Observed:  math

Fitting target model:

Iteration 0:   log likelihood = -2098.5822
Iteration 1:   log likelihood = -2098.5822

Structural equation model                       Number of obs      =       200
Estimation method  = ml
Log likelihood     = -2098.5822

-------------------------------------------------------------------------------
              |                 OIM
              |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
Structural    |
  read        |
         math |    .724807   .0579824    12.50   0.000     .6111636    .8384504
        _cons |   14.07254   3.100201     4.54   0.000     7.996255    20.14882
  ------------+----------------------------------------------------------------
  science     |
         read |   .3654205   .0658305     5.55   0.000     .2363951    .4944459
         math |   .4017207   .0720457     5.58   0.000     .2605138    .5429276
        _cons |    11.6155   3.031268     3.83   0.000     5.674324    17.55668
--------------+----------------------------------------------------------------
   var(e.read)|   58.71925   5.871925                      48.26811    71.43329
var(e.science)|    50.8938    5.08938                      41.83548    61.91346
-------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(0)   =      0.00, Prob > chi2 =      .
```
```
estat teffects

Direct effects
------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
  read       |
        math |    .724807   .0579824    12.50   0.000     .6111636    .8384504
  -----------+----------------------------------------------------------------
  science    |
        read |   .3654205   .0658305     5.55   0.000     .2363951    .4944459
        math |   .4017207   .0720457     5.58   0.000     .2605138    .5429276
------------------------------------------------------------------------------

Indirect effects
------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
  read       |
        math |          0  (no path)
  -----------+----------------------------------------------------------------
  science    |
        read |          0  (no path)
        math |   .2648593   .0522072     5.07   0.000     .1625351    .3671836
------------------------------------------------------------------------------

Total effects
------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
  read       |
        math |    .724807   .0579824    12.50   0.000     .6111636    .8384504
  -----------+----------------------------------------------------------------
  science    |
        read |   .3654205   .0658305     5.55   0.000     .2363951    .4944459
        math |     .66658     .05799    11.49   0.000     .5529217    .7802384
------------------------------------------------------------------------------
```
The total effect of math, .66658, is the effect we would find if there were no mediator in the model. It is significant, with a z of 11.49. The direct effect of math, .4017207, is still significant (z = 5.58) but considerably smaller than the total effect. The indirect effect of math that passes through read, .2648593, is also statistically significant (z = 5.07).
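The indirect effect is the product of the math-to-read path and the read-to-science path (.724807*.3654205 is about .2648593), and the total effect is the direct effect plus that product. A sketch of the same computation with nlcom, which also gives delta-method standard errors; the labels indirect and total are names we chose.

```stata
* Sketch: recompute the indirect and total effects of math by hand.
use https://stats.idre.ucla.edu/stat/data/hsbdemo, clear
quietly sem (read <- math)(science <- read math)

* Indirect effect = (math -> read path) x (read -> science path)
nlcom (indirect: _b[read:math]*_b[science:read])

* Total effect = direct effect + indirect effect
nlcom (total: _b[science:math] + _b[read:math]*_b[science:read])
```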
It is often easier to interpret these values by computing ratios and proportions as shown below.
```
proportion of total effect mediated  = .2648593/.66658   = .3973406
ratio of indirect to direct effect   = .2648593/.4017207 = .65931205
ratio of total to direct effect      = .66658/.4017207   = 1.6593121
```
We see above that almost 40% of the total effect of math is mediated by read, which is a respectable amount. The indirect effect is about .66 times, or almost two-thirds, the size of the direct effect. Finally, the total effect is about 1.66 times the direct effect; because the total effect is the sum of the direct and indirect effects, this ratio is always exactly 1 plus the indirect-to-direct ratio.
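Rather than retyping rounded coefficients, these ratios can be computed from the matrices that estat teffects leaves in r(). The effect of math on science is the third element of each matrix, as the matrix list output later on this page confirms. This is a sketch; the scalar names pm, rid and rtd are our own.

```stata
* Sketch: proportion mediated and effect ratios from estat teffects results.
use https://stats.idre.ucla.edu/stat/data/hsbdemo, clear
quietly sem (read <- math)(science <- read math)
quietly estat teffects

* Copy the 1 x 3 effect matrices; element 3 is the science <- math effect
matrix bi = r(indirect)
matrix bd = r(direct)
matrix bt = r(total)

* Proportion mediated, indirect/direct ratio, total/direct ratio
scalar pm  = el(bi,1,3)/el(bt,1,3)
scalar rid = el(bi,1,3)/el(bd,1,3)
scalar rtd = el(bt,1,3)/el(bd,1,3)

display "proportion of total effect mediated = " scalar(pm)
display "ratio of indirect to direct effect  = " scalar(rid)
display "ratio of total to direct effect     = " scalar(rtd)
```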
Mediation with bootstrap standard errors and confidence intervals
If you are uncomfortable with the standard errors and confidence intervals produced directly by sem, you can obtain bootstrapped standard errors and confidence intervals in two ways: first, by adding the vce(bootstrap) option to your sem command; or second, by writing a small program that runs both sem and estat teffects, and then bootstrapping that program.
Let's demonstrate the vce(bootstrap) option first. Here we add the reps() suboption to request 200 replications.
```
sem (read <- math)(science <- read math), vce(bootstrap,reps(200))

Bootstrap replications (200)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
..................................................   100
..................................................   150
..................................................   200

Structural equation model                       Number of obs      =       200
Estimation method  = ml                         Replications       =       200
Log likelihood     = -2098.5822

-------------------------------------------------------------------------------
              |   Observed   Bootstrap                         Normal-based
              |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
Structural    |
  read        |
         math |    .724807   .0581262    12.47   0.000     .6108818    .8387321
        _cons |   14.07254   3.092117     4.55   0.000     8.012099    20.13297
  ------------+----------------------------------------------------------------
  science     |
         read |   .3654205   .0802203     4.56   0.000     .2081915    .5226495
         math |   .4017207   .0875101     4.59   0.000     .2302041    .5732373
        _cons |    11.6155   2.707368     4.29   0.000     6.309158    16.92184
--------------+----------------------------------------------------------------
   var(e.read)|   58.71925    5.93704                      48.16332    71.58871
var(e.science)|    50.8938   5.496477                      41.18471    62.89176
-------------------------------------------------------------------------------
```
Adding this option gives bootstrapped standard errors and normal-based confidence intervals for the path coefficients. You can now use estat teffects to obtain normal-based bootstrapped confidence intervals around the indirect effect as well.
```
estat teffects

Direct effects
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
  read       |
        math |    .724807   .0581262    12.47   0.000     .6108818    .8387321
  -----------+----------------------------------------------------------------
  science    |
        read |   .3654205   .0802203     4.56   0.000     .2081915    .5226495
        math |   .4017207   .0875101     4.59   0.000     .2302041    .5732373
------------------------------------------------------------------------------

Indirect effects
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
  read       |
        math |          0  (no path)
  -----------+----------------------------------------------------------------
  science    |
        read |          0  (no path)
        math |   .2648593   .0593311     4.46   0.000     .1485726    .3811461
------------------------------------------------------------------------------

Total effects
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
  read       |
        math |    .724807   .0581262    12.47   0.000     .6108818    .8387321
  -----------+----------------------------------------------------------------
  science    |
        read |   .3654205   .0802203     4.56   0.000     .2081915    .5226495
        math |     .66658   .0592669    11.25   0.000     .5504189    .7827411
------------------------------------------------------------------------------
```
Alternatively, you can write your own program to perform the bootstrapping. This lets us obtain percentile-based and bias-corrected confidence intervals in addition to normal-based ones. Here is the program, which we are calling indireff.ado.
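```
* indireff.ado: fit the mediation model and return the effects of math on science
program indireff, rclass
    sem (read <- math)(science <- read math)
    estat teffects
    * copy the 1 x 3 effect matrices left in r() by estat teffects
    mat bi = r(indirect)
    mat bd = r(direct)
    mat bt = r(total)
    * element 3 of each matrix is the science <- math effect
    return scalar indir = el(bi,1,3)
    return scalar direct = el(bd,1,3)
    return scalar total = el(bt,1,3)
end
```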
So how do we know which elements of r(indirect), r(direct) and r(total) we need? We run the sem command, quietly run estat teffects, and then use matrix list to display the matrices of coefficients.
```
sem (read <- math)(science <- read math)

quietly estat teffects

matrix list r(indirect)

r(indirect)[1,3]
         read:    science:    science:
           o.          o.
         math        read        math
r1          0           0   .26485934

matrix list r(direct)

r(direct)[1,3]
         read:    science:    science:
         math        read        math
r1  .72480697   .36542052   .40172068

matrix list r(total)

r(total)[1,3]
         read:    science:    science:
         math        read        math
r1  .72480697   .36542052   .66658002
```
We see that in each case the coefficient of interest, the effect of math on science, is the third element.
Now that we know the correct matrix elements, we will set the random-number seed and run indireff for 200 bootstrap replications. You may want to run more, say 2,000 to 5,000. Percentile and bias-corrected confidence intervals can then be requested from the bootstrap results with estat bootstrap.
```
set seed 358395

bootstrap r(indir) r(direct) r(total), reps(200): indireff

Bootstrap replications (200)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
..................................................   100
..................................................   150
..................................................   200

Bootstrap results                               Number of obs      =       200
                                                Replications       =       200

      command:  indireff
        _bs_1:  r(indir)
        _bs_2:  r(direct)
        _bs_3:  r(total)

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   .2648593   .0545941     4.85   0.000     .1578569    .3718618
       _bs_2 |   .4017207   .0872965     4.60   0.000     .2306228    .5728186
       _bs_3 |     .66658   .0576837    11.56   0.000      .553522    .7796381
------------------------------------------------------------------------------
```
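The table above reports only normal-based intervals. After bootstrap, percentile and bias-corrected intervals can be displayed with estat bootstrap and its percentile and bc options. The sketch below redefines indireff exactly as above so that it runs on its own, then bootstraps it with the same seed and asks for both kinds of interval; we do not reproduce the resulting intervals here.

```stata
* Sketch: percentile and bias-corrected bootstrap CIs for the effects of math.
use https://stats.idre.ucla.edu/stat/data/hsbdemo, clear

* Same program as indireff above, redefined so this block is self-contained
capture program drop indireff
program indireff, rclass
    sem (read <- math)(science <- read math)
    estat teffects
    matrix bi = r(indirect)
    matrix bd = r(direct)
    matrix bt = r(total)
    return scalar indir  = el(bi,1,3)
    return scalar direct = el(bd,1,3)
    return scalar total  = el(bt,1,3)
end

set seed 358395
bootstrap r(indir) r(direct) r(total), reps(200): indireff

* Percentile and bias-corrected confidence intervals
estat bootstrap, percentile bc
```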
Mediation with multiple IVs
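What if you had multiple independent variables? You can specify a separate path from each IV to the mediator variable; sem combines paths that share the same dependent variable into a single equation, so (MV <- IV1)(MV <- IV2) is equivalent to (MV <- IV1 IV2). Here is the symbolic model.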
sem (MV <- IV1)(MV <- IV2)(DV <- MV IV1 IV2)
For our example, we will use math and ses as our independent variables. We will keep the same mediator and dependent variable as before.
```
sem (read <- math)(read <- ses)(science <- read math ses)

Endogenous variables

Observed:  read science

Exogenous variables

Observed:  math ses

Fitting target model:

Iteration 0:   log likelihood = -2306.1661
Iteration 1:   log likelihood = -2306.1661

Structural equation model                       Number of obs      =       200
Estimation method  = ml
Log likelihood     = -2306.1661

-------------------------------------------------------------------------------
              |                 OIM
              |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
Structural    |
  read        |
         math |     .68845    .059519    11.57   0.000     .5717949     .805105
          ses |      1.726   .7698566     2.24   0.025     .2171093    3.234892
        _cons |   12.43962   3.147394     3.95   0.000     6.270842     18.6084
  ------------+----------------------------------------------------------------
  science     |
         read |   .3507374   .0663219     5.29   0.000     .2207487     .480726
         math |   .3905883   .0721193     5.42   0.000     .2492371    .5319395
          ses |   1.033732    .731092     1.41   0.157    -.3991816    2.466647
        _cons |   10.84415   3.065166     3.54   0.000     4.836532    16.85176
--------------+----------------------------------------------------------------
   var(e.read)|   57.27968   5.727968                      47.08476    69.68202
var(e.science)|   50.39009   5.039009                      41.42142    61.30067
-------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(0)   =      0.00, Prob > chi2 =      .
```
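Running estat teffects after this model (its output is not shown here) gives an indirect effect of .2414651 for math and .6053729 for ses; both are statistically significant.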
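Each of these indirect effects is the product of two paths from the table above: .68845*.3507374 is about .2414651 for math, and 1.726*.3507374 is about .6053729 for ses. A sketch that computes both with nlcom and reports delta-method standard errors and z tests; the labels ind_math and ind_ses are our own.

```stata
* Sketch: indirect effects of math and ses on science through read.
use https://stats.idre.ucla.edu/stat/data/hsbdemo, clear
quietly sem (read <- math)(read <- ses)(science <- read math ses)

* (IV -> read path) x (read -> science path), with delta-method SEs
nlcom (ind_math: _b[read:math]*_b[science:read]) (ind_ses: _b[read:ses]*_b[science:read])
```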
Because we have multiple independent variables, the computation of the ratios and proportions is a bit more complex.
```
proportion of total math effect mediated = .2414651/.6320534 = .38203275
proportion of total ses effect mediated  = .6053729/1.639105 = .36933137
ratio of math indirect to direct effect  = .2414651/.3905883 = .61820874
ratio of ses indirect to direct effect   = .6053729/1.033732 = .58561881
ratio of total math to direct effect     = .6320534/.3905883 = 1.6182087
ratio of total ses to direct effect      = 1.639105/1.033732 = 1.5856189
```
Mediation with multiple mediators
In this section we consider the case of multiple mediator variables. This time there is one equation for each mediator variable. The symbolic form of the model looks like this.
sem (MV1 <- IV)(MV2 <- IV)(DV <- MV1 MV2 IV)
For our example we will use read and write as the mediators. We will go back to a single independent variable, math.
```
sem (read <- math)(write <- math)(science <- read write math)

Endogenous variables

Observed:  read write science

Exogenous variables

Observed:  math

Fitting target model:

Iteration 0:   log likelihood = -2779.4174
Iteration 1:   log likelihood = -2779.4174

Structural equation model                       Number of obs      =       200
Estimation method  = ml
Log likelihood     = -2779.4174

-------------------------------------------------------------------------------
              |                 OIM
              |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
Structural    |
  read        |
         math |    .724807   .0579824    12.50   0.000     .6111636    .8384504
        _cons |   14.07254   3.100201     4.54   0.000     7.996255    20.14882
  ------------+----------------------------------------------------------------
  write       |
         math |   .6247082   .0562757    11.10   0.000     .5144099    .7350065
        _cons |   19.88724   3.008947     6.61   0.000     13.98981    25.78467
  ------------+----------------------------------------------------------------
  science     |
         read |   .3015317   .0679912     4.43   0.000     .1682715     .434792
        write |   .2065257   .0700532     2.95   0.003     .0692239    .3438274
         math |   .3190094   .0759047     4.20   0.000      .170239    .4677798
        _cons |   8.407353   3.160709     2.66   0.008     2.212476    14.60223
--------------+----------------------------------------------------------------
   var(e.read)|   58.71925   5.871925                      48.26811    71.43329
  var(e.write)|   55.31334   5.531334                      45.46841    67.28993
var(e.science)|   48.77421   4.877421                      40.09314    59.33492
-------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(1)   =     21.43, Prob > chi2 = 0.0000
```
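Unlike the two earlier models, this one is not saturated: the LR test reports chi2(1) = 21.43. By default sem does not estimate a covariance between the errors of endogenous variables such as read and write, so the model assumes the two mediators are uncorrelated once math is accounted for. If that assumption is not intended, the covariance can be freed with sem's covariance() option, which should leave zero degrees of freedom for the LR test. This is a sketch whose output we do not reproduce; rerun estat teffects to compare its effects with the ones discussed below.

```stata
* Sketch: two-mediator model with the read and write errors allowed to covary.
use https://stats.idre.ucla.edu/stat/data/hsbdemo, clear
sem (read <- math)(write <- math)(science <- read write math), cov(e.read*e.write)

* Direct, indirect and total effects under the less restrictive model
estat teffects
```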
The indirect effect of math reported by estat teffects for this model, .3475706, is the sum of the indirect effect via read and the indirect effect via write. We can compute these indirect paths manually.
```
indirect via read  = .724807*.3015317  = .21855229
indirect via write = .6247082*.2065257 = .1290183
total indirect     = .724807*.3015317 + .6247082*.2065257
                   = .21855229 + .1290183
                   = .34757059
```
The last computation shows that the indirect effect given by estat teffects is the combined indirect effect.
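Because estat teffects reports only the combined indirect effect, one way to get a standard error and z test for each specific indirect path is nlcom on the products of the corresponding coefficients (delta method). This is a sketch; via_read, via_write and total_ind are labels we chose.

```stata
* Sketch: specific indirect effects of math through read and through write.
use https://stats.idre.ucla.edu/stat/data/hsbdemo, clear
quietly sem (read <- math)(write <- math)(science <- read write math)

* One product of paths per mediator
nlcom (via_read: _b[read:math]*_b[science:read]) (via_write: _b[write:math]*_b[science:write])

* Their sum is the combined indirect effect reported by estat teffects
nlcom (total_ind: _b[read:math]*_b[science:read] + _b[write:math]*_b[science:write])
```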
We can use the values we just computed to get the ratios and proportions of interest.
```
proportion of total math effect mediated            = .3475706/.66658    = .52142369
proportion of total math effect mediated via read   = .21855229/.66658   = .32787106
proportion of total math effect mediated via write  = .1290183/.66658    = .19355261
ratio of math indirect to direct effect             = .3475706/.3190094  = 1.0895309
ratio of math indirect to direct effect via read    = .21855229/.3190094 = .68509671
ratio of math indirect to direct effect via write   = .1290183/.3190094  = .40443416
ratio of total math to direct effect                = .66658/.3190094    = 2.0895309
```
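The bootstrap-a-program approach from the earlier section carries over to two mediators. The sketch below uses a hypothetical program, indireff2 (the program name and its returned names are our own), that returns the two specific indirect effects, their sum and the direct effect. It uses _b[] products instead of estat teffects, so no matrix positions need to be looked up. As before, 200 replications keeps the example quick; more would be advisable in practice.

```stata
* Hypothetical program: bootstrap specific indirect effects with two mediators.
use https://stats.idre.ucla.edu/stat/data/hsbdemo, clear

capture program drop indireff2
program indireff2, rclass
    quietly sem (read <- math)(write <- math)(science <- read write math)
    * specific indirect effects: (math -> mediator) x (mediator -> science)
    return scalar via_read  = _b[read:math]*_b[science:read]
    return scalar via_write = _b[write:math]*_b[science:write]
    * combined indirect effect and direct effect of math on science
    return scalar indir  = _b[read:math]*_b[science:read] + _b[write:math]*_b[science:write]
    return scalar direct = _b[science:math]
end

set seed 358395
bootstrap r(via_read) r(via_write) r(indir) r(direct), reps(200): indireff2

* Percentile and bias-corrected confidence intervals
estat bootstrap, percentile bc
```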