How can I perform mediation with multilevel data? (Method 2)

Attention

See this FAQ by Bauer that discusses the need to decompose within- and between-group effects when using this approach to ensure valid results (https://dbauer.web.unc.edu/wp-content/uploads/sites/7494/2015/08/Centering-in-111-Mediation.pdf).

FAQ starts here

Version info: Code for this page was tested in SAS 9.3.

Mediator variables are variables that sit between the independent variable and dependent variable and mediate the effect of the IV on the DV. A model with one mediator is shown in the figure below.

The idea, in mediation analysis, is that some of the effect of the predictor variable, the IV, is transmitted to the DV through the mediator variable, the MV, and some of the effect of the IV passes directly to the DV. The portion of the effect of the IV that passes through the MV is the indirect effect.

An earlier approach to multilevel mediation, suggested by Krull & MacKinnon (2001), is referred to as Method 1. This page demonstrates an alternative approach given in the 2006 paper by Bauer, Preacher & Gil. This approach combines the dependent variable and the mediator into a single stacked response variable and runs one mixed model, with indicator variables for the DV and the mediator, to obtain all of the values needed for the analysis.
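As a sketch of what the single stacked model looks like, the combined equation can be written as below. The notation is adapted from Bauer et al. (2006): Z is the stacked response, S_M and S_Y are the indicator variables for the mediator rows and the DV rows, and the j subscripts mark coefficients that vary across level-2 units. The intercept symbols d_{Mj} and d_{Yj} are labels assumed for this illustration.

```latex
\[
Z_{ij} = S_{M_{ij}}\,\bigl(d_{Mj} + a_{j} X_{ij}\bigr)
       + S_{Y_{ij}}\,\bigl(d_{Yj} + b_{j} M_{ij} + c'_{j} X_{ij}\bigr)
       + e_{Z_{ij}}
\]
```

On rows where S_M = 1 (and S_Y = 0) the equation reduces to the model for the mediator, and on rows where S_Y = 1 (and S_M = 0) it reduces to the model for the DV.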

We will begin by loading a synthetic data set and reconfiguring it for our analysis. All of the variables in this example (id, the cluster ID; x, the predictor variable; m, the mediator variable; and y, the dependent variable) are at level 1. Let’s start by reading in the data and looking at a few descriptive statistics.


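/* Read the comma-delimited file from the web, skipping the header row */
filename tmp url 'http://stats.idre.ucla.edu/stat/data/ml_sim.csv';
data ml_sim;
  infile tmp dlm=',' firstobs=2;
  input id x m y;
  fid = _n_;  /* row id: one value per original observation */
run;

proc means data=ml_sim;
  var id fid x m y;
run;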

 The MEANS Procedure

Variable      N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------
id          800      50.5000000      28.8841283       1.0000000     100.0000000
fid         800     400.5000000     231.0844002       1.0000000     800.0000000
x           800      -0.1539876       1.3303736      -4.1514263       3.8696609
m           800      -0.0247739       1.4836143      -6.4783697       5.0124219
y           800      -0.1833981       1.6691804      -8.6000316       5.9190076
-------------------------------------------------------------------------------
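The N of 800 and the id range of 1 to 100 suggest a balanced design. One way to check how many rows each id contributes is sketched below. The column aliases (n_clusters, min_obs, max_obs, n_obs) are illustrative names, not part of the original example.

```sas
/* Sketch: summarize cluster sizes with an inline view.                */
/* No new dataset is created, so later steps that rely on the most     */
/* recently created dataset (ml_sim) are unaffected.                   */
proc sql;
  select count(*)   as n_clusters,
         min(n_obs) as min_obs,
         max(n_obs) as max_obs
  from (select id, count(*) as n_obs
        from ml_sim
        group by id);
quit;
```

If every cluster has the same number of rows, min_obs and max_obs will be equal.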

There are 100 level-2 units, each with eight observations. fid is a row id, so before the data are stacked there is just one observation for each fid. Let’s look at the three models of a mediation analysis, beginning with the model with just the IV.


proc mixed data=ml_sim noclprint;
  class id;
  model y = x / solution;
  random intercept x / subject=id;
run;

 Covariance Parameter Estimates

Cov Parm      Subject    Estimate

Intercept     id           0.7505
x             id           0.2192
Residual                   0.8217

           Fit Statistics
-2 Res Log Likelihood          2435.3
AIC (smaller is better)        2441.3
AICC (smaller is better)       2441.3
BIC (smaller is better)        2449.1


                   Solution for Fixed Effects
                         Standard
Effect       Estimate       Error      DF    t Value    Pr > |t|
Intercept    -0.02737     0.09620      99      -0.28      0.7766
x              0.6907     0.05883      99      11.74      <.0001

Next comes the model with the mediator predicted by the IV.


proc mixed data=ml_sim noclprint;
  class id;
  model m = x / solution;
  random intercept x / subject=id;
run;

 Covariance Parameter Estimates
Cov Parm      Subject    Estimate
Intercept     id           0.7087
x             id           0.1165
Residual                   0.6451

           Fit Statistics
-2 Res Log Likelihood          2234.6
AIC (smaller is better)        2240.6
AICC (smaller is better)       2240.6
BIC (smaller is better)        2248.4

                   Solution for Fixed Effects
                         Standard
Effect       Estimate       Error      DF    t Value    Pr > |t|
Intercept     0.09519     0.09158      99       1.04      0.3012
x              0.6114     0.04641      99      13.17      <.0001

Finally, the model with both the IV and mediator predicting the DV.


proc mixed data=ml_sim noclprint;
  class id;
  model y = m x / solution;
  random intercept m x / subject=id;
run;

 Covariance Parameter Estimates
Cov Parm      Subject    Estimate
Intercept     id           0.2653
m             id           0.1230
x             id          0.03747
Residual                   0.5070

           Fit Statistics
-2 Res Log Likelihood          2046.8
AIC (smaller is better)        2054.8
AICC (smaller is better)       2054.9
BIC (smaller is better)        2065.3

                   Solution for Fixed Effects
                         Standard
Effect       Estimate       Error      DF    t Value    Pr > |t|
Intercept    -0.09364     0.06251      99      -1.50      0.1373
m              0.6219     0.04721      99      13.17      <.0001

We see that the coefficient for the IV, although still significant, has been reduced from .69 to .25. Now we need to restructure the data to stack y on m for each row and create indicator variables for both the mediator and the dependent variable. Here’s how we can do this.


data ml_simlong;
  set ml_sim;
  /* first stacked row: the dependent variable */
  z = y;
  sy = 1;
  sm = 0;
  dv = 'y';
  output;
  /* second stacked row: the mediator */
  z = m;
  sy = 0;
  sm = 1;
  dv = 'm';
  output;
run;

The new response variable is called z and has y stacked on m. We named the indicators for the mediator and the DV sm and sy, respectively, to be consistent with Bauer et al. (2006). Because both output rows carry over the original variables, x and m keep their original values on every stacked row, so m is still available as a predictor on the rows where z holds y.
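To see the stacked layout, it can help to list the first few rows of the new dataset. The sketch below prints the two stacked rows for each of the first two original observations; the choice of four rows and of the variables listed is just for illustration.

```sas
/* Sketch: show the stacked structure (two rows per original fid) */
proc print data=ml_simlong(obs=4) noobs;
  var id fid dv z sy sm x m y;
run;
```

Each fid should appear twice: once with dv = 'y', sy = 1 and z equal to y, and once with dv = 'm', sm = 1 and z equal to m.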

Now we can run our mixed model for multilevel mediation using proc mixed. Because the sm and sy indicators act as separate intercepts for the two stacked outcomes, we use the noint option to suppress the overall fixed intercept (an intercept is not automatically included in the random statement, so there is no need to suppress it there). In addition to the random effects, we use a repeated statement with group=dv to model the heterogeneity in residual variances for y and m (which are now stacked in the single variable z).


proc mixed data=ml_simlong noclprint covtest;
  class id dv;
  model z = sm sm * x sy sy * m sy * x /noint solution covb;
  random sm sm * x sy sy * m sy * x / subject=id type=un;
  repeated / group=dv subject=id;
run;

The Mixed Procedure

                  Model Information

Data Set                     WORK.ML_SIMLONG
Dependent Variable           z
Covariance Structures        Unstructured, Variance
                             Components
Subject Effects              id, id
Group Effect                 dv
Estimation Method            REML
Residual Variance Method     None
Fixed Effects SE Method      Model-Based
Degrees of Freedom Method    Containment


            Dimensions

Covariance Parameters            17
Columns in X                      5
Columns in Z Per Subject          5
Subjects                        100
Max Obs Per Subject              16


          Number of Observations

Number of Observations Read            1600
Number of Observations Used            1600
Number of Observations Not Used           0


                     Iteration History

Iteration    Evaluations    -2 Res Log Like       Criterion

        0              1      5018.02760634
        1              4      4324.76342851     50.66699378
        2              3      4303.83578771     29.81384239
        3              2      4284.51183182     26.06489084
        4              1      4262.57000555      0.02711155
        5              2      4253.92727615      0.00276758
        6              2      4252.35296469      0.00012231
        7              1      4252.26931686      0.00000042
        8              1      4252.26903780      0.00000000


                   Convergence criteria met.
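
                      Covariance Parameter Estimates

                                             Standard         Z
Cov Parm     Subject    Group    Estimate       Error     Value        Pr Z

UN(1,1)      id                    0.6794      0.1132      6.00      <.0001
...
UN(4,2)      id                   0.09896     0.02282      4.34      <.0001
...


        Null Model Likelihood Ratio Test

  DF    Chi-Square      Pr > ChiSq

  16        765.76          <.0001


                   Solution for Fixed Effects

                         Standard
Effect     Estimate       Error      DF    t Value    Pr > |t|

sm          0.09321     0.08943      99       1.04      0.2998
sm*x         0.6119     0.04650      99      13.16      <.0001
...
sy*m         0.6106     0.04554      99      13.41      <.0001
x*sy         0.2208     0.03725      99       5.93      <.0001


                Covariance Matrix for Fixed Effects

 Row    Effect        Col1        Col2        Col3        Col4        Col5

   ...
   2    sm*x           ...    0.002162    0.000127    0.000985    -0.00020
   3    sy        0.000576    0.000127    0.003839    -0.00011    -0.00006
   4    sy*m      0.000093    0.000985    -0.00011    0.002074    -0.00048
   5    x*sy      -0.00006    -0.00020    -0.00006    -0.00048    0.001387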


       Type 3 Tests of Fixed Effects

           Num     Den
Effect      DF      DF    F Value    Pr > F

sm           1      99       1.09    0.2998
sm*x         1      99     173.17    <.0001
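Rather than copying values by hand from the listing, the relevant tables can also be saved as datasets with ODS OUTPUT. The sketch below reruns the same stacked model; SolutionF, CovB and CovParms are the PROC MIXED ODS table names for the fixed effects, their covariance matrix and the covariance parameters, while fixed_est, fixed_cov and rand_cov are dataset names assumed here for illustration.

```sas
/* Sketch: save the key PROC MIXED tables as datasets.           */
/* fixed_est, fixed_cov and rand_cov are illustrative names.      */
ods output SolutionF=fixed_est   /* fixed effects: a, b, c'       */
           CovB=fixed_cov        /* covariance of fixed effects   */
           CovParms=rand_cov;    /* covariance parameters and SEs */
proc mixed data=ml_simlong noclprint covtest;
  class id dv;
  model z = sm sm*x sy sy*m sy*x / noint solution covb;
  random sm sm*x sy sy*m sy*x / subject=id type=un;
  repeated / group=dv subject=id;
run;

/* Inspect the saved covariance parameters (UN(4,2) is cov(a_j, b_j)) */
proc print data=rand_cov;
run;
```

The covtest option is what adds the standard errors to the covariance parameter table, which is where the sampling variance of the a_j, b_j covariance comes from.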

We now have access to all of the information needed to compute the average indirect effect and the average total effect, along with their standard errors, using the equations given in Bauer et al. (2006).

average indirect effect

\[ ind = ab + \sigma_{a_{j}b_{j}} \quad \text{(EQ: A11)} \]

\[ Var(ind) = b^{2}\sigma^{2}_{\hat{a}} + a^{2}\sigma^{2}_{\hat{b}} + \sigma^{2}_{\hat{a}}\sigma^{2}_{\hat{b}} + 2ab\,\sigma_{\hat{a},\hat{b}} + (\sigma_{\hat{a},\hat{b}})^{2} + \sigma^{2}_{\hat{\sigma}_{a_{j},b_{j}}} \quad \text{(EQ: A14)} \]

average total effect

\[ tot = ab + \sigma_{a_{j}b_{j}} + c' \quad \text{(EQ: A15)} \]

\[ Var(tot) = b^{2}\sigma^{2}_{\hat{a}} + a^{2}\sigma^{2}_{\hat{b}} + 2ab\,\sigma_{\hat{a},\hat{b}} + 2b\,\sigma_{\hat{a},\hat{c}'} + 2a\,\sigma_{\hat{b},\hat{c}'} + \sigma^{2}_{\hat{\sigma}_{a_{j},b_{j}}} + \sigma^{2}_{\hat{c}'} + \sigma^{2}_{\hat{a}}\sigma^{2}_{\hat{b}} + (\sigma_{\hat{a},\hat{b}})^{2} \quad \text{(EQ: A18)} \]

These formulae involve the fixed effects estimates, their variances and covariances, and the estimated covariance of the random effects a_j and b_j together with its standard error. The values used, taken from the SAS output above, are:

\[
\begin{aligned}
a &= 0.6119 \\
b &= 0.6106 \\
c' &= 0.2208 \\
\sigma_{a_{j}b_{j}} &= 0.09896 \\
\sigma^{2}_{\hat{a}} &= 0.002162 \\
\sigma^{2}_{\hat{b}} &= 0.002074 \\
\sigma_{\hat{a},\hat{b}} &= 0.000985 \\
\sigma_{\hat{a},\hat{c}'} &= -0.00020 \\
\sigma_{\hat{b},\hat{c}'} &= -0.00048 \\
\sigma^{2}_{\hat{c}'} &= 0.001387 \\
\sigma^{2}_{\hat{\sigma}_{a_{j},b_{j}}} &= 0.02282^{2}
\end{aligned}
\]
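As a check on the arithmetic, the effects can be worked by hand from these values. The lines below are rounded for readability (four decimals for the effects, six for the variance terms), and the terms of Var(ind) are listed in the same order as in EQ: A14.

```latex
\[ ind = (0.6119)(0.6106) + 0.09896 \approx 0.3736 + 0.0990 = 0.4726 \]

\[ tot = ind + c' \approx 0.4726 + 0.2208 = 0.6934 \]

\[ Var(ind) \approx 0.000806 + 0.000777 + 0.000004 + 0.000736 + 0.000001 + 0.000521 = 0.002845 \]
```

Most of the variance comes from the first, second, fourth and last terms; the two product terms contribute very little here.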

To calculate these quantities, you just need a calculator. A simple way to do it in SAS is with PROC IML, SAS’s matrix language, which lets us declare the constants and write out the formulae directly.


proc iml;
  a = 0.6119;
  b = 0.6106;
  rcov_ab = 0.09896;
  cprime = 0.2208;
  Va = 0.002162;
  Vb = 0.002074;
  Vcprime = 0.001387;
  cov_ab = 0.000985;
  cov_ac = -0.00020;
  cov_bc = -0.00048;
  Vcov_ab = 0.022822**2;

  ind_eff = a*b + rcov_ab;
  V_ind_eff = a**2*Vb + b**2*Va + Va*Vb + 2*a*b*cov_ab + cov_ab**2 + Vcov_ab;
  test_ind = ind_eff/V_ind_eff**.5;
  tot_eff = ind_eff + cprime;
  V_tot_eff = b**2*Va + a**2*Vb + Va*Vb + 2*a*b*cov_ab + cov_ab**2 + Vcprime + 2*b*cov_ac + 2*a*cov_bc + Vcov_ab;
  test_tot = tot_eff/V_tot_eff**.5;

  print ind_eff; /* indirect effect */
  print V_ind_eff; /* variance of indirect effect */
  print test_ind; /* significance test of indirect effect, test against standard normal */
  print tot_eff; /* total effect */
  print V_tot_eff; /* variance of total effect */
  print test_tot; /* significance test of total effect, test against standard normal */
quit;

 ind_eff
0.4725861

V_ind_eff
 0.002845

 test_ind
8.8601944

 tot_eff
0.6933861

V_tot_eff
0.0034003

 test_tot
11.890965

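We get the indirect effect, the variance of the indirect effect, and a test value (the effect divided by its standard error), followed by the same three quantities for the total effect. A p-value can be calculated for each test value by comparing it against the standard normal distribution: any test value greater than roughly 1.96 in absolute value is statistically significant at the .05 level (two-tailed). Here both the indirect effect (z = 8.86) and the total effect (z = 11.89) are significant.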
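The comparison against the standard normal distribution can also be done in a short DATA step. The sketch below copies the effects and variances from the PROC IML output above and computes two-tailed p-values and 95% normal-theory intervals; the variable names are illustrative, and the intervals assume the estimates are approximately normally distributed.

```sas
/* Sketch: two-tailed p-values and 95% normal-theory intervals.       */
/* Inputs are copied from the PROC IML output; names are illustrative. */
data _null_;
  ind_eff = 0.4725861;  v_ind = 0.002845;
  tot_eff = 0.6933861;  v_tot = 0.0034003;
  zcrit   = quantile('NORMAL', 0.975);  /* critical value, about 1.96 */

  /* use the lower tail directly to avoid rounding in 1 - probnorm(z) */
  p_ind = 2 * probnorm(-abs(ind_eff / sqrt(v_ind)));
  p_tot = 2 * probnorm(-abs(tot_eff / sqrt(v_tot)));

  ind_lo = ind_eff - zcrit * sqrt(v_ind);
  ind_hi = ind_eff + zcrit * sqrt(v_ind);
  tot_lo = tot_eff - zcrit * sqrt(v_tot);
  tot_hi = tot_eff + zcrit * sqrt(v_tot);

  put p_ind= p_tot=;
  put ind_lo= ind_hi= tot_lo= tot_hi=;
run;
```

The results are written to the SAS log rather than to the output listing.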

References

Bauer, D. J., Preacher, K. J. & Gil, K. M. (2006) Conceptualizing and testing random indirect effects and moderated mediation in multilevel models: New procedures and recommendations. Psychological Methods, 11(2), 142-163.
Krull, J. L. & MacKinnon, D. P. (2001) Multilevel modeling of individual and group level mediated effects. Multivariate Behavioral Research, 36(2), 249-277.
