Growth models are a very popular type of analysis. Many growth models can be run with either mixed or sem and yield the same results. This page provides several examples.
We will begin by reading in the depression_clean dataset and changing it from wide into long form so that we can run mixed.
use https://stats.idre.ucla.edu/stat/data/depression_clean, clear
reshape long dep, i(sid) j(time)
(note: j = 0 1 2)
Data wide -> long
-----------------------------------------------------------------------------
Number of obs. 46 -> 138
Number of variables 6 -> 5
j variable (3 values) -> time
xij variables:
dep0 dep1 dep2 -> dep
-----------------------------------------------------------------------------
Unconditional growth model
We begin by running the unconditional growth model using mixed with both a random intercept and a random slope for time.
mixed dep time || sid:time, var cov(unstr)
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log likelihood = -414.27639
Iteration 1: log likelihood = -414.25833
Iteration 2: log likelihood = -414.25832
Computing standard errors:
Mixed-effects ML regression Number of obs = 138
Group variable: sid Number of groups = 46
Obs per group: min = 3
avg = 3.0
max = 3
Wald chi2(1) = 14.13
Log likelihood = -414.25832 Prob > chi2 = 0.0002
------------------------------------------------------------------------------
dep | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
time | -1.6025 .4262612 -3.76 0.000 -2.437957 -.7670434
_cons | 14.18924 .8147121 17.42 0.000 12.59243 15.78605
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
sid: Unstructured |
var(time) | 3.201386 2.047798 .9138158 11.21547
var(_cons) | 21.93819 6.613945 12.1501 39.61154
cov(time,_cons) | -1.153612 2.751286 -6.546034 4.23881
-----------------------------+------------------------------------------------
var(Residual) | 10.3135 2.15051 6.853596 15.52006
------------------------------------------------------------------------------
LR test vs. linear regression: chi2(3) = 54.85 Prob > chi2 = 0.0000
Next, we reshape the data back to wide and run the unconditional growth model using the sem command. With this type of growth model we treat the intercept, I, and the slope, S, as latent variables. We will follow the convention that latent variables are in upper case while manifest variables are in lower case.
reshape wide
(note: j = 0 1 2)
Data long -> wide
-----------------------------------------------------------------------------
Number of obs. 138 -> 46
Number of variables 5 -> 6
j variable (3 values) time -> (dropped)
xij variables:
dep -> dep0 dep1 dep2
-----------------------------------------------------------------------------
sem (dep0 <- I@1 S@0 _cons@0) ///
(dep1 <- I@1 S@1 _cons@0) ///
(dep2 <- I@1 S@2 _cons@0), ///
var(e.dep0@var e.dep1@var e.dep2@var) ///
means(I S)
Endogenous variables
Measurement: dep0 dep1 dep2
Exogenous variables
Latent: I S
Fitting target model:
Iteration 0: log likelihood = -418.88676
Iteration 1: log likelihood = -415.26423
Iteration 2: log likelihood = -414.28594
Iteration 3: log likelihood = -414.25861
Iteration 4: log likelihood = -414.25832
Iteration 5: log likelihood = -414.25832
Structural equation model Number of obs = 46
Estimation method = ml
Log likelihood = -414.25832
( 1) [dep0]I = 1
( 2) [dep1]I = 1
( 3) [dep1]S = 1
( 4) [dep2]I = 1
( 5) [dep2]S = 2
( 6) [var(e.dep0)]_cons - [var(e.dep2)]_cons = 0
( 7) [var(e.dep1)]_cons - [var(e.dep2)]_cons = 0
( 8) [dep0]_cons = 0
( 9) [dep1]_cons = 0
(10) [dep2]_cons = 0
------------------------------------------------------------------------------
| OIM
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Measurement |
dep0 <- |
I | 1 (constrained)
_cons | 0 (constrained)
-----------+----------------------------------------------------------------
dep1 <- |
I | 1 (constrained)
S | 1 (constrained)
_cons | 0 (constrained)
-----------+----------------------------------------------------------------
dep2 <- |
I | 1 (constrained)
S | 2 (constrained)
_cons | 0 (constrained)
-------------+----------------------------------------------------------------
mean(I)| 14.18924 .814712 17.42 0.000 12.59243 15.78605
mean(S)| -1.6025 .4262611 -3.76 0.000 -2.437956 -.7670436
-------------+----------------------------------------------------------------
var(e.dep0)| 10.3135 2.150514 6.853595 15.52008
var(e.dep1)| 10.3135 2.150514 6.853595 15.52008
var(e.dep2)| 10.3135 2.150514 6.853595 15.52008
var(I)| 21.93818 6.613939 12.15009 39.61152
var(S)| 3.20138 2.047803 .913809 11.21551
-------------+----------------------------------------------------------------
cov(I,S)| -1.153606 2.751291 -0.42 0.675 -6.546037 4.238825
------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(3) = 21.79, Prob > chi2 = 0.0001
Comparing the sem model with the mixed model shows that the parameter estimates are the same.
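The agreement is expected because both commands fit the same model. As a sketch, in notation that is ours rather than Stata's (i indexes subjects, t = 0, 1, 2 indexes time, and u_{0i}, u_{1i} are the random intercept and slope), the mixed specification can be rewritten as a latent growth model:

```latex
\begin{align*}
\mathrm{dep}_{ti} &= \beta_0 + \beta_1 t + u_{0i} + u_{1i} t + e_{ti}
  && \text{(mixed)} \\
                  &= I_i + S_i t + e_{ti},
  \qquad I_i = \beta_0 + u_{0i}, \quad S_i = \beta_1 + u_{1i}
  && \text{(sem)}
\end{align*}
```

With the loadings on S fixed at 0, 1 and 2 and the three error variances constrained to be equal, mean(I) corresponds to _cons (14.18924), mean(S) to the coefficient for time (-1.6025), var(I) and var(S) to var(_cons) and var(time), cov(I,S) to cov(time,_cons), and var(e.dep0) to var(Residual).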
Time invariant covariate
Next, we will go back to the long form and run a mixed model that adds a time-invariant covariate, pre.
reshape long
(note: j = 0 1 2)
Data wide -> long
-----------------------------------------------------------------------------
Number of obs. 46 -> 138
Number of variables 6 -> 5
j variable (3 values) -> time
xij variables:
dep0 dep1 dep2 -> dep
-----------------------------------------------------------------------------
mixed dep time pre || sid:time, var cov(unstr)
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log likelihood = -411.12263
Iteration 1: log likelihood = -411.10613
Iteration 2: log likelihood = -411.10612
Computing standard errors:
Mixed-effects ML regression Number of obs = 138
Group variable: sid Number of groups = 46
Obs per group: min = 3
avg = 3.0
max = 3
Wald chi2(2) = 21.21
Log likelihood = -411.10612 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
dep | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
time | -1.6025 .4262611 -3.76 0.000 -2.437956 -.7670435
pre | .5051742 .1899545 2.66 0.008 .1328702 .8774781
_cons | 3.564548 4.073481 0.88 0.382 -4.419328 11.54842
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
sid: Unstructured |
var(time) | 3.201384 2.047796 .9138156 11.21546
var(_cons) | 20.50672 6.374829 11.15031 37.71423
cov(time,_cons) | -2.289095 2.799971 -7.776937 3.198747
-----------------------------+------------------------------------------------
var(Residual) | 10.3135 2.15051 6.853597 15.52007
------------------------------------------------------------------------------
LR test vs. linear regression: chi2(3) = 45.83 Prob > chi2 = 0.0000
This last analysis is followed by its sem equivalent.
reshape wide
(note: j = 0 1 2)
Data long -> wide
-----------------------------------------------------------------------------
Number of obs. 138 -> 46
Number of variables 5 -> 6
j variable (3 values) time -> (dropped)
xij variables:
dep -> dep0 dep1 dep2
-----------------------------------------------------------------------------
sem (dep0 <- I@1 S@0 pre@p1 _cons@0) ///
(dep1 <- I@1 S@1 pre@p1 _cons@0) ///
(dep2 <- I@1 S@2 pre@p1 _cons@0), ///
var(e.dep0@var e.dep1@var e.dep2@var) ///
means(I S) covar(pre*I@0 pre*S@0)
Endogenous variables
Observed: dep0 dep1 dep2
Exogenous variables
Observed: pre
Latent: I S
Fitting target model:
Iteration 0: log likelihood = -563.45979 (not concave)
Iteration 1: log likelihood = -549.01197
Iteration 2: log likelihood = -538.31305
Iteration 3: log likelihood = -536.40749
Iteration 4: log likelihood = -536.3017
Iteration 5: log likelihood = -536.30149
Iteration 6: log likelihood = -536.30149
Structural equation model Number of obs = 46
Estimation method = ml
Log likelihood = -536.30149
( 1) [dep0]pre - [dep2]pre = 0
( 2) [dep0]I = 1
( 3) [dep1]pre - [dep2]pre = 0
( 4) [dep1]I = 1
( 5) [dep1]S = 1
( 6) [dep2]I = 1
( 7) [dep2]S = 2
( 8) [var(e.dep0)]_cons - [var(e.dep2)]_cons = 0
( 9) [var(e.dep1)]_cons - [var(e.dep2)]_cons = 0
(10) [cov(pre,I)]_cons = 0
(11) [cov(pre,S)]_cons = 0
(12) [dep0]_cons = 0
(13) [dep1]_cons = 0
(14) [dep2]_cons = 0
------------------------------------------------------------------------------
| OIM
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural |
dep0 <- |
pre | .5051742 .1943431 2.60 0.009 .1242686 .8860797
I | 1 (constrained)
_cons | 0 (constrained)
-----------+----------------------------------------------------------------
dep1 <- |
pre | .5051742 .1943431 2.60 0.009 .1242686 .8860797
I | 1 (constrained)
S | 1 (constrained)
_cons | 0 (constrained)
-----------+----------------------------------------------------------------
dep2 <- |
pre | .5051742 .1943431 2.60 0.009 .1242686 .8860797
I | 1 (constrained)
S | 2 (constrained)
_cons | 0 (constrained)
-------------+----------------------------------------------------------------
Mean |
I | 3.564548 4.164044 0.86 0.392 -4.596828 11.72592
S | -1.6025 .4262611 -3.76 0.000 -2.437956 -.7670436
-------------+----------------------------------------------------------------
Variance |
e.dep0 | 10.3135 2.150514 6.853595 15.52008
e.dep1 | 10.3135 2.150514 6.853595 15.52008
e.dep2 | 10.3135 2.150514 6.853595 15.52008
I | 20.50671 6.374829 11.1503 37.71422
S | 3.20138 2.047803 .913809 11.21551
-------------+----------------------------------------------------------------
Covariance |
pre |
I | 0 (constrained)
S | 0 (constrained)
-----------+----------------------------------------------------------------
I |
S | -2.289091 2.79998 -0.82 0.414 -7.776951 3.198769
------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(5) = 23.93, Prob > chi2 = 0.0002
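Once again, the parameter estimates are equivalent. The standard errors for pre and the intercept differ slightly between the two commands (for example, .1899545 in mixed versus .1943431 in sem for pre), and the log likelihoods are not the same (-411.10612 versus -536.30149), most likely because the sem likelihood also covers the observed covariate pre.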
Time invariant covariate with cross-level interaction
This time we are going to add a cross-level interaction. Since, by now, you are accustomed to the routine of reshape long, mixed, reshape wide and sem, we will run everything in one long block of code and results.
Because we are predicting I and S with the time-invariant covariate in the sem model, we can no longer request means(I S). Instead, the intercepts of I and S are estimated as the _cons terms of the I and S equations in the sem output.
reshape long
(note: j = 0 1 2)
Data wide -> long
-----------------------------------------------------------------------------
Number of obs. 46 -> 138
Number of variables 6 -> 5
j variable (3 values) -> time
xij variables:
dep0 dep1 dep2 -> dep
-----------------------------------------------------------------------------
mixed dep c.time##c.pre || sid:time, var cov(unstr)
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log likelihood = -410.07935
Iteration 1: log likelihood = -410.05546
Iteration 2: log likelihood = -410.05544
Computing standard errors:
Mixed-effects ML regression Number of obs = 138
Group variable: sid Number of groups = 46
Obs per group: min = 3
avg = 3.0
max = 3
Wald chi2(3) = 24.02
Log likelihood = -410.05544 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
dep | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
time | -5.094745 2.417808 -2.11 0.035 -9.833561 -.3559284
pre | .3572517 .2150802 1.66 0.097 -.0642978 .7788012
|
c.time#c.pre | .1660464 .1132403 1.47 0.143 -.0559005 .3879933
|
_cons | 6.675614 4.592206 1.45 0.146 -2.324943 15.67617
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
sid: Unstructured |
var(time) | 2.828174 1.981987 .7161158 11.16938
var(_cons) | 20.21054 6.267935 11.00507 37.11613
cov(time,_cons) | -1.95662 2.693749 -7.236271 3.32303
-----------------------------+------------------------------------------------
var(Residual) | 10.31349 2.150505 6.853593 15.52004
------------------------------------------------------------------------------
LR test vs. linear regression: chi2(3) = 46.84 Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.
reshape wide
(note: j = 0 1 2)
Data long -> wide
-----------------------------------------------------------------------------
Number of obs. 138 -> 46
Number of variables 5 -> 6
j variable (3 values) time -> (dropped)
xij variables:
dep -> dep0 dep1 dep2
-----------------------------------------------------------------------------
sem (dep0 <- I@1 S@0 _cons@0) ///
(dep1 <- I@1 S@1 _cons@0) ///
(dep2 <- I@1 S@2 _cons@0) ///
(I <- pre _cons) (S <- pre _cons), ///
var(e.dep0@var e.dep1@var e.dep2@var) ///
covar(e.I*e.S)
Endogenous variables
Measurement: dep0 dep1 dep2
Latent: I S
Exogenous variables
Observed: pre
Fitting target model:
Iteration 0: log likelihood = -836.11945 (not concave)
Iteration 1: log likelihood = -629.09569 (not concave)
Iteration 2: log likelihood = -572.06538 (not concave)
Iteration 3: log likelihood = -544.36594 (not concave)
Iteration 4: log likelihood = -540.10377
Iteration 5: log likelihood = -536.92737
Iteration 6: log likelihood = -535.30688
Iteration 7: log likelihood = -535.25089
Iteration 8: log likelihood = -535.25081
Iteration 9: log likelihood = -535.25081
Structural equation model Number of obs = 46
Estimation method = ml
Log likelihood = -535.25081
( 1) [dep0]I = 1
( 2) [dep1]I = 1
( 3) [dep1]S = 1
( 4) [dep2]I = 1
( 5) [dep2]S = 2
( 6) [var(e.dep0)]_cons - [var(e.dep2)]_cons = 0
( 7) [var(e.dep1)]_cons - [var(e.dep2)]_cons = 0
( 8) [dep0]_cons = 0
( 9) [dep1]_cons = 0
(10) [dep2]_cons = 0
------------------------------------------------------------------------------
| OIM
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural |
I <- |
pre | .3572517 .2150802 1.66 0.097 -.0642977 .7788011
_cons | 6.675614 4.592205 1.45 0.146 -2.324941 15.67617
-----------+----------------------------------------------------------------
S <- |
pre | .1660464 .1132402 1.47 0.143 -.0559003 .3879931
_cons | -5.094745 2.417806 -2.11 0.035 -9.833558 -.3559314
-------------+----------------------------------------------------------------
Measurement |
dep0 <- |
I | 1 (constrained)
_cons | 0 (constrained)
-----------+----------------------------------------------------------------
dep1 <- |
I | 1 (constrained)
S | 1 (constrained)
_cons | 0 (constrained)
-----------+----------------------------------------------------------------
dep2 <- |
I | 1 (constrained)
S | 2 (constrained)
_cons | 0 (constrained)
-------------+----------------------------------------------------------------
var(e.dep0)| 10.3135 2.150514 6.853595 15.52008
var(e.dep1)| 10.3135 2.150514 6.853595 15.52008
var(e.dep2)| 10.3135 2.150514 6.853595 15.52008
var(e.I)| 20.21051 6.267933 11.00505 37.11611
var(e.S)| 2.828156 1.981993 .716102 11.16945
-------------+----------------------------------------------------------------
cov(e.I,e.S)| -1.956604 2.693753 -0.73 0.468 -7.236263 3.323055
------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(4) = 21.83, Prob > chi2 = 0.0002
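The two sets of output line up once the sem equations for I and S are substituted into the measurement equations. The following is a sketch in our own notation (the symbols \alpha, \gamma and \zeta are not Stata names), with i indexing subjects and t = 0, 1, 2 indexing time:

```latex
\begin{align*}
I_i &= \alpha_0 + \alpha_1\,\mathrm{pre}_i + \zeta_{0i}, \qquad
S_i = \gamma_0 + \gamma_1\,\mathrm{pre}_i + \zeta_{1i} \\
\mathrm{dep}_{ti} &= I_i + S_i t + e_{ti} \\
  &= \alpha_0 + \alpha_1\,\mathrm{pre}_i + \gamma_0 t
     + \gamma_1\,(\mathrm{pre}_i \times t)
     + \zeta_{0i} + \zeta_{1i} t + e_{ti}
\end{align*}
```

Regressing the slope S on pre is therefore what produces the time by pre interaction. In the output, the _cons and pre coefficients of the I equation (6.675614 and .3572517) match _cons and pre in mixed, while the _cons and pre coefficients of the S equation (-5.094745 and .1660464) match time and c.time#c.pre. Likewise var(e.I), var(e.S) and cov(e.I,e.S) correspond to var(_cons), var(time) and cov(time,_cons).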
Time-varying covariate
What if you have a time-varying covariate? We are going to switch datasets to lsay_long_clean to show an example with a time-varying covariate, att.
use https://stats.idre.ucla.edu/stat/data/lsay_long_clean, clear
mixed math c.yr c.att || id:yr, var cov(unstr)
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log likelihood = -36146.122
Iteration 1: log likelihood = -36144.71
Iteration 2: log likelihood = -36144.708
Computing standard errors:
Mixed-effects ML regression Number of obs = 10785
Group variable: id Number of groups = 3595
Obs per group: min = 3
avg = 3.0
max = 3
Wald chi2(2) = 2340.50
Log likelihood = -36144.708 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
math | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
yr | 2.64315 .0546525 48.36 0.000 2.536033 2.750267
att | .1700024 .0253111 6.72 0.000 .1203936 .2196112
_cons | 54.67699 .3330636 164.16 0.000 54.0242 55.32978
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Unstructured |
var(yr) | 3.348592 .3030205 2.804371 3.998427
var(_cons) | 110.5491 2.912331 104.9859 116.4071
cov(yr,_cons) | -.0107825 .6369843 -1.259249 1.237684
-----------------------------+------------------------------------------------
var(Residual) | 14.50231 .3427178 13.84592 15.18983
------------------------------------------------------------------------------
LR test vs. linear regression: chi2(3) = 10678.18 Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.
Back to the old drill of reshaping wide and running a sem model. This model proved to be a bit fussier and required that we provide starting values for the coefficients. To obtain proper starting values we ran a simpler model and saved the results into a matrix. We then used these results as starting values for the full model.
reshape wide math att, i(id) j(yr)
(note: j = 0 1 2)
Data long -> wide
-----------------------------------------------------------------------------
Number of obs. 10785 -> 3595
Number of variables 7 -> 10
j variable (3 values) yr -> (dropped)
xij variables:
math -> math0 math1 math2
att -> att0 att1 att2
-----------------------------------------------------------------------------
sem (math0 <- I@1 S@0 _cons@0) ///
(math1 <- I@1 S@1 _cons@0) ///
(math2 <- I@1 S@2 _cons@0), ///
var(e.math0@var e.math1@var e.math2@var) ///
means(I S)
mat b = e(b)
sem (math0 <- I@1 S@0 att0@b1 _cons@0) ///
(math1 <- I@1 S@1 att1@b1 _cons@0) ///
(math2 <- I@1 S@2 att2@b1 _cons@0), ///
var(e.math0@var e.math1@var e.math2@var) ///
means(I S) covar(att0*I@0 att1*I@0 att2*I@0) ///
covar(att0*S@0 att1*S@0 att2*S@0) ///
from(b)
Endogenous variables
Observed: math0 math1 math2
Exogenous variables
Observed: att0 att1 att2
Latent: I S
Fitting target model:
Iteration 0: log likelihood = -61901.22
Iteration 1: log likelihood = -60959.753
Iteration 2: log likelihood = -60758.068
Iteration 3: log likelihood = -60746.189
Iteration 4: log likelihood = -60746.116
Iteration 5: log likelihood = -60746.116
Structural equation model Number of obs = 3,595
Estimation method = ml
Log likelihood = -60746.116
( 1) [math0]att0 - [math2]att2 = 0
( 2) [math0]I = 1
( 3) [math1]att1 - [math2]att2 = 0
( 4) [math1]I = 1
( 5) [math1]S = 1
( 6) [math2]I = 1
( 7) [math2]S = 2
( 8) [var(e.math0)]_cons - [var(e.math2)]_cons = 0
( 9) [var(e.math1)]_cons - [var(e.math2)]_cons = 0
(10) [math0]_cons = 0
(11) [math1]_cons = 0
(12) [math2]_cons = 0
------------------------------------------------------------------------------
| OIM
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural |
math0 <- |
att0 | .1700025 .025449 6.68 0.000 .1201234 .2198816
I | 1 (constrained)
_cons | 0 (constrained)
-----------+----------------------------------------------------------------
math1 <- |
att1 | .1700025 .025449 6.68 0.000 .1201234 .2198816
I | 1 (constrained)
S | 1 (constrained)
_cons | 0 (constrained)
-----------+----------------------------------------------------------------
math2 <- |
att2 | .1700025 .025449 6.68 0.000 .1201234 .2198816
I | 1 (constrained)
S | 2 (constrained)
_cons | 0 (constrained)
-------------+----------------------------------------------------------------
mean(I)| 54.67699 .3343215 163.55 0.000 54.02173 55.33225
mean(S)| 2.64315 .0546563 48.36 0.000 2.536026 2.750275
-------------+----------------------------------------------------------------
var(e.math0)| 14.50234 .3427203 13.84594 15.18986
var(e.math1)| 14.50234 .3427203 13.84594 15.18986
var(e.math2)| 14.50234 .3427203 13.84594 15.18986
var(I)| 110.5491 2.91233 104.9859 116.4071
var(S)| 3.348555 .3030222 2.804331 3.998394
-------------+----------------------------------------------------------------
cov(I,S)| -.0107522 .6369845 -0.02 0.987 -1.259219 1.237714
------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(11) = 201.05, Prob > chi2 = 0.0000
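As a sketch of why these estimates match the mixed results, the model can be written with one equation per wave. The notation is ours (the symbol b_1 simply echoes the b1 label used in the sem command), with i indexing students and t = 0, 1, 2 indexing yr:

```latex
\begin{align*}
\mathrm{math}_{ti} &= I_i + S_i t + b_1\,\mathrm{att}_{ti} + e_{ti},
  \qquad t = 0, 1, 2
\end{align*}
```

Because att0, att1 and att2 share the label b1, sem estimates a single att coefficient (.1700025), which corresponds to the one att coefficient in mixed (.1700024). As before, mean(I) and mean(S) match _cons and yr, and the equality constraint on the error variances reproduces var(Residual).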
We hope this helps get you started with linear growth models.
