Mplus version 8 was used for these examples. All the files for this portion of this seminar can be downloaded here.
To illustrate longitudinal data analysis using Mplus, we will use an example data set from Chapter 5 of Hox’s Multilevel Analysis: Techniques and Applications. The data set contains six GPA measurements for each student, one at each of six time points, which makes it longitudinal. The data have a hierarchical structure in which the measurements over the six time points are nested within students, so longitudinal modeling is a special case of multilevel modeling.
In Mplus, a longitudinal model can be analyzed in one of two ways: a univariate approach using the long format of the data (gpa_ch5_hox.dat) or a multivariate approach using the wide format of the same data (gpa_ch5_hox_wide.dat). The long-format approach falls within the multilevel modeling framework, while the wide-format approach falls within the structural equation modeling framework (i.e., latent growth modeling). We will show both approaches in this section.
1.0 Longitudinal modeling in wide format, latent growth modeling
In order to perform a latent growth model, the data set will have to be restructured to wide format:
student  gpa0  gpa1  gpa2  gpa3  gpa4  gpa5  job0  job1  job2  job3  job4  job5  highgpa  sex
      1   2.3   2.1     3     3     3   3.3     2     2     2     2     2     2      2.8    1
      2   2.2   2.5   2.6   2.6     3   2.8     2     3     2     2     2     2      2.5    0
      3   2.4   2.9     3   2.8   3.3   3.4     2     2     2     3     2     2      2.5    1
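The restructuring step can be sketched outside Mplus. The snippet below is a minimal illustration in Python, not part of the seminar's workflow: the function name `long_to_wide` is hypothetical, and the rows are the long-format records for the first two students as listed in the long-format section of this page.

```python
from collections import defaultdict

# Long-format rows (student, time, gpa, job, sex) for students 1 and 2,
# copied from the long-format listing in this seminar.
long_rows = [
    (1, 0, 2.3, 2, 1), (1, 1, 2.1, 2, 1), (1, 2, 3.0, 2, 1),
    (1, 3, 3.0, 2, 1), (1, 4, 3.0, 2, 1), (1, 5, 3.3, 2, 1),
    (2, 0, 2.2, 2, 0), (2, 1, 2.5, 3, 0), (2, 2, 2.6, 2, 0),
    (2, 3, 2.6, 2, 0), (2, 4, 3.0, 2, 0), (2, 5, 2.8, 2, 0),
]


def long_to_wide(rows):
    """Collapse one-row-per-occasion records into one record per student.

    Hypothetical helper: time-varying variables (gpa, job) get the
    occasion number appended to their name; sex is time-invariant.
    """
    by_student = defaultdict(dict)
    for student, time, gpa, job, sex in rows:
        record = by_student[student]
        record["student"] = student
        record[f"gpa{time}"] = gpa
        record[f"job{time}"] = job
        record["sex"] = sex  # same value on every row of a student
    return [by_student[s] for s in sorted(by_student)]


wide = long_to_wide(long_rows)
for record in wide:
    print(record)
```

Each student ends up with a single record holding gpa0 to gpa5 and job0 to job5, which is the layout the latent growth model expects.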
In the multivariate approach, we are more explicit about the latent variables involved in the model. In this example, we have two latent variables, the random intercept and the random slope for time, which we name Intercept and Slope. We assume that the two latent variables are normally distributed, and we are interested in estimating the mean and variance of each. The model that we are going to run is the following:
$$ GPA_i = \Lambda \eta_i + \epsilon_i $$
where \(GPA_i\) is the \(6 \times 1\) vector of GPA scores for person \(i\), \(\Lambda\) is a \(6 \times 2\) matrix of loadings for the intercept and slope, \(\eta_i\) is a \(2\times1\) vector containing the latent intercept and slope for person \(i\), with latent means \(\alpha_{intercept}\) and \(\alpha_{slope}\), and \(\epsilon_i\) is the \(6 \times 1\) vector of residuals.
Here is a figure representing our latent growth model:
Multiplying out the equation above, the equation for person \(i\) at time point \(t\) is:
$$ GPA_{it} = Intercept_i + \lambda_t*Slope_i + e_{it} $$
for \(t \in \{0,1,2,3,4,5\}\), where the loading \(\lambda_t\) is fixed at \(t\). Equivalently, the equation above can be spelled out as the corresponding six equations:
$$\begin{eqnarray} GPA_{i0} & = & Intercept_{i} + 0*Slope_{i} + e_{i0} \\ GPA_{i1} & = & Intercept_{i} + 1*Slope_{i} + e_{i1} \\ GPA_{i2} & = & Intercept_{i} + 2*Slope_{i} + e_{i2} \\ GPA_{i3} & = & Intercept_{i} + 3*Slope_{i} + e_{i3} \\ GPA_{i4} & = & Intercept_{i} + 4*Slope_{i} + e_{i4} \\ GPA_{i5} & = & Intercept_{i} + 5*Slope_{i} + e_{i5} \end{eqnarray} $$
where the (symmetric) variance covariance matrix of the intercept and slope is defined as:
$$ \mathbf{\Psi} = \begin{bmatrix} \psi_{intercept} & \\ \psi_{int,slope} & \psi_{slope} \end{bmatrix} $$
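As a sketch of what the matrix form implies (standard latent growth algebra; \(\alpha\), \(\mu\), \(\Sigma\) and \(\Theta\) are notation assumed here for the vector of latent means, the model-implied mean vector, the model-implied covariance matrix and the residual covariance matrix), the loading matrix with the time scores used above and the implied moments of the six GPA measures can be written as:

$$ \Lambda = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \\ 1 & 5 \end{bmatrix}, \qquad \mu = \Lambda \alpha, \qquad \Sigma = \Lambda \mathbf{\Psi} \Lambda^{\prime} + \Theta $$

with \(\alpha = (\alpha_{intercept}, \alpha_{slope})^{\prime}\). Under this sketch, the implied mean at time \(t\) is \(\alpha_{intercept} + \lambda_t \alpha_{slope}\), and the implied variance at time \(t\) is \(\psi_{intercept} + 2\lambda_t\psi_{int,slope} + \lambda_t^2\psi_{slope} + \theta_t\), where \(\theta_t\) is the residual variance at that time point.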
A technical point: the multilevel approach assumes that the residual variance is homogeneous across all time points. To match the results of the multilevel model, we therefore constrain the residual variance to be the same across all six time points.
title: Wide data format with random intercept
data: file is gpa_ch5_hox_wide.dat;
variable: names are student admitted gpa0 gpa1 gpa2 gpa3 gpa4 gpa5
            highgpa job0 job1 job2 job3 job4 job5 sex;
          missing are all (-9999);
          usevariables are gpa0 gpa1 gpa2 gpa3 gpa4 gpa5;
analysis: estimator = ml;
model:
  i s | gpa0@0 gpa1@1 gpa2@2 gpa3@3 gpa4@4 gpa5@5;
  gpa0 - gpa5 (1); !fix the residual variance to be same across time points

MODEL RESULTS

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

 I        |
    GPA0               1.000      0.000    999.000    999.000
    GPA1               1.000      0.000    999.000    999.000
    GPA2               1.000      0.000    999.000    999.000
    GPA3               1.000      0.000    999.000    999.000
    GPA4               1.000      0.000    999.000    999.000
    GPA5               1.000      0.000    999.000    999.000

 S        |
    GPA0               0.000      0.000    999.000    999.000
    GPA1               1.000      0.000    999.000    999.000
    GPA2               2.000      0.000    999.000    999.000
    GPA3               3.000      0.000    999.000    999.000
    GPA4               4.000      0.000    999.000    999.000
    GPA5               5.000      0.000    999.000    999.000

 S        WITH
    I                 -0.001      0.002     -0.834      0.404

 Means
    I                  2.599      0.018    141.947      0.000
    S                  0.106      0.006     18.111      0.000

 Intercepts
    GPA0               0.000      0.000    999.000    999.000
    GPA1               0.000      0.000    999.000    999.000
    GPA2               0.000      0.000    999.000    999.000
    GPA3               0.000      0.000    999.000    999.000
    GPA4               0.000      0.000    999.000    999.000
    GPA5               0.000      0.000    999.000    999.000

 Variances
    I                  0.045      0.007      6.599      0.000
    S                  0.004      0.001      6.387      0.000

 Residual Variances
    GPA0               0.042      0.002     20.000      0.000
    GPA1               0.042      0.002     20.000      0.000
    GPA2               0.042      0.002     20.000      0.000
    GPA3               0.042      0.002     20.000      0.000
    GPA4               0.042      0.002     20.000      0.000
    GPA5               0.042      0.002     20.000      0.000
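To see what these estimates imply for the observed GPA scores, the following minimal Python sketch plugs the rounded values from the MODEL RESULTS above into the growth equations. It is an illustration only, not Mplus output; the function names are hypothetical, and the variance formula assumes the usual latent growth decomposition (intercept variance, plus twice the time score times the covariance, plus the squared time score times the slope variance, plus the residual variance).

```python
# Estimates copied from the MODEL RESULTS above (rounded to three decimals).
ALPHA_I, ALPHA_S = 2.599, 0.106               # latent means of I and S
PSI_I, PSI_S, PSI_IS = 0.045, 0.004, -0.001   # variances of I and S, S WITH I
THETA = 0.042                                  # common residual variance


def implied_mean(t):
    """Model-implied mean GPA at time score t."""
    return ALPHA_I + t * ALPHA_S


def implied_variance(t):
    """Model-implied variance of GPA at time score t."""
    return PSI_I + 2 * t * PSI_IS + t ** 2 * PSI_S + THETA


for t in range(6):
    print(f"t={t}  mean={implied_mean(t):.3f}  variance={implied_variance(t):.3f}")
```

Because the slope variance enters with the squared time score, the implied variance of GPA grows over time even though the residual variance is held equal across occasions.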
2.0 Longitudinal modeling in wide format, without constraining the error variances
Now let’s rerun the previous example but relax the assumption on the residual variance, allowing it to differ across the six time points. To do this, we simply comment out the last line of the model statement, which constrains the residual variances to be equal. The point is that the wide-format approach gives us more modeling flexibility, in addition to a different angle on the same model.
title: Wide data format without constraining the error variances
data: file is gpa_ch5_hox_wide.dat;
variable: names are student admitted gpa0 gpa1 gpa2 gpa3 gpa4 gpa5
            highgpa job0 job1 job2 job3 job4 job5 sex;
          missing are all (-9999);
          usevariables are gpa0 gpa1 gpa2 gpa3 gpa4 gpa5;
analysis: estimator = ml;
model:
  i s | gpa0@0 gpa1@1 gpa2@2 gpa3@3 gpa4@4 gpa5@5;
  !s@0; !would fix the slope variance at zero; the output below estimates it
  !gpa0 - gpa5 (1);

MODEL RESULTS

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

 I        |
    GPA0               1.000      0.000    999.000    999.000
    GPA1               1.000      0.000    999.000    999.000
    GPA2               1.000      0.000    999.000    999.000
    GPA3               1.000      0.000    999.000    999.000
    GPA4               1.000      0.000    999.000    999.000
    GPA5               1.000      0.000    999.000    999.000

 S        |
    GPA0               0.000      0.000    999.000    999.000
    GPA1               1.000      0.000    999.000    999.000
    GPA2               2.000      0.000    999.000    999.000
    GPA3               3.000      0.000    999.000    999.000
    GPA4               4.000      0.000    999.000    999.000
    GPA5               5.000      0.000    999.000    999.000

 S        WITH
    I                  0.002      0.002      1.565      0.118

 Means
    I                  2.598      0.018    141.886      0.000
    S                  0.106      0.005     20.317      0.000

 Intercepts
    GPA0               0.000      0.000    999.000    999.000
    GPA1               0.000      0.000    999.000    999.000
    GPA2               0.000      0.000    999.000    999.000
    GPA3               0.000      0.000    999.000    999.000
    GPA4               0.000      0.000    999.000    999.000
    GPA5               0.000      0.000    999.000    999.000

 Variances
    I                  0.035      0.007      4.937      0.000
    S                  0.003      0.001      5.593      0.000

 Residual Variances
    GPA0               0.080      0.010      8.049      0.000
    GPA1               0.071      0.008      8.518      0.000
    GPA2               0.054      0.006      9.020      0.000
    GPA3               0.029      0.003      8.486      0.000
    GPA4               0.015      0.003      5.589      0.000
    GPA5               0.016      0.004      4.336      0.000
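A rough bookkeeping of the two wide-format specifications can be read off the MODEL RESULTS tables, counting as free every parameter reported with a nonzero standard error. The symbols \(p\), \(q\) and \(df\) are notation assumed here for the number of sample moments, the number of free parameters and the degrees of freedom:

$$\begin{eqnarray} p & = & 6 + \frac{6 \times 7}{2} = 27 \\ q_{equal} & = & 2 + 3 + 1 = 6, \qquad df_{equal} = 27 - 6 = 21 \\ q_{free} & = & 2 + 3 + 6 = 11, \qquad df_{free} = 27 - 11 = 16 \end{eqnarray}$$

Here \(p\) counts the six means and 21 variances and covariances of the GPA measures, and each model has two latent means and three latent variance and covariance parameters, plus either one shared residual variance or six separate ones. The equal-variance model is therefore nested in the free-variance model with a difference of five degrees of freedom, so the two could in principle be compared with a likelihood ratio test; the loglikelihoods needed for that test are not part of the output shown here.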
3.0 Longitudinal modeling in long format – random intercept model
When the data set is in long format, it contains multiple rows per subject. In this example, each student has at most six rows of data coming from the measurements over the six time points. Here are observations of some of the variables for the first three subjects from the data set.
student  time  gpa  job  sex
      1     0  2.3    2    1
      1     1  2.1    2    1
      1     2    3    2    1
      1     3    3    2    1
      1     4    3    2    1
      1     5  3.3    2    1
      2     0  2.2    2    0
      2     1  2.5    3    0
      2     2  2.6    2    0
      2     3  2.6    2    0
      2     4    3    2    0
      2     5  2.8    2    0
      3     0  2.4    2    1
      3     1  2.9    2    1
      3     2    3    2    1
      3     3  2.8    3    1
      3     4  3.3    2    1
      3     5  3.4    2    1
Our first multilevel model assumes that GPA is linearly related to time. The intercept is random, meaning it can vary across subjects, but the slope of time is fixed, meaning the effect of time is the same for all subjects. Mathematically, here is our first model:
$$\begin{eqnarray} L1&: & GPA_{it} & = & \beta_{0i} + \beta_{1i}*TIME_{it} + e_{it} \\ L2&: & \beta_{0i} & = & \gamma_{00} + u_{0i} \\ && \beta_{1i} & = & \gamma_{10} \end{eqnarray}$$
where \(i\) stands for individual and \(t\) stands for time.
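Substituting the level-2 equations into the level-1 equation gives the combined form of this model, written here as a worked step using the same symbols:

$$ GPA_{it} = \gamma_{00} + \gamma_{10}*TIME_{it} + u_{0i} + e_{it} $$

Read this way, the model has four parameters to estimate: the two fixed effects \(\gamma_{00}\) and \(\gamma_{10}\), the variance of the random intercept \(u_{0i}\), and the variance of the level-1 residual \(e_{it}\).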
We also need to specify the nesting structure. The keywords for describing the nesting structure are cluster, within and between. A variable is a within variable if it is time-varying, such as job status. A variable is a between variable if it is not time-varying, such as the student’s gender. We specify the type of analysis as twolevel random to run a longitudinal model. The model statement is minimal since the default model is a random intercept model.
title: Long data format with random intercept
data: file is gpa_ch5_hox.dat;
variable: names are student highgpa gpa job admitted occas time sex;
          missing are all (-9999);
          usevariables are gpa time;
          cluster = student;
          within = time;
analysis: type = twolevel random;
          estimator = ml; !default is mlr
model:
  %within%
  gpa on time;
  %between%
  gpa;

SUMMARY OF ANALYSIS

Number of groups                                                 1
Number of observations                                        1200
Number of dependent variables                                    1
Number of independent variables                                  1
Number of continuous latent variables                            0

Observed dependent variables
  Continuous
   GPA

Observed independent variables
   TIME

Variables with special functions
  Cluster variable      STUDENT
  Within variables
   TIME

Estimator                                                       ML
Information matrix                                        OBSERVED
Maximum number of iterations                                   100
Convergence criterion                                    0.100D-05
Maximum number of EM iterations                                500
Convergence criteria for the EM algorithm
  Loglikelihood change                                   0.100D-02
  Relative loglikelihood change                          0.100D-05
  Derivative                                             0.100D-03
Minimum variance                                         0.100D-03
Maximum number of steepest descent iterations                   20
Maximum number of iterations for H1                           2000
Convergence criterion for H1                             0.100D-03
Optimization algorithm                                         EMA

Input data file(s)
  gpa_ch5_hox.dat
Input data format  FREE

SUMMARY OF DATA

     Number of missing data patterns             1
     Number of clusters                        200
     Average cluster size        6.000

     Estimated Intraclass Correlations for the Y Variables

                Intraclass
     Variable  Correlation
     GPA          0.411

COVARIANCE COVERAGE OF DATA

Minimum covariance coverage value   0.100

     PROPORTION OF DATA PRESENT

           Covariance Coverage
              GPA           TIME
              ________      ________
 GPA            1.000
 TIME           1.000         1.000

THE MODEL ESTIMATION TERMINATED NORMALLY

MODEL FIT INFORMATION

Number of Free Parameters                        4

Loglikelihood
          H0 Value                        -196.825
          H1 Value                        -196.825

Information Criteria
          Akaike (AIC)                     401.649
          Bayesian (BIC)                   422.009
          Sample-Size Adjusted BIC         409.304
            (n* = (n + 2) / 24)

Chi-Square Test of Model Fit
          Value                              0.000
          Degrees of Freedom                     0
          P-Value                           1.0000

RMSEA (Root Mean Square Error Of Approximation)
          Estimate                           0.000

CFI/TLI
          CFI                                1.000
          TLI                                1.000

Chi-Square Test of Model Fit for the Baseline Model
          Value                            519.807
          Degrees of Freedom                     1
          P-Value                           0.0000

SRMR (Standardized Root Mean Square Residual)
          Value for Within                   0.000
          Value for Between                  0.000

MODEL RESULTS

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

Within Level

 GPA        ON
    TIME               0.106      0.004     26.109      0.000

 Residual Variances
    GPA                0.058      0.003     22.361      0.000

Between Level

 Means
    GPA                2.599      0.022    120.047      0.000

 Variances
    GPA                0.063      0.007      8.661      0.000

QUALITY OF NUMERICAL RESULTS

     Condition Number for the Information Matrix              0.126E-01
       (ratio of smallest to largest eigenvalue)

DIAGRAM INFORMATION

  Mplus diagrams are currently not available for multilevel analysis.
  No diagram output was produced.
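The information criteria in the MODEL FIT INFORMATION section can be checked by hand from the loglikelihood, the number of free parameters and the number of observations. The Python sketch below uses the textbook definitions AIC = -2LL + 2k and BIC = -2LL + k ln(n); the function name is hypothetical, and the inputs are the rounded values printed in the output above, so agreement is only expected up to rounding.

```python
import math


def information_criteria(loglik, n_params, n_obs):
    """Return (AIC, BIC) from a loglikelihood, parameter count and sample size."""
    deviance = -2.0 * loglik
    aic = deviance + 2 * n_params
    bic = deviance + n_params * math.log(n_obs)
    return aic, bic


# H0 loglikelihood, free parameters and observations from the output above.
aic, bic = information_criteria(loglik=-196.825, n_params=4, n_obs=1200)
print(f"AIC = {aic:.3f}, BIC = {bic:.3f}")
```

Both values agree with the reported 401.649 and 422.009 to within rounding of the printed loglikelihood, which suggests that the BIC here is based on the 1200 observations rather than the 200 clusters.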
4.0 Longitudinal modeling in long format – random intercept and random slope model
Our second multilevel model is more complicated: we allow both a random intercept and a random slope for time, and we add a new level-1 predictor variable, job. The random intercept is in turn a linear function of the between variables sex and highgpa. Mathematically, here is our second model:
$$\begin{eqnarray} L1&: & GPA_{it} & = & \beta_{0i} + \beta_{1i}*TIME_{it} + \beta_{2i}*JOB_{it} + e_{it} \\ L2&: & \beta_{0i} & = & \gamma_{00} + \gamma_{01}*HIGHGPA_i + \gamma_{02}*SEX_i + u_{0i} \\ && \beta_{1i} & = & \gamma_{10} + u_{1i} \\ && \beta_{2i} & = & \gamma_{20} \end{eqnarray}$$
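Substituting the level-2 equations into the level-1 equation gives the combined form of the second model. This is a worked step, writing the fixed coefficient of job as \(\gamma_{20}\) (a label assumed here):

$$ GPA_{it} = \gamma_{00} + \gamma_{01}*HIGHGPA_i + \gamma_{02}*SEX_i + \gamma_{10}*TIME_{it} + \gamma_{20}*JOB_{it} + u_{0i} + u_{1i}*TIME_{it} + e_{it} $$

The combined form has five fixed effects, plus the variances of \(u_{0i}\) and \(u_{1i}\), their covariance, and the variance of \(e_{it}\), for nine parameters in total. No term multiplies a between-level variable by \(TIME_{it}\), which is what a cross-level interaction would look like.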
Let’s look at the model statement to see what has been added. There are two lines in the %within% part. The first, gpa on job, requests a fixed effect for job. The second, s | gpa on time, defines s as the random slope of the regression of gpa on time. In the %between% part, gpa refers to the random intercept, which we regress on highgpa and sex. Because the slope s is not regressed on any between-level variable, the model contains no cross-level interactions. On the last line, gpa with s, we allow the intercept to be correlated with the slope.
title: Long data format with random intercept and random slope
data: file is gpa_ch5_hox.dat;
variable: names are student highgpa gpa job admitted occas time sex;
          missing are all (-9999);
          usevariables are gpa time job student highgpa sex;
          cluster = student;
          within = time job;
          between = highgpa sex;
analysis: type = twolevel random;
          estimator=ml; !default is mlr
model:
  %within%
  gpa on job;
  s | gpa on time;
  %between%
  gpa on highgpa sex;
  gpa with s;

SUMMARY OF ANALYSIS

Number of groups                                                 1
Number of observations                                        1200
Number of dependent variables                                    1
Number of independent variables                                  4
Number of continuous latent variables                            1

Observed dependent variables
  Continuous
   GPA

Observed independent variables
   TIME        JOB         HIGHGPA     SEX

Continuous latent variables
   S

Variables with special functions
  Cluster variable      STUDENT
  Within variables
   TIME        JOB
  Between variables
   HIGHGPA     SEX

Estimator                                                       ML
Information matrix                                        OBSERVED
Maximum number of iterations                                   100
Convergence criterion                                    0.100D-05
Maximum number of EM iterations                                500
Convergence criteria for the EM algorithm
  Loglikelihood change                                   0.100D-02
  Relative loglikelihood change                          0.100D-05
  Derivative                                             0.100D-03
Minimum variance                                         0.100D-03
Maximum number of steepest descent iterations                   20
Maximum number of iterations for H1                           2000
Convergence criterion for H1                             0.100D-03
Optimization algorithm                                         EMA

Input data file(s)
  gpa_ch5_hox.dat
Input data format  FREE

SUMMARY OF DATA

     Number of missing data patterns             1
     Number of clusters                        200

COVARIANCE COVERAGE OF DATA

Minimum covariance coverage value   0.100

     PROPORTION OF DATA PRESENT

           Covariance Coverage
              GPA           TIME          JOB           HIGHGPA       SEX
              ________      ________      ________      ________      ________
 GPA            1.000
 TIME           1.000         1.000
 JOB            1.000         1.000         1.000
 HIGHGPA        1.000         1.000         1.000         1.000
 SEX            1.000         1.000         1.000         1.000         1.000

THE MODEL ESTIMATION TERMINATED NORMALLY

MODEL FIT INFORMATION

Number of Free Parameters                        9

Loglikelihood
          H0 Value                         -90.102

Information Criteria
          Akaike (AIC)                     198.205
          Bayesian (BIC)                   244.016
          Sample-Size Adjusted BIC         215.428
            (n* = (n + 2) / 24)

MODEL RESULTS

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

Within Level

 GPA        ON
    JOB               -0.120      0.018     -6.684      0.000

 Residual Variances
    GPA                0.042      0.002     19.894      0.000

Between Level

 GPA        ON
    HIGHGPA            0.090      0.026      3.393      0.001
    SEX                0.117      0.032      3.641      0.000

 GPA      WITH
    S                 -0.003      0.002     -1.645      0.100

 Means
    S                  0.104      0.006     18.491      0.000

 Intercepts
    GPA                2.527      0.093     27.195      0.000

 Variances
    S                  0.004      0.001      6.060      0.000

 Residual Variances
    GPA                0.039      0.006      6.264      0.000

QUALITY OF NUMERICAL RESULTS

     Condition Number for the Information Matrix              0.523E-03
       (ratio of smallest to largest eigenvalue)

DIAGRAM INFORMATION

  Mplus diagrams are currently not available for multilevel analysis.
  No diagram output was produced.
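One way to read the fixed-effect estimates is to compute the expected GPA for a given covariate pattern. The Python sketch below is an illustration only: the function name is hypothetical, the coefficients are the rounded estimates from the MODEL RESULTS above, and the random effects are set to zero, so the result describes an average student with those covariate values rather than any particular individual.

```python
# Fixed-effect estimates copied from the MODEL RESULTS above (rounded).
INTERCEPT = 2.527   # between-level intercept of GPA
B_TIME = 0.104      # mean of the random slope S
B_JOB = -0.120      # within-level effect of job
B_HIGHGPA = 0.090   # between-level effect of highgpa
B_SEX = 0.117       # between-level effect of sex


def expected_gpa(time, job, highgpa, sex):
    """Expected GPA with both random effects set to zero."""
    return (INTERCEPT + B_TIME * time + B_JOB * job
            + B_HIGHGPA * highgpa + B_SEX * sex)


# Covariate values of student 1 at the first occasion (time = 0),
# taken from the long-format listing: job = 2, highgpa = 2.8, sex = 1.
pred = expected_gpa(time=0, job=2, highgpa=2.8, sex=1)
print(f"expected GPA at time 0: {pred:.3f}")
```

The gap between this expected value and the student's observed GPA of 2.3 at that occasion is absorbed by the student's random intercept, random slope and the level-1 residual.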