Mplus version 5.2 was used for these examples.
To illustrate longitudinal data analysis using Mplus, we will use an example data set from Chapter 5 of Hox’s Multilevel Analysis: Techniques and Applications. The data set contains GPAs for each subject measured at six time points; hence, the data are longitudinal. These data have a hierarchical structure because measurements over the six time points are nested in students. Longitudinal modeling is a special case of multilevel modeling.
In Mplus, a longitudinal model can be analyzed in one of two ways: a univariate approach using the long format of the data, or a multivariate approach using the wide format of the same data. The long-format approach falls within the multilevel modeling framework, while the wide-format approach falls within the structural equation modeling framework. We will show both approaches in this section.
1.1 Longitudinal modeling in long format – example 1: random intercept model
When the data set is in long format, it contains multiple rows per subject. In this example, each student has at most six rows of data coming from the measurements over the six time points. Here are observations of some of the variables for the first three subjects from the data set.
student  time  gpa  job  sex
1        0     2.3  2    1
1        1     2.1  2    1
1        2     3    2    1
1        3     3    2    1
1        4     3    2    1
1        5     3.3  2    1
2        0     2.2  2    0
2        1     2.5  3    0
2        2     2.6  2    0
2        3     2.6  2    0
2        4     3    2    0
2        5     2.8  2    0
3        0     2.4  2    1
3        1     2.9  2    1
3        2     3    2    1
3        3     2.8  3    1
3        4     3.3  2    1
3        5     3.4  2    1
Our first model assumes that a student’s GPA is linearly related to time. The intercept is random, meaning it can vary across subjects, but the slope of time is fixed, meaning the effect of time is the same for all subjects. Mathematically, here is our first model:
gpa_it = beta_0i + beta_1*time + e_it
where "i" stands for individual and "t" stands for time.
We also need to specify the nesting structure. The keywords for describing the nesting structure are cluster, within and between. A variable is a within variable if it is time-varying, such as job status; a variable is a between variable if it is not time-varying, such as the student’s gender. We specify the type of analysis to be twolevel and random for running a longitudinal model. The model statement is minimal since the default model is a random intercept model. Note that you can add comments to your input file by starting them with a !. You can start a comment at the beginning of a line or after the semi-colon that ends a statement. We encourage researchers to comment their input files.
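The within/between distinction can be checked on the long-format data before writing the input file. The following is a minimal Python sketch and is not part of the Mplus workflow; the helper name classify_variables is hypothetical, and the rows are the records for students 1 and 2 from the listing above.

```python
from collections import defaultdict

COLUMNS = ("student", "time", "gpa", "job", "sex")

# Long-format rows for students 1 and 2, copied from the listing above.
ROWS = [
    (1, 0, 2.3, 2, 1), (1, 1, 2.1, 2, 1), (1, 2, 3.0, 2, 1),
    (1, 3, 3.0, 2, 1), (1, 4, 3.0, 2, 1), (1, 5, 3.3, 2, 1),
    (2, 0, 2.2, 2, 0), (2, 1, 2.5, 3, 0), (2, 2, 2.6, 2, 0),
    (2, 3, 2.6, 2, 0), (2, 4, 3.0, 2, 0), (2, 5, 2.8, 2, 0),
]


def classify_variables(rows, columns, cluster="student"):
    """Label a column 'within' if it varies inside any cluster, else 'between'."""
    cluster_pos = columns.index(cluster)
    seen = defaultdict(lambda: defaultdict(set))  # column -> cluster id -> values
    for row in rows:
        cluster_id = row[cluster_pos]
        for name, value in zip(columns, row):
            if name != cluster:
                seen[name][cluster_id].add(value)
    return {
        name: "within" if any(len(vals) > 1 for vals in by_cluster.values()) else "between"
        for name, by_cluster in seen.items()
    }


labels = classify_variables(ROWS, COLUMNS)
for name, level in labels.items():
    print(f"{name}: {level}")
```

Here job and time come out as within variables and sex as a between variable. The outcome gpa also varies within students, but it is not listed on within or between because it is modeled at both levels.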
data:
  File is https://stats.idre.ucla.edu/wp-content/uploads/2016/02/gpa_ch5_hox.dat ;
variable:
  Names are student highgpa gpa job admitted occas time sex;
  Missing are all (-9999) ;
  usevariables are gpa time;
  cluster = student;
  within = time ;
analysis:
  type = twolevel random;
  estimator=ml; !default is mlr
model:
  %within%
  gpa on time;

INPUT READING TERMINATED NORMALLY

SUMMARY OF ANALYSIS

Number of groups                                                 1
Number of observations                                        1200

Number of dependent variables                                    1
Number of independent variables                                  1
Number of continuous latent variables                            0

Observed dependent variables

  Continuous
   GPA

Observed independent variables
   TIME

Variables with special functions

  Cluster variable      STUDENT

  Within variables
   TIME

Estimator                                                       ML
Information matrix                                        OBSERVED
Maximum number of iterations                                   100
Convergence criterion                                    0.100D-05
Maximum number of EM iterations                                500
Convergence criteria for the EM algorithm
  Loglikelihood change                                   0.100D-02
  Relative loglikelihood change                          0.100D-05
  Derivative                                             0.100D-03
Minimum variance                                         0.100D-03
Maximum number of steepest descent iterations                   20
Maximum number of iterations for H1                           2000
Convergence criterion for H1                             0.100D-03
Optimization algorithm                                         EMA

Input data file(s)
  https://stats.idre.ucla.edu/wp-content/uploads/2016/02/gpa_ch5_hox.dat
Input data format  FREE

SUMMARY OF DATA

     Number of missing data patterns             1
     Number of clusters                        200

     Average cluster size        6.000

     Estimated Intraclass Correlations for the Y Variables

                Intraclass
     Variable  Correlation

     GPA          0.411

COVARIANCE COVERAGE OF DATA

Minimum covariance coverage value   0.100

     PROPORTION OF DATA PRESENT

           Covariance Coverage
              GPA           TIME
              ________      ________
 GPA            1.000
 TIME           1.000         1.000

THE MODEL ESTIMATION TERMINATED NORMALLY

TESTS OF MODEL FIT

Chi-Square Test of Model Fit

          Value                              0.000
          Degrees of Freedom                     0
          P-Value                           1.0000

Chi-Square Test of Model Fit for the Baseline Model

          Value                            519.807
          Degrees of Freedom                     1
          P-Value                           0.0000

CFI/TLI

          CFI                                1.000
          TLI                                1.000

Loglikelihood

          H0 Value                        -196.825
          H1 Value                        -196.825

Information Criteria

          Number of Free Parameters              4
          Akaike (AIC)                     401.649
          Bayesian (BIC)                   422.009
          Sample-Size Adjusted BIC         409.304
            (n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)

          Estimate                           0.000

SRMR (Standardized Root Mean Square Residual)

          Value for Within                   0.000
          Value for Between                  0.000

MODEL RESULTS

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

Within Level

 GPA        ON
    TIME               0.106      0.004     26.110      0.000

 Residual Variances
    GPA                0.058      0.003     22.361      0.000

Between Level

 Means
    GPA                2.599      0.022    120.047      0.000

 Variances
    GPA                0.063      0.007      8.661      0.000
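To help read these estimates, here is a small Python sketch that turns the reported variance components into a residual intraclass correlation and the fixed effects into an average GPA trajectory. The numbers are copied from the MODEL RESULTS above, rounded to three decimals as printed, so the results are approximate; the variable names are ours.

```python
# Estimates copied from MODEL RESULTS above (three-decimal rounding as printed).
between_var = 0.063  # Between Level, Variances, GPA (random intercept variance)
within_var = 0.058   # Within Level, Residual Variances, GPA
intercept = 2.599    # Between Level, Means, GPA (expected GPA at time = 0)
slope = 0.106        # Within Level, GPA ON TIME (fixed slope)

# Share of the variance left after accounting for time that lies between students.
residual_icc = between_var / (between_var + within_var)

# Average model-implied GPA at each of the six occasions (time = 0, ..., 5).
predicted = [round(intercept + slope * t, 3) for t in range(6)]

print(f"residual ICC: {residual_icc:.3f}")
print("predicted GPA by occasion:", predicted)
```

This residual ICC is conditional on time, so it need not match the 0.411 reported under Estimated Intraclass Correlations, which appears to be computed for GPA without any predictor in the model.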
1.2 Longitudinal modeling in long format – example 2: random intercept and random slope model
Our second model is more complicated: we allow both a random intercept and a random slope for time, and we add a new level-1 predictor variable, job. The random intercept is in turn a linear function of two between variables, sex and highgpa. Mathematically, here is our second model:
gpa_it = beta_0i + beta_1i*time + beta_2*job + e_it
beta_0i = tau_0 + tau_1*highgpa + tau_2*sex + u_0i
beta_1i = gamma_0 + u_1i
(u_0i, u_1i) ~ N(0, Sigma), where Sigma is an unrestricted 2 x 2 covariance matrix
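Substituting the two level-2 equations into the level-1 equation gives the single-equation form of the same model. This is our own rearrangement using the symbols above; the subscripts on the covariates are added only to show which ones vary over time.

```latex
\begin{aligned}
gpa_{it} &= \tau_0 + \tau_1\,highgpa_i + \tau_2\,sex_i
            + \gamma_0\,time_{it} + \beta_2\,job_{it} \\
         &\quad + u_{0i} + u_{1i}\,time_{it} + e_{it}
\end{aligned}
```

The first line holds the fixed effects and the second line the random part. No term multiplies a student-level covariate by time, which is what it means for this model to contain no cross-level interaction.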
Let’s look at the model statement to see what has been added. There are two lines in the %within% part. The first, gpa on job, indicates that we want a fixed effect for job (the intercept of gpa is random by default). The second, s | gpa on time, defines s as the random slope from the regression of gpa on time. In the %between% part, gpa refers to the random intercept, which we predict from highgpa and sex. Because the random slope s has no between-level predictors, the model does not contain any cross-level interactions. On the last line, gpa with s allows the random intercept to be correlated with the random slope.
Data:
  File is https://stats.idre.ucla.edu/wp-content/uploads/2016/02/gpa_ch5_hox.dat ;
Variable:
  Names are student highgpa gpa job admitted occas time sex;
  Missing are all (-9999) ;
  usevariables are gpa time job student highgpa sex;
  cluster = student;
  within = time job ;
  between = highgpa sex;
analysis:
  type = twolevel random;
  estimator=ml; !default is mlr - maximum likelihood with robust standard errors
model:
  %within%
  gpa on job;
  s | gpa on time;
  %between%
  gpa on highgpa sex;
  gpa with s;

Loglikelihood

          H0 Value                         -90.102

Information Criteria

          Number of Free Parameters              9
          Akaike (AIC)                     198.205
          Bayesian (BIC)                   244.016
          Sample-Size Adjusted BIC         215.428
            (n* = (n + 2) / 24)

MODEL RESULTS

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

Within Level

 GPA        ON
    JOB               -0.120      0.018     -6.684      0.000

 Residual Variances
    GPA                0.042      0.002     19.894      0.000

Between Level

 GPA        ON
    HIGHGPA            0.090      0.026      3.393      0.001
    SEX                0.117      0.032      3.641      0.000

 GPA      WITH
    S                 -0.003      0.002     -1.645      0.100

 Means
    S                  0.104      0.006     18.491      0.000

 Intercepts
    GPA                2.527      0.093     27.195      0.000

 Variances
    S                  0.004      0.001      6.060      0.000

 Residual Variances
    GPA                0.039      0.006      6.264      0.000
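As a reading aid, the sketch below plugs the fixed-effect estimates into the model equation to get an expected GPA, and converts the intercept-slope covariance into a correlation. It is a minimal Python illustration, not Mplus output: the function name expected_gpa is hypothetical, the covariate values are those of student 1 in the data listings (job = 2, highgpa = 2.8, sex = 1), and the printed estimates are rounded to three decimals, so the correlation in particular is only a rough figure.

```python
import math

# Fixed-effect estimates copied from MODEL RESULTS above.
ESTIMATES = {
    "intercept": 2.527,  # Between Level, Intercepts, GPA
    "highgpa": 0.090,    # Between Level, GPA ON HIGHGPA
    "sex": 0.117,        # Between Level, GPA ON SEX
    "job": -0.120,       # Within Level, GPA ON JOB
    "time": 0.104,       # Between Level, Means, S (average slope of time)
}


def expected_gpa(time, job, highgpa, sex, est=ESTIMATES):
    """Expected GPA with both random effects set to their mean of zero."""
    return (est["intercept"] + est["highgpa"] * highgpa + est["sex"] * sex
            + est["job"] * job + est["time"] * time)


first = expected_gpa(time=0, job=2, highgpa=2.8, sex=1)
last = expected_gpa(time=5, job=2, highgpa=2.8, sex=1)

# Correlation between random intercept and random slope from the
# covariance (GPA WITH S) and the two between-level (residual) variances.
cov_is = -0.003
var_s = 0.004
resid_var_i = 0.039
corr_is = cov_is / math.sqrt(var_s * resid_var_i)

print(f"expected GPA at time 0: {first:.3f}")
print(f"expected GPA at time 5: {last:.3f}")
print(f"intercept-slope correlation: {corr_is:.2f}")
```

The negative sign of the correlation suggests that students who start higher tend to grow slightly more slowly, although the covariance itself is not significant at the 0.05 level (p = 0.100).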
2.1 Longitudinal modeling in wide format, replicating example 1 in section 1.1
Now let’s run the same model as in example 1 from the previous section using the multivariate approach. To this end, the data set has to be restructured into wide format; after restructuring, it looks like the following. (The restructuring can be done in Mplus, but it may be easier to do in another package where you can see the data.)
student  gpa0  gpa1  gpa2  gpa3  gpa4  gpa5  job0  job1  job2  job3  job4  job5  highgpa  sex
1        2.3   2.1   3     3     3     3.3   2     2     2     2     2     2     2.8      1
2        2.2   2.5   2.6   2.6   3     2.8   2     3     2     2     2     2     2.5      0
3        2.4   2.9   3     2.8   3.3   3.4   2     2     2     3     2     2     2.5      1
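If you restructure the data outside Mplus, the long-to-wide step amounts to collecting each student's rows into one record. Here is a minimal sketch in plain Python, assuming the long-format columns student, time, gpa, job and sex shown in section 1.1; the function name long_to_wide is hypothetical, and highgpa is left out because it is not among the long-format columns listed there.

```python
# Long-format rows (student, time, gpa, job, sex) for the first three students.
LONG_ROWS = [
    (1, 0, 2.3, 2, 1), (1, 1, 2.1, 2, 1), (1, 2, 3.0, 2, 1),
    (1, 3, 3.0, 2, 1), (1, 4, 3.0, 2, 1), (1, 5, 3.3, 2, 1),
    (2, 0, 2.2, 2, 0), (2, 1, 2.5, 3, 0), (2, 2, 2.6, 2, 0),
    (2, 3, 2.6, 2, 0), (2, 4, 3.0, 2, 0), (2, 5, 2.8, 2, 0),
    (3, 0, 2.4, 2, 1), (3, 1, 2.9, 2, 1), (3, 2, 3.0, 2, 1),
    (3, 3, 2.8, 3, 1), (3, 4, 3.3, 2, 1), (3, 5, 3.4, 2, 1),
]


def long_to_wide(rows):
    """Return one dict per student with gpa0..gpa5 and job0..job5 columns."""
    wide = {}
    for student, time, gpa, job, sex in rows:
        # Time-invariant variables are stored once per student.
        record = wide.setdefault(student, {"student": student, "sex": sex})
        # Time-varying variables get the occasion number as a suffix.
        record[f"gpa{time}"] = gpa
        record[f"job{time}"] = job
    return [wide[student] for student in sorted(wide)]


wide_rows = long_to_wide(LONG_ROWS)
for record in wide_rows:
    print(record)
```

Each time-varying variable becomes six columns, one per occasion, while a time-invariant variable such as sex is kept once per student.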
Let’s recall that the model that we are going to run is the following:
gpa_it = beta_0i + beta_1*time + e_it
In the multivariate approach, we are more explicit about the latent variables involved in the model. In this case, we have potentially two latent variables, the random intercept and the random slope for time. We name them i (for intercept) and s (for slope). We assume that the two latent variables i and s are normally distributed, and we are interested in estimating their means and variances. Because we do not allow the slope to vary across subjects, that is, the slope is fixed, we fix the variance of s at zero. Another technical point is that in the multilevel approach, the residual variance is homogeneous across all the time points. To reproduce the results of example 1 here, we will also constrain the residual variance to be the same across all six time points.
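In matrix form, the wide-format model described above can be sketched as follows. The notation is the usual structural equation modeling notation and is our assumption, not something printed by Mplus: eta_0i and eta_1i stand for the factors called i and s, Lambda holds the fixed loadings (ones for the intercept factor and the time scores 0 to 5 for the slope factor), Psi is the factor covariance matrix and Theta the residual covariance matrix.

```latex
\[
\mathbf{y}_i = \boldsymbol{\Lambda}\,\boldsymbol{\eta}_i + \boldsymbol{\varepsilon}_i,
\qquad
\boldsymbol{\Lambda} =
\begin{pmatrix}
1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \\ 1 & 5
\end{pmatrix},
\qquad
\boldsymbol{\eta}_i =
\begin{pmatrix} \eta_{0i} \\ \eta_{1i} \end{pmatrix}
\]
\[
\operatorname{Cov}(\mathbf{y}_i)
= \boldsymbol{\Lambda}\,\boldsymbol{\Psi}\,\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Theta}
= \psi_{00}\,\mathbf{1}\mathbf{1}^{\top} + \theta\,\mathbf{I}_6
\quad\text{when } \operatorname{Var}(\eta_{1i}) = 0
\text{ and } \boldsymbol{\Theta} = \theta\,\mathbf{I}_6 .
\]
```

With the slope variance fixed at zero and equal residual variances, every GPA measure has variance psi_00 + theta and every pair of measures has covariance psi_00, which is the same structure the random intercept model implies in the multilevel approach.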
Data:
  File is https://stats.idre.ucla.edu/wp-content/uploads/2016/02/gpa_ch5_hox_wide.dat ;
Variable:
  Names are student admitted gpa0 gpa1 gpa2 gpa3 gpa4 gpa5 highgpa
            job0 job1 job2 job3 job4 job5 sex;
  Missing are all (-9999) ;
  usevariables are gpa0 gpa1 gpa2 gpa3 gpa4 gpa5;
analysis:
  estimator = ml;
model:
  i s | gpa0@0 gpa1@1 gpa2@2 gpa3@3 gpa4@4 gpa5@5;
  s@0; !fix variance of the slope at zero
  gpa0 - gpa5 (1); !fix the residual variance to be same across time points
Before we run the model, let’s take a moment to review the model statement. This is very different from what we saw in the first section where we ran the model using the multilevel approach. As we have mentioned before, there are two latent variables being modeled, the intercept and the slope. In the growth curve modeling terminology, we call them intercept and slope growth factors. Mplus uses | to define these latent factors. The time scores for the slope are fixed using the symbol "@" at 0, 1, 2, 3, 4 and 5 since we are only modeling the linear growth. Implicitly, the intercept is defined to be the initial value of gpa at time = 0 since the time score for slope is set to be zero at time = 0. For more information, see the description of Example 8.1 in the Mplus User’s Guide. Now let’s take a look at the output.
*** WARNING in MODEL command
  All continuous latent variable covariances involving S have been fixed to 0
  because the variance of S is fixed at 0.
   1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

SUMMARY OF ANALYSIS

Number of groups                                                 1
Number of observations                                         200

Number of dependent variables                                    6
Number of independent variables                                  0
Number of continuous latent variables                            2

Observed dependent variables

  Continuous
   GPA0        GPA1        GPA2        GPA3        GPA4        GPA5

Continuous latent variables
   I           S

Estimator                                                       ML
Information matrix                                        OBSERVED
Maximum number of iterations                                  1000
Convergence criterion                                    0.500D-04
Maximum number of steepest descent iterations                   20
Maximum number of iterations for H1                           2000
Convergence criterion for H1                             0.100D-03

Input data file(s)
  https://stats.idre.ucla.edu/wp-content/uploads/2016/02/gpa_ch5_hox_wide.dat
Input data format  FREE

SUMMARY OF DATA

     Number of missing data patterns             1

COVARIANCE COVERAGE OF DATA

Minimum covariance coverage value   0.100

     PROPORTION OF DATA PRESENT

           Covariance Coverage
              GPA0          GPA1          GPA2          GPA3          GPA4
              ________      ________      ________      ________      ________
 GPA0           1.000
 GPA1           1.000         1.000
 GPA2           1.000         1.000         1.000
 GPA3           1.000         1.000         1.000         1.000
 GPA4           1.000         1.000         1.000         1.000         1.000
 GPA5           1.000         1.000         1.000         1.000         1.000

           Covariance Coverage
              GPA5
              ________
 GPA5           1.000

THE MODEL ESTIMATION TERMINATED NORMALLY

TESTS OF MODEL FIT

Chi-Square Test of Model Fit

          Value                            338.824
          Degrees of Freedom                    23
          P-Value                           0.0000

Chi-Square Test of Model Fit for the Baseline Model

          Value                            811.632
          Degrees of Freedom                    15
          P-Value                           0.0000

CFI/TLI

          CFI                                0.604
          TLI                                0.741

Loglikelihood

          H0 Value                        -196.825
          H1 Value                         -27.413

Information Criteria

          Number of Free Parameters              4
          Akaike (AIC)                     401.649
          Bayesian (BIC)                   414.842
          Sample-Size Adjusted BIC         402.170
            (n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)

          Estimate                           0.262
          90 Percent C.I.                    0.238  0.287
          Probability RMSEA <= .05           0.000

SRMR (Standardized Root Mean Square Residual)

          Value                              0.293

MODEL RESULTS

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

 I        |
    GPA0               1.000      0.000    999.000    999.000
    GPA1               1.000      0.000    999.000    999.000
    GPA2               1.000      0.000    999.000    999.000
    GPA3               1.000      0.000    999.000    999.000
    GPA4               1.000      0.000    999.000    999.000
    GPA5               1.000      0.000    999.000    999.000

 S        |
    GPA0               0.000      0.000    999.000    999.000
    GPA1               1.000      0.000    999.000    999.000
    GPA2               2.000      0.000    999.000    999.000
    GPA3               3.000      0.000    999.000    999.000
    GPA4               4.000      0.000    999.000    999.000
    GPA5               5.000      0.000    999.000    999.000

 Means
    I                  2.599      0.022    120.047      0.000
    S                  0.106      0.004     26.109      0.000

 Intercepts
    GPA0               0.000      0.000    999.000    999.000
    GPA1               0.000      0.000    999.000    999.000
    GPA2               0.000      0.000    999.000    999.000
    GPA3               0.000      0.000    999.000    999.000
    GPA4               0.000      0.000    999.000    999.000
    GPA5               0.000      0.000    999.000    999.000

 Variances
    I                  0.063      0.007      8.661      0.000
    S                  0.000      0.000    999.000    999.000

 Residual Variances
    GPA0               0.058      0.003     22.361      0.000
    GPA1               0.058      0.003     22.361      0.000
    GPA2               0.058      0.003     22.361      0.000
    GPA3               0.058      0.003     22.361      0.000
    GPA4               0.058      0.003     22.361      0.000
    GPA5               0.058      0.003     22.361      0.000
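Several of the fit statistics above can be reproduced by hand from the two loglikelihoods, which helps in seeing how the wide-format output relates to the long-format output in section 1.1. The sketch below uses the standard definitions (df = sample moments minus free parameters, chi-square = 2(H1 - H0), AIC = -2LL + 2k, BIC = -2LL + k ln n). The assumption that n in BIC is the number of observations reported by each run (200 here, 1200 in the long format) is ours, although the printed values are consistent with it.

```python
import math

n_vars = 6         # gpa0 ... gpa5
n_free = 4         # mean of i, mean of s, variance of i, one common residual variance
ll_h0 = -196.825   # Loglikelihood, H0 Value
ll_h1 = -27.413    # Loglikelihood, H1 Value

# Sample moments: 6 means plus 6 * 7 / 2 = 21 variances and covariances.
sample_moments = n_vars + n_vars * (n_vars + 1) // 2
df = sample_moments - n_free

chi_square = 2 * (ll_h1 - ll_h0)
aic = -2 * ll_h0 + 2 * n_free


def bic(loglik, k, n):
    """Bayesian information criterion for k free parameters and sample size n."""
    return -2 * loglik + k * math.log(n)


bic_wide = bic(ll_h0, n_free, 200)    # wide format: 200 students
bic_long = bic(ll_h0, n_free, 1200)   # long format: 1200 rows

print(f"df = {df}, chi-square = {chi_square:.3f}")
print(f"AIC = {aic:.3f}")
print(f"BIC (n = 200) = {bic_wide:.3f}, BIC (n = 1200) = {bic_long:.3f}")
```

The H0 loglikelihood and AIC are identical in the two formats, whereas BIC differs (414.842 here versus 422.009 in section 1.1), which appears to be due only to the different sample size entering the formula. Small discrepancies in the last digit come from the rounding of the printed loglikelihoods.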
2.2 Longitudinal modeling in wide format, example 1 revisited
Now let’s rerun our previous example, but relax the assumption regarding the residual variance. In this example, we allow the residual variances at the six time points to differ from each other. To do so, we simply comment out the last line of the code, in which we constrained the residual variances to be equal. The point is that the wide-format approach gives us more flexibility in modeling, in addition to offering a different angle on the same model.
Data:
  File is https://stats.idre.ucla.edu/wp-content/uploads/2016/02/gpa_ch5_hox_wide.dat ;
Variable:
  Names are student admitted gpa0 gpa1 gpa2 gpa3 gpa4 gpa5 highgpa
            job0 job1 job2 job3 job4 job5 sex;
  Missing are all (-9999) ;
  usevariables are gpa0 gpa1 gpa2 gpa3 gpa4 gpa5;
analysis:
  estimator = ml;
model:
  i s | gpa0@0 gpa1@1 gpa2@2 gpa3@3 gpa4@4 gpa5@5;
  s@0;
  !gpa0 - gpa5 (1);

TESTS OF MODEL FIT

Chi-Square Test of Model Fit

          Value                            186.267
          Degrees of Freedom                    18
          P-Value                           0.0000

Chi-Square Test of Model Fit for the Baseline Model

          Value                            811.632
          Degrees of Freedom                    15
          P-Value                           0.0000

CFI/TLI

          CFI                                0.789
          TLI                                0.824

Loglikelihood

          H0 Value                        -120.546
          H1 Value                         -27.413

Information Criteria

          Number of Free Parameters              9
          Akaike (AIC)                     259.093
          Bayesian (BIC)                   288.778
          Sample-Size Adjusted BIC         260.265
            (n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)

          Estimate                           0.216
          90 Percent C.I.                    0.189  0.245
          Probability RMSEA <= .05           0.000

SRMR (Standardized Root Mean Square Residual)

          Value                              0.782

MODEL RESULTS

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

 I        |
    GPA0               1.000      0.000    999.000    999.000
    GPA1               1.000      0.000    999.000    999.000
    GPA2               1.000      0.000    999.000    999.000
    GPA3               1.000      0.000    999.000    999.000
    GPA4               1.000      0.000    999.000    999.000
    GPA5               1.000      0.000    999.000    999.000

 S        |
    GPA0               0.000      0.000    999.000    999.000
    GPA1               1.000      0.000    999.000    999.000
    GPA2               2.000      0.000    999.000    999.000
    GPA3               3.000      0.000    999.000    999.000
    GPA4               4.000      0.000    999.000    999.000
    GPA5               5.000      0.000    999.000    999.000

 Means
    I                  2.599      0.026     99.187      0.000
    S                  0.106      0.004     26.313      0.000

 Intercepts
    GPA0               0.000      0.000    999.000    999.000
    GPA1               0.000      0.000    999.000    999.000
    GPA2               0.000      0.000    999.000    999.000
    GPA3               0.000      0.000    999.000    999.000
    GPA4               0.000      0.000    999.000    999.000
    GPA5               0.000      0.000    999.000    999.000

 Variances
    I                  0.093      0.010      9.226      0.000
    S                  0.000      0.000    999.000    999.000

 Residual Variances
    GPA0               0.138      0.015      9.475      0.000
    GPA1               0.094      0.010      9.299      0.000
    GPA2               0.054      0.006      8.925      0.000
    GPA3               0.026      0.003      7.834      0.000
    GPA4               0.017      0.003      6.547      0.000
    GPA5               0.026      0.003      7.704      0.000
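Because the model with equal residual variances (section 2.1) is nested in this one, the two can be compared with a likelihood ratio test. The following Python sketch does the arithmetic from the printed loglikelihoods; this comparison is our addition rather than part of the Mplus output, and 11.070 is the 0.05 critical value of a chi-square distribution with 5 degrees of freedom.

```python
# Loglikelihoods and parameter counts copied from the two wide-format runs.
ll_equal, k_equal = -196.825, 4   # section 2.1: residual variances constrained equal
ll_free, k_free = -120.546, 9     # section 2.2: residual variances freely estimated

lr_statistic = 2 * (ll_free - ll_equal)
lr_df = k_free - k_equal          # five extra residual variances

# Same quantity as the difference of the two model chi-squares (up to rounding).
chi_square_difference = 338.824 - 186.267

CRITICAL_VALUE_05_DF5 = 11.070    # chi-square 0.95 quantile with 5 df
reject_equal_variances = lr_statistic > CRITICAL_VALUE_05_DF5

print(f"LR statistic = {lr_statistic:.3f} on {lr_df} df")
print(f"chi-square difference = {chi_square_difference:.3f}")
print("reject equal residual variances at 0.05:", reject_equal_variances)
```

The statistic is far above the critical value, so the data favor time-specific residual variances. The remaining fit indices show that this model still fits poorly, which is not surprising given that the slope variance is fixed at zero.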