Mplus Class Notes: Modeling Longitudinal Data

Mplus version 5.2 was used for these examples.

To illustrate longitudinal data analysis using Mplus, we will use an example data set from Chapter 5 of Hox’s Multilevel Analysis: Techniques and Applications. The data set contains GPAs for each subject measured at six time points; hence, the data are longitudinal. These data have a hierarchical structure because measurements over the six time points are nested in students. Longitudinal modeling is a special case of multilevel modeling.

In Mplus, a longitudinal model can be analyzed in one of two ways: a univariate approach that uses the long format of the data, or a multivariate approach that uses the wide format of the same data. The long-format approach falls within the multilevel modeling framework, while the wide-format approach falls within the structural equation modeling framework. We will show both approaches in this section.

1.1 Longitudinal modeling in long format – example 1: random intercept model

When the data set is in long format, it contains multiple rows per subject. In this example, each student has at most six rows of data coming from the measurements over the six time points. Here are observations of some of the variables for the first three subjects from the data set.

    student   time   gpa   job   sex  
          1      0   2.3     2     1  
          1      1   2.1     2     1  
          1      2     3     2     1  
          1      3     3     2     1  
          1      4     3     2     1  
          1      5   3.3     2     1  
          2      0   2.2     2     0  
          2      1   2.5     3     0  
          2      2   2.6     2     0  
          2      3   2.6     2     0  
          2      4     3     2     0  
          2      5   2.8     2     0  
          3      0   2.4     2     1  
          3      1   2.9     2     1  
          3      2     3     2     1  
          3      3   2.8     3     1  
          3      4   3.3     2     1  
          3      5   3.4     2     1

Our first model specifies that a student’s GPA is linearly related to time. The intercept is random, meaning it can vary across subjects, but the slope of time is fixed, meaning the effect of time is the same for all subjects. Mathematically, here is our first model:

gpa_it = beta_0i + beta_1*time + e_it

where "i" stands for individual and "t" stands for time.

We also need to specify the nesting structure. The keywords for describing the nesting structure are cluster, within and between. A variable is a within variable if it is time-varying, such as job status, and a between variable if it is not time-varying, such as the student’s gender. We specify the type of analysis as twolevel random to run a longitudinal model. The model statement is minimal because the default model is a random intercept model. Note that you can add comments to your input file by starting them with a !. A comment can start at the beginning of a line or after the semicolon that ends a statement. We encourage researchers to comment their input files.
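
The within/between distinction can be checked directly on long-format data: a variable can serve as a between variable only if it is constant within every cluster. The following Python sketch is not part of the Mplus run, and the helper name is_between is hypothetical; it applies that rule to the job and sex values of the first three students listed above.

```python
from collections import defaultdict

# (student, job, sex) for the first three students in the listing above
rows = [
    (1, 2, 1), (1, 2, 1), (1, 2, 1), (1, 2, 1), (1, 2, 1), (1, 2, 1),
    (2, 2, 0), (2, 3, 0), (2, 2, 0), (2, 2, 0), (2, 2, 0), (2, 2, 0),
    (3, 2, 1), (3, 2, 1), (3, 2, 1), (3, 3, 1), (3, 2, 1), (3, 2, 1),
]

def is_between(rows, col):
    """Return True if column `col` takes a single value within every cluster."""
    values = defaultdict(set)
    for row in rows:
        values[row[0]].add(row[col])   # row[0] is the cluster id (student)
    return all(len(v) == 1 for v in values.values())

job_is_between = is_between(rows, 1)   # job changes within students 2 and 3
sex_is_between = is_between(rows, 2)   # sex never changes within a student
print("job:", "between" if job_is_between else "within")
print("sex:", "between" if sex_is_between else "within")
```

In these rows job varies within students 2 and 3, so it belongs on the within list, while sex is constant within each student and belongs on the between list.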

data:
  File is https://stats.idre.ucla.edu/wp-content/uploads/2016/02/gpa_ch5_hox.dat ;
variable:
  Names are student highgpa gpa job admitted occas time sex;
  Missing are all (-9999) ; 
  usevariables are gpa time;
  cluster = student;
  within = time ;
analysis: type = twolevel random;
          estimator=ml; !default is mlr
model: 
  %within%
  gpa on time;

INPUT READING TERMINATED NORMALLY

SUMMARY OF ANALYSIS

Number of groups                                                 1
Number of observations                                        1200

Number of dependent variables                                    1
Number of independent variables                                  1
Number of continuous latent variables                            0

Observed dependent variables

  Continuous
   GPA

Observed independent variables
   TIME

Variables with special functions

  Cluster variable      STUDENT
  Within variables
   TIME

Estimator                                                       ML
Information matrix                                        OBSERVED
Maximum number of iterations                                   100
Convergence criterion                                    0.100D-05
Maximum number of EM iterations                                500
Convergence criteria for the EM algorithm
  Loglikelihood change                                   0.100D-02
  Relative loglikelihood change                          0.100D-05
  Derivative                                             0.100D-03
Minimum variance                                         0.100D-03
Maximum number of steepest descent iterations                   20
Maximum number of iterations for H1                           2000
Convergence criterion for H1                             0.100D-03
Optimization algorithm                                         EMA

Input data file(s)
  https://stats.idre.ucla.edu/wp-content/uploads/2016/02/gpa_ch5_hox.dat
Input data format  FREE

SUMMARY OF DATA

     Number of missing data patterns             1
     Number of clusters                        200

     Average cluster size        6.000

     Estimated Intraclass Correlations for the Y Variables

                Intraclass
     Variable  Correlation

     GPA          0.411


COVARIANCE COVERAGE OF DATA

Minimum covariance coverage value   0.100

     PROPORTION OF DATA PRESENT

           Covariance Coverage
              GPA           TIME
              ________      ________
 GPA            1.000
 TIME           1.000         1.000

THE MODEL ESTIMATION TERMINATED NORMALLY

TESTS OF MODEL FIT

Chi-Square Test of Model Fit

          Value                              0.000
          Degrees of Freedom                     0
          P-Value                           1.0000

Chi-Square Test of Model Fit for the Baseline Model

          Value                            519.807
          Degrees of Freedom                     1
          P-Value                           0.0000

CFI/TLI

          CFI                                1.000
          TLI                                1.000

Loglikelihood

          H0 Value                        -196.825
          H1 Value                        -196.825

Information Criteria

          Number of Free Parameters              4
          Akaike (AIC)                     401.649
          Bayesian (BIC)                   422.009
          Sample-Size Adjusted BIC         409.304
            (n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)

          Estimate                           0.000

SRMR (Standardized Root Mean Square Residual)

          Value for Within                   0.000
          Value for Between                  0.000

MODEL RESULTS

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

Within Level

 GPA        ON
    TIME               0.106      0.004     26.110      0.000

 Residual Variances
    GPA                0.058      0.003     22.361      0.000

Between Level

 Means
    GPA                2.599      0.022    120.047      0.000

 Variances
    GPA                0.063      0.007      8.661      0.000
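
The estimates above can be read back into the model equation. The sketch below is plain Python arithmetic on the printed (rounded) estimates, not Mplus output. It computes the model-implied mean GPA at each occasion and the share of the residual variance that lies between students once time is controlled.

```python
# Estimates copied from MODEL RESULTS above (rounded to three decimals)
intercept_mean = 2.599   # Between level: mean of GPA (average intercept)
time_slope = 0.106       # Within level: GPA ON TIME
var_within = 0.058       # Within level: residual variance of GPA
var_between = 0.063      # Between level: variance of GPA (random intercept)

# Model-implied mean GPA at occasions 0..5
fitted_means = [round(intercept_mean + time_slope * t, 3) for t in range(6)]
print(fitted_means)

# Proportion of the residual variance attributable to students, given time
residual_icc = var_between / (var_between + var_within)
print(round(residual_icc, 3))
```

The ratio is conditional on time and is built only from the two variance estimates, so it need not equal the 0.411 intraclass correlation printed under SUMMARY OF DATA.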

1.2 Longitudinal modeling in long format – example 2: random intercept and random slope model

Our second model is more complex: it allows both a random intercept and a random slope for time, and it adds a new level-1 predictor, job. The random intercept is in turn a linear function of two between variables, sex and highgpa. Mathematically, here is our second model:

    gpa_it    = beta_0i + beta_1i*time + beta_2*job + e_it
    beta_0i = tau_0 + tau_1*highgpa + tau_2*sex + u_0i
    beta_1i = gamma_0 + u_1i
    (u_0i, u_1i) ~ N(0, Sigma), with Sigma the 2 x 2 covariance matrix of u_0i and u_1i

Let’s look at the model statement to see what has been added. There are two lines in the %within% part. The first, gpa on job, requests a fixed effect for job; the intercept of gpa is random by default. The second, s | gpa on time, defines s as the random slope of the regression of gpa on time. In the %between% part, highgpa and sex predict the random intercept. Because the random slope s is not regressed on any between-level variable, the model has no cross-level interactions. On the last line, we allow the intercept to be correlated with the slope.
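
Substituting the two level-2 equations into the level-1 equation gives the single-equation form of the model. It is written here in LaTeX with the same symbols as above; the subscripts on the predictors are added for clarity. No product of time with highgpa or sex appears, which is what the absence of cross-level interactions means.

```latex
\begin{aligned}
\mathrm{gpa}_{it} &= \tau_0 + \tau_1\,\mathrm{highgpa}_i + \tau_2\,\mathrm{sex}_i
                    + \gamma_0\,\mathrm{time}_{it} + \beta_2\,\mathrm{job}_{it} \\
                  &\quad + u_{0i} + u_{1i}\,\mathrm{time}_{it} + e_{it}
\end{aligned}
```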

Data:
  File is https://stats.idre.ucla.edu/wp-content/uploads/2016/02/gpa_ch5_hox.dat ;
Variable:
  Names are student highgpa gpa job admitted occas time sex;
  Missing are all (-9999) ; 
  usevariables are gpa time job student highgpa  sex;
  cluster = student;
  within = time job ;
  between = highgpa  sex;
analysis: type = twolevel random;
          estimator=ml; !default is mlr - maximum likelihood with robust standard errors
model: 
  %within%
  gpa on job;
  s | gpa on time;
  %between%
  gpa on highgpa sex;
  gpa with s;

Loglikelihood

          H0 Value                         -90.102

Information Criteria

          Number of Free Parameters              9
          Akaike (AIC)                     198.205
          Bayesian (BIC)                   244.016
          Sample-Size Adjusted BIC         215.428
            (n* = (n + 2) / 24)


MODEL RESULTS
                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

Within Level

 GPA        ON
    JOB               -0.120      0.018     -6.684      0.000

 Residual Variances
    GPA                0.042      0.002     19.894      0.000

Between Level

 GPA        ON
    HIGHGPA            0.090      0.026      3.393      0.001
    SEX                0.117      0.032      3.641      0.000

 GPA      WITH
    S                 -0.003      0.002     -1.645      0.100

 Means
    S                  0.104      0.006     18.491      0.000

 Intercepts
    GPA                2.527      0.093     27.195      0.000

 Variances
    S                  0.004      0.001      6.060      0.000

 Residual Variances
    GPA                0.039      0.006      6.264      0.000
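
Two derived quantities help in reading the between-level results: the correlation implied by the intercept-slope covariance, and the range within which most student-specific slopes fall if the random slopes are normally distributed. The Python sketch below works only from the rounded estimates printed above, so its results are approximate.

```python
import math

# Between-level estimates copied from MODEL RESULTS above
cov_int_slope = -0.003   # GPA WITH S
var_slope = 0.004        # Variances: S
resvar_int = 0.039       # Residual Variances: GPA (intercept, given highgpa and sex)
mean_slope = 0.104       # Means: S

# Correlation between the intercept residual and the random slope
corr = cov_int_slope / math.sqrt(resvar_int * var_slope)
print(round(corr, 2))

# Approximate 95% range of student-specific slopes under normality
sd_slope = math.sqrt(var_slope)
low = mean_slope - 1.96 * sd_slope
high = mean_slope + 1.96 * sd_slope
print(round(low, 3), round(high, 3))
```

With these rounded inputs the correlation is about -0.24, and the range runs from roughly -0.02 to 0.23, so the large majority of students are predicted to have a positive GPA trend.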

2.1 Longitudinal modeling in wide format, replicating example 1 in section 1.1

Now let’s run the same model as in example 1 of the previous section using the multivariate approach. To this end, the data set has to be restructured to wide format; after restructuring, it looks like the following. (The restructuring can be done in Mplus, but it may be easier to do in another package where you can see the data.)

    student   gpa0   gpa1   gpa2   gpa3   gpa4   gpa5   job0   job1   job2   job3   job4   job5   highgpa   sex  
          1    2.3    2.1      3      3      3    3.3      2      2      2      2      2      2       2.8     1  
          2    2.2    2.5    2.6    2.6      3    2.8      2      3      2      2      2      2       2.5     0  
          3    2.4    2.9      3    2.8    3.3    3.4      2      2      2      3      2      2       2.5     1
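
As a minimal sketch of the restructuring step in another package, the following Python code (standard library only) pivots the long-format rows of the first three students into one row per student. The column names gpa0 to gpa5 and job0 to job5 follow the wide layout above. The to_wide helper is hypothetical, and highgpa is left out because it does not appear in the long-format listing in section 1.1.

```python
# Long-format rows: (student, time, gpa, job, sex), copied from section 1.1
long_rows = [
    (1, 0, 2.3, 2, 1), (1, 1, 2.1, 2, 1), (1, 2, 3.0, 2, 1),
    (1, 3, 3.0, 2, 1), (1, 4, 3.0, 2, 1), (1, 5, 3.3, 2, 1),
    (2, 0, 2.2, 2, 0), (2, 1, 2.5, 3, 0), (2, 2, 2.6, 2, 0),
    (2, 3, 2.6, 2, 0), (2, 4, 3.0, 2, 0), (2, 5, 2.8, 2, 0),
    (3, 0, 2.4, 2, 1), (3, 1, 2.9, 2, 1), (3, 2, 3.0, 2, 1),
    (3, 3, 2.8, 3, 1), (3, 4, 3.3, 2, 1), (3, 5, 3.4, 2, 1),
]

def to_wide(rows):
    """Collapse one row per (student, time) into one dict per student."""
    wide = {}
    for student, time, gpa, job, sex in rows:
        # sex is time-invariant, so it is stored once per student
        rec = wide.setdefault(student, {"student": student, "sex": sex})
        rec[f"gpa{time}"] = gpa   # time-varying: one column per occasion
        rec[f"job{time}"] = job
    return wide

wide = to_wide(long_rows)
columns = (["student"] + [f"gpa{t}" for t in range(6)]
           + [f"job{t}" for t in range(6)] + ["sex"])
for rec in wide.values():
    print(" ".join(str(rec[c]) for c in columns))
```

Each printed row should agree with the corresponding row of the wide listing above, apart from the omitted highgpa column.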

Let’s recall that the model that we are going to run is the following:

gpa_it = beta_0i + beta_1*time + e_it

In the multivariate approach, we are more explicit about the latent variables involved in the model. In this case, there are potentially two latent variables, the random intercept and the random slope for time. We name them i (for intercept) and s (for slope). We assume that i and s are normally distributed, and we are interested in estimating their means and variances. Because we do not allow the slope to vary across subjects (that is, the slope is fixed), we fix the variance of s at zero. Another technical point is that in the multilevel approach the residual variance is homogeneous across all time points. To reproduce the results of example 1 here, we therefore constrain the residual variance to be equal across the six time points.
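
These two restrictions can be read as a mean and covariance structure. With a loading matrix Lambda (a column of ones for i and the time scores 0 to 5 for s), the implied means are Lambda*alpha and the implied covariance matrix is Lambda*Psi*Lambda' + Theta. The Python sketch below plugs in the section 1.1 estimates, which this model is meant to reproduce. It illustrates the algebra only and is not a description of how Mplus computes it internally.

```python
# Loadings: intercept factor i has loading 1, slope factor s has the time scores
times = [0, 1, 2, 3, 4, 5]
lam = [[1.0, float(t)] for t in times]

alpha = [2.599, 0.106]        # factor means (i, s), from section 1.1
psi = [[0.063, 0.0],          # factor covariance matrix; a slope variance of
       [0.0,   0.0]]          # zero also zeroes its covariance with i
theta = 0.058                 # residual variance, equal at every time point

# Implied means: Lambda * alpha
means = [row[0] * alpha[0] + row[1] * alpha[1] for row in lam]

# Implied covariances: Lambda * Psi * Lambda' + Theta (Theta is diagonal)
def implied_cov(j, k):
    value = sum(lam[j][a] * psi[a][b] * lam[k][b]
                for a in range(2) for b in range(2))
    return value + (theta if j == k else 0.0)

sigma = [[implied_cov(j, k) for k in range(6)] for j in range(6)]

print([round(m, 3) for m in means])
print([round(v, 3) for v in sigma[0]])
```

Every implied variance is 0.063 + 0.058 = 0.121 and every implied covariance is 0.063. This compound-symmetry pattern is the one the long-format random intercept model assumes.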

Data:
  File is https://stats.idre.ucla.edu/wp-content/uploads/2016/02/gpa_ch5_hox_wide.dat ;
Variable:
  Names are student admitted gpa0 gpa1 gpa2 gpa3 gpa4 gpa5 
         highgpa job0 job1 job2 job3 job4 job5 sex;
  Missing are all (-9999) ; 
  usevariables are  gpa0 gpa1 gpa2 gpa3 gpa4 gpa5;
analysis: estimator = ml;
model: 
  i s | gpa0@0 gpa1@1 gpa2@2 gpa3@3 gpa4@4 gpa5@5;
  s@0; !fix variance of the slope at zero
  gpa0 - gpa5 (1); !constrain the residual variances to be equal across time points

Before we run the model, let’s take a moment to review the model statement. It is very different from what we saw in the first section, where we ran the model using the multilevel approach. As mentioned before, two latent variables are being modeled, the intercept and the slope. In growth curve modeling terminology, they are called the intercept and slope growth factors. Mplus uses | to define these latent factors. The time scores for the slope are fixed with the symbol "@" at 0, 1, 2, 3, 4 and 5 because we are modeling linear growth only. Implicitly, the intercept is defined as the initial value of gpa at time = 0, since the time score for the slope is zero at time = 0. For more information, see the description of Example 8.1 in the Mplus User’s Guide. Now let’s take a look at the output.

*** WARNING in MODEL command
  All continuous latent variable covariances involving S have been fixed to 0
  because the variance of S is fixed at 0.
   1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

SUMMARY OF ANALYSIS

Number of groups                                                 1
Number of observations                                         200

Number of dependent variables                                    6
Number of independent variables                                  0
Number of continuous latent variables                            2

Observed dependent variables

  Continuous
   GPA0        GPA1        GPA2        GPA3        GPA4        GPA5

Continuous latent variables
   I           S

Estimator                                                       ML
Information matrix                                        OBSERVED
Maximum number of iterations                                  1000
Convergence criterion                                    0.500D-04
Maximum number of steepest descent iterations                   20
Maximum number of iterations for H1                           2000
Convergence criterion for H1                             0.100D-03

Input data file(s)
  https://stats.idre.ucla.edu/wp-content/uploads/2016/02/gpa_ch5_hox_wide.dat

Input data format  FREE

SUMMARY OF DATA

     Number of missing data patterns             1

COVARIANCE COVERAGE OF DATA

Minimum covariance coverage value   0.100

     PROPORTION OF DATA PRESENT

           Covariance Coverage
              GPA0          GPA1          GPA2          GPA3          GPA4
              ________      ________      ________      ________      ________
 GPA0           1.000
 GPA1           1.000         1.000
 GPA2           1.000         1.000         1.000
 GPA3           1.000         1.000         1.000         1.000
 GPA4           1.000         1.000         1.000         1.000         1.000
 GPA5           1.000         1.000         1.000         1.000         1.000

           Covariance Coverage
              GPA5
              ________
 GPA5           1.000

THE MODEL ESTIMATION TERMINATED NORMALLY

TESTS OF MODEL FIT

Chi-Square Test of Model Fit

          Value                            338.824
          Degrees of Freedom                    23
          P-Value                           0.0000

Chi-Square Test of Model Fit for the Baseline Model

          Value                            811.632
          Degrees of Freedom                    15
          P-Value                           0.0000

CFI/TLI

          CFI                                0.604
          TLI                                0.741

Loglikelihood

          H0 Value                        -196.825
          H1 Value                         -27.413

Information Criteria

          Number of Free Parameters              4
          Akaike (AIC)                     401.649
          Bayesian (BIC)                   414.842
          Sample-Size Adjusted BIC         402.170
            (n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)

          Estimate                           0.262
          90 Percent C.I.                    0.238  0.287
          Probability RMSEA <= .05           0.000

SRMR (Standardized Root Mean Square Residual)

          Value                              0.293

MODEL RESULTS
                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

 I        |
    GPA0               1.000      0.000    999.000    999.000
    GPA1               1.000      0.000    999.000    999.000
    GPA2               1.000      0.000    999.000    999.000
    GPA3               1.000      0.000    999.000    999.000
    GPA4               1.000      0.000    999.000    999.000
    GPA5               1.000      0.000    999.000    999.000

 S        |
    GPA0               0.000      0.000    999.000    999.000
    GPA1               1.000      0.000    999.000    999.000
    GPA2               2.000      0.000    999.000    999.000
    GPA3               3.000      0.000    999.000    999.000
    GPA4               4.000      0.000    999.000    999.000
    GPA5               5.000      0.000    999.000    999.000

 Means
    I                  2.599      0.022    120.047      0.000
    S                  0.106      0.004     26.109      0.000

 Intercepts
    GPA0               0.000      0.000    999.000    999.000
    GPA1               0.000      0.000    999.000    999.000
    GPA2               0.000      0.000    999.000    999.000
    GPA3               0.000      0.000    999.000    999.000
    GPA4               0.000      0.000    999.000    999.000
    GPA5               0.000      0.000    999.000    999.000

 Variances
    I                  0.063      0.007      8.661      0.000
    S                  0.000      0.000    999.000    999.000

 Residual Variances
    GPA0               0.058      0.003     22.361      0.000
    GPA1               0.058      0.003     22.361      0.000
    GPA2               0.058      0.003     22.361      0.000
    GPA3               0.058      0.003     22.361      0.000
    GPA4               0.058      0.003     22.361      0.000
    GPA5               0.058      0.003     22.361      0.000
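
The loglikelihood, the number of free parameters, and the AIC match the long-format run in section 1.1, but the BIC does not. The reason is the sample size that enters the BIC penalty: 1200 observations in long format versus 200 in wide format. The Python sketch below recomputes both criteria from the standard formulas AIC = -2LL + 2k and BIC = -2LL + k*ln(n), using the printed loglikelihood.

```python
import math

loglik = -196.825   # H0 loglikelihood, identical in sections 1.1 and 2.1
k = 4               # number of free parameters in both runs

def aic(ll, k):
    """Akaike information criterion."""
    return -2.0 * ll + 2.0 * k

def bic(ll, k, n):
    """Bayesian information criterion with sample size n."""
    return -2.0 * ll + k * math.log(n)

aic_value = aic(loglik, k)
bic_long = bic(loglik, k, 1200)   # long format: n = 1200 rows
bic_wide = bic(loglik, k, 200)    # wide format: n = 200 students

print(round(aic_value, 2))
print(round(bic_long, 2))
print(round(bic_wide, 2))
```

The recomputed values agree with the printed 401.649, 422.009 and 414.842 up to rounding of the loglikelihood, so BIC values should not be compared across the two data formats.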

2.2 Longitudinal modeling in wide format, example 1 revisited

Now let’s rerun our previous example, but relax the assumption about the residual variance. In this example, we allow the residual variances at the six time points to differ from each other. To do so, we simply comment out the last line of the model statement, which constrains the residual variances to be equal. The point is that the wide-format approach not only offers a different angle on the same model but also gives us more flexibility in modeling.
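
Freeing the six residual variances changes the parameter count and therefore the degrees of freedom of the chi-square test. The Python lines below are a quick check, assuming the usual count of sample moments for p observed variables (p means plus p(p+1)/2 variances and covariances).

```python
p = 6                                   # gpa0 ... gpa5
sample_moments = p + p * (p + 1) // 2   # 6 means + 21 (co)variances = 27

# Free parameters: mean of i, mean of s, variance of i, residual variances
params_equal = 2 + 1 + 1                # one shared residual variance
params_free = 2 + 1 + p                 # six separate residual variances

df_equal = sample_moments - params_equal
df_free = sample_moments - params_free
print(params_equal, df_equal)
print(params_free, df_free)
```

These counts agree with the 4 free parameters and 23 degrees of freedom reported in section 2.1, and with the 9 free parameters and 18 degrees of freedom reported in the output that follows.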

  Data:
    File is https://stats.idre.ucla.edu/wp-content/uploads/2016/02/gpa_ch5_hox_wide.dat ;
  Variable:
    Names are student admitted gpa0 gpa1 gpa2 gpa3 gpa4 gpa5
           highgpa job0 job1 job2 job3 job4 job5 sex;
    Missing are all (-9999) ;
    usevariables are  gpa0 gpa1 gpa2 gpa3 gpa4 gpa5;
  analysis: estimator = ml;
  model:
    i s | gpa0@0 gpa1@1 gpa2@2 gpa3@3 gpa4@4 gpa5@5;
    s@0;
    !gpa0 - gpa5 (1);

TESTS OF MODEL FIT
Chi-Square Test of Model Fit

          Value                            186.267
          Degrees of Freedom                    18
          P-Value                           0.0000

Chi-Square Test of Model Fit for the Baseline Model

          Value                            811.632
          Degrees of Freedom                    15
          P-Value                           0.0000

CFI/TLI

          CFI                                0.789
          TLI                                0.824

Loglikelihood

          H0 Value                        -120.546
          H1 Value                         -27.413

Information Criteria

          Number of Free Parameters              9
          Akaike (AIC)                     259.093
          Bayesian (BIC)                   288.778
          Sample-Size Adjusted BIC         260.265
            (n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)

          Estimate                           0.216
          90 Percent C.I.                    0.189  0.245
          Probability RMSEA <= .05           0.000

SRMR (Standardized Root Mean Square Residual)

          Value                              0.782



MODEL RESULTS

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

 I        |
    GPA0               1.000      0.000    999.000    999.000
    GPA1               1.000      0.000    999.000    999.000
    GPA2               1.000      0.000    999.000    999.000
    GPA3               1.000      0.000    999.000    999.000
    GPA4               1.000      0.000    999.000    999.000
    GPA5               1.000      0.000    999.000    999.000

 S        |
    GPA0               0.000      0.000    999.000    999.000
    GPA1               1.000      0.000    999.000    999.000
    GPA2               2.000      0.000    999.000    999.000
    GPA3               3.000      0.000    999.000    999.000
    GPA4               4.000      0.000    999.000    999.000
    GPA5               5.000      0.000    999.000    999.000

 Means
    I                  2.599      0.026     99.187      0.000
    S                  0.106      0.004     26.313      0.000

 Intercepts
    GPA0               0.000      0.000    999.000    999.000
    GPA1               0.000      0.000    999.000    999.000
    GPA2               0.000      0.000    999.000    999.000
    GPA3               0.000      0.000    999.000    999.000
    GPA4               0.000      0.000    999.000    999.000
    GPA5               0.000      0.000    999.000    999.000

 Variances
    I                  0.093      0.010      9.226      0.000
    S                  0.000      0.000    999.000    999.000

 Residual Variances
    GPA0               0.138      0.015      9.475      0.000
    GPA1               0.094      0.010      9.299      0.000
    GPA2               0.054      0.006      8.925      0.000
    GPA3               0.026      0.003      7.834      0.000
    GPA4               0.017      0.003      6.547      0.000
    GPA5               0.026      0.003      7.704      0.000
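
Because the equal-residual-variance model of section 2.1 is nested in this model, the two can be compared with a likelihood ratio (chi-square difference) test. The Python sketch below computes the statistic from the printed loglikelihoods and compares it with 11.070, the 0.05 critical value of a chi-square distribution with 5 degrees of freedom.

```python
ll_equal = -196.825   # H0 loglikelihood, section 2.1 (4 free parameters)
ll_free = -120.546    # H0 loglikelihood, section 2.2 (9 free parameters)

lr_stat = 2.0 * (ll_free - ll_equal)   # likelihood ratio statistic
df = 9 - 4                             # difference in free parameters
critical_05 = 11.070                   # chi-square(5) critical value, alpha = 0.05

print(round(lr_stat, 3), df)
print("reject equal residual variances:", lr_stat > critical_05)

# The same statistic is the difference of the two model chi-squares
chi2_diff = 338.824 - 186.267
print(round(chi2_diff, 3))
```

The statistic of about 152.6 far exceeds the critical value, so the equality constraint is rejected. The fit indices above show, however, that even the less restricted model fits these data poorly.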