This seminar is a continuation of our Introduction to Mplus seminar. We will review the basics of Mplus syntax and show examples of simple analyses, such as regression models for continuous and binary outcomes. Then we’ll move on to more advanced models, such as factor analysis, path analysis, growth curve models and latent class models. Some of the examples will be demonstrated by running Mplus in real time. The data files and input files are zipped for easy download and can be accessed by following the link.
Introduction
“We started to develop Mplus eleven years ago with the goal of providing applied researchers with powerful new statistical modeling techniques. We saw a wide gap between new statistical methods presented in the statistical literature and the statistical methods used by researchers in applied papers. Our goal was to help bridge this gap with easy-to-use but powerful software.” — February 2006, Preface to the Mplus User’s Guide.
Mplus has been very successful in achieving this goal and has improved steadily since it was first released in 1998. Its general framework of continuous and categorical latent variables gives us a new way to formulate statistical models. For example, not only can we perform growth curve analysis, but also latent class growth analysis; not only can we do discrete-time survival analysis, but also discrete-time survival mixture analysis. These modeling possibilities make Mplus a very attractive piece of software. It also offers several options for handling missing data, including maximum likelihood estimation and estimation based on multiply imputed data sets.
Over the years, we have recommended to our clients a “get in and get out” approach with Mplus (and some other statistical packages), and it seems to have worked well. The approach consists of a few steps: decide on the appropriate models for the study; decide whether switching to Mplus is necessary; prepare the data for Mplus using a familiar software package; and then move to Mplus and perform the analyses.
Our goal for this seminar is to help with the transition to Mplus. We will discuss the overall structure and syntax of Mplus input files, as well as how to use the Mplus 4 User’s Guide and the online resources for Mplus. Starting with some basic models, we will then transition to more advanced ones.
Overall structure of Mplus input file
An input file defines the data set to use and the model to run. It is similar to a SAS program file, an SPSS syntax file or a Stata .do file. Below is an example of an input file, shown only to illustrate the general structure; we are not going to explain what analysis it performs.
Data:
  File is d:\work\data\raw\table3_4.dat;
Variable:
  names are a b c d freq;
  missing are all (-9999);
  usevariables are a b c d;
  weight is freq;   ! default is frequency weight
  categorical are a b c d;
  classes = cl(2);
Analysis:
  Type = mixture;
  starts = 0;
Model:
  %overall%
  [a$1*10 b$1*10 c$1*10 d$1*10] (1);
  %cl#1%
  [a$1*-10 b$1*-10 c$1*-10 d$1*-10] (2);
plot:
  type = plot3;
  series is a(1) b(2) c(3) d(4);
Here are some characteristics of an input file:
- an input line cannot exceed 80 characters in width;
- variable names cannot exceed 8 characters in length;
- only one model per input file;
- only one output file per input file;
- comments start with “!”;
- the default analysis type is type = general;
- the keywords categorical and count are for outcome variables only;
- new variables can be defined using the “define” command, as in the sketch below.
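For instance, here is a minimal sketch of the define command; the data file and variable names are hypothetical. Note that a variable created with define must be listed last on the usevariables statement.

Data:
  File is example.dat;              ! hypothetical data file
Variable:
  Names are a b c d;
  usevariables are a b c d total;   ! the newly defined variable goes last
Define:
  total = a + b + c + d;            ! new variable: sum of the four items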
Here are some characteristics of a data file:
- must be in ASCII format;
- can be in fixed format or delimited (see the sketch after this list);
- can be raw data or correlation data;
- no variable names in the first line;
- only numeric variables are allowed;
- use stata2mplus to convert a Stata data file to an ASCII data file and an Mplus input file.
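As an illustration of the fixed-format case, the data command accepts a Fortran-style format statement; the file name and column layout below are hypothetical:

Data:
  File is scores.dat;   ! hypothetical fixed-format file
  Format is 4F8.2;      ! four fields, each 8 columns wide with 2 decimal places
Variable:
  Names are a b c d;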
Overall review of Mplus syntax for the model command
Mplus has made a great effort to keep the syntax as simple as possible; still, with so many analyses available, the model command can get quite involved. We have compiled a short list of commonly used keywords; a schematic example follows the list.
- “on” for regression (regress response variable “on” predictor variables);
- “by” for factors (measured “by” observed variables);
- “with” for covariance (correlated “with”);
- “[x]” for means or intercepts;
- “x” alone means the variance of x;
- “*” for starting values;
- “@” for fixed values;
- “|” for random effects;
- append a number in parentheses, such as (1), to constrain a set of parameters to be equal.
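Here is a schematic model command illustrating most of these keywords; f, x and y1–y3 are hypothetical variables, and this fragment would need data and variable commands to run:

Model:
  f by y1 y2 y3;   ! "by": factor f is measured by y1, y2 and y3
  f on x;          ! "on": regress f on x
  y1 with y2;      ! "with": residual covariance between y1 and y2
  [y3*0.5];        ! "[ ]": intercept of y3, with "*" giving a starting value
  y3@1;            ! "@": fix the residual variance of y3 at 1
  y1 y2 (1);       ! "(1)": hold the residual variances of y1 and y2 equal

(The “|” symbol for random effects appears in the growth curve examples later in this seminar.)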
Use of User’s Guide and online resources
The Mplus User’s Guide is an excellent reference both for Mplus syntax and for the types of models possible in Mplus. It has the flavor of learning by doing, and its organization is very different from that of other user guides, such as those for Stata, SAS or SPSS. Examples of basic models can be found in the first chapter, more advanced models are divided among later chapters, and the section on syntax is near the end. A very important feature is that almost all of the examples in the Guide are included with the software itself, so if one sees an interesting example, one can always run the model to see the output and modify the example to suit one’s own modeling needs. An equally important feature is that each example in the book has a Monte Carlo simulation counterpart; in fact, Monte Carlo simulation was used to generate most of the data sets in the User’s Guide. The Mplus help system also includes “A SUMMARY OF THE Mplus LANGUAGE” for quick reference.
The Mplus website offers tremendous resources: a very active discussion forum on many topics for serious modelers, and many examples one can download. One can also access the entire User’s Guide in PDF format from the website and search it for examples and commands. It is a great place to learn about new modeling possibilities and to learn the Mplus language as well.
Post estimation
Mplus has three commands for post estimation: the output command, the savedata command and the plot command. The output command requests particular types of output to be included in the output file; for example, the sampstat option requests sample statistics. The savedata command creates an ASCII data file for further data analysis. The plot command requests plots; Mplus offers many model-related plots, and the controls over them are easy to use.
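Here is a hedged sketch of all three commands appended to a model run; the file name scores.dat is hypothetical, and save = fscores assumes the model contains latent variables whose factor scores we want:

Output:
  sampstat standardized;   ! sample statistics and standardized estimates
Savedata:
  file is scores.dat;      ! write the analysis variables out to scores.dat
  save = fscores;          ! also save estimated factor scores
Plot:
  type = plot3;            ! the richest set of available plots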
Simple examples
We will review how some simple models are done in Mplus. We will start with linear regression and then discuss models with binary outcomes.
Example 1. Where is the output for intercept? (linear regression)
The code below is for a simple linear regression with the dependent variable write regressed on the predictor variables female and read. So we use the keyword on in the model statement.
Data:
  File is hsb2.dat;
Variable:
  Names are id female race ses schtyp prog
    read write math science socst;
  Missing are all (-9999);
  usevariables are female write read;
Model:
  write on female read;

MODEL RESULTS
                   Estimates     S.E.  Est./S.E.
 WRITE    ON
    FEMALE             5.487     1.007      5.451
    READ               0.566     0.049     11.546

 Residual Variances
    WRITE             50.113     5.011     10.000
Notice that something is missing in the output: the intercept. What does that mean? By default, Mplus analyzes only the covariance structure. To understand what it is doing, let’s perform this analysis manually in the fashion of a covariance analysis: we create the covariance matrix for the variables write, female and read, and use this covariance matrix as the input for our analysis.
Title:
  example of using a covariance matrix as input.
  the input data form a lower-triangular covariance matrix:
    89.8436
     1.21369    .249221
    57.9967    -.271709  105.123
Data:
  file is cov.dat;
  type is covariance;
  nobservations = 200;
Variable:
  names are write female read;
Model:
  write on female read;

                   Estimates     S.E.  Est./S.E.
 WRITE    ON
    FEMALE             5.487     1.007      5.451
    READ               0.566     0.049     11.546

 Residual Variances
    WRITE             50.113     5.011     10.000
This shows that the analysis we ran at the beginning of this example is just a covariance analysis. In order to estimate the intercept, which is the expected value of the outcome when all predictor variables are zero, we need to tell Mplus that we are also interested in the analysis of means. This is done easily by adding type = meanstructure to the analysis command. Every model has an analysis command associated with it; we have not seen it in this example because we have been using the default setting, analysis: type = general. Models that can be estimated using type = general include regression analysis, path analysis, confirmatory factor analysis, structural equation modeling and growth curve modeling. Within any specific analysis setting, we can add more options, such as type = missing when the data set has missing values and we don’t want listwise deletion, or type = meanstructure to have the means and intercepts displayed in the output, as we do here.
Data:
  File is hsb2.dat;
Variable:
  Names are id female race ses schtyp prog
    read write math science socst;
  Missing are all (-9999);
  usevariables are female write read;
Analysis:
  type = meanstructure;
Model:
  write on female read;

                   Estimates     S.E.  Est./S.E.
 WRITE    ON
    FEMALE             5.487     1.007      5.451
    READ               0.566     0.049     11.546

 Intercepts
    WRITE             20.228     2.693      7.511

 Residual Variances
    WRITE             50.113     5.011     10.000
Example 2. Is it a probit or a logit regression? (binary outcome)
Now let’s switch to binary outcomes. Using the same data set as in the previous example, we create a new dichotomous variable called hon based on the variable write, and we declare that hon is a categorical variable. As mentioned before, the keyword categorical is for outcome variables only. If we have categorical predictors, we have to create the dummy variables for them ourselves (usually in another software package before the data are moved into Mplus).
Data:
  File is hsb2.dat;
Variable:
  Names are id female race ses schtyp prog
    read write math science socst;
  Missing are all (-9999);
  usevariables are female math read hon;
  categorical is hon;
define:
  hon = (write>60);
Model:
  hon on female math read;

Observed dependent variables
  Binary and ordered categorical (ordinal)
   HON

Observed independent variables
   FEMALE      MATH        READ

Estimator                                      WLSMV
Maximum number of iterations                    1000
Convergence criterion                      0.500D-04
Maximum number of steepest descent iterations     20
Parameterization                               DELTA

Input data file(s)
  hsb2.dat
Input data format  FREE

SUMMARY OF CATEGORICAL DATA PROPORTIONS

    HON
      Category 1    0.755
      Category 2    0.245
(output omitted...)

MODEL RESULTS
                   Estimates     S.E.  Est./S.E.
 HON      ON
    FEMALE             0.574     0.246      2.335
    MATH               0.069     0.016      4.324
    READ               0.038     0.017      2.275

R-SQUARE
    Observed           Residual
    Variable           Variance   R-Square
    HON                   1.000      0.489
Now, is this a probit model or a logit model? Mplus is not very explicit about it. By default, it is a probit model. Even if we don’t know the default, we can still tell that this is a probit model because the output has an R-square section with a residual variance of 1: probit models assume the residual follows a standard normal distribution. Did we miss something again? Yes, we don’t see the intercept. This is exactly the same situation we had with the linear regression, and adding type = meanstructure will give us the intercept, which Mplus calls a “threshold”.
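For reference, here is our sketch of the probit model being estimated (the standard probit parameterization, not a formula quoted from the Mplus documentation); the threshold \(\tau\) acts as a negative intercept, which is why requesting the mean structure yields a threshold rather than an intercept:

\[
P(\mathrm{hon}=1 \mid \mathbf{x}) \;=\; \Phi\left(\beta_1\,\mathrm{female} + \beta_2\,\mathrm{math} + \beta_3\,\mathrm{read} - \tau\right),
\]

where \(\Phi\) is the standard normal distribution function and the residual variance is fixed at 1.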
Data:
  File is hsb2.dat;
Variable:
  Names are id female race ses schtyp prog
    read write math science socst;
  Missing are all (-9999);
  Usevariables are female math read hon;
  Categorical is hon;
Define:
  hon = (write>60);
Analysis:
  type = meanstructure;
Model:
  hon on female math read;

MODEL RESULTS
                   Estimates     S.E.  Est./S.E.
 HON      ON
    FEMALE             0.574     0.246      2.335
    MATH               0.069     0.016      4.324
    READ               0.038     0.017      2.275

 Thresholds
    HON$1              6.887     1.063      6.482

R-SQUARE
    Observed           Residual
    Variable           Variance   R-Square
    HON                   1.000      0.489
What about a logistic regression with the same data? To do a logistic regression, we will change the estimation method from the default method of WLSMV to ML.
Data:
  File is hsb2.dat;
Variable:
  Names are id female race ses schtyp prog
    read write math science socst;
  Missing are all (-9999);
  Usevariables are female math read hon;
  Categorical is hon;
Define:
  hon = (write>60);
Analysis:
  estimator = ml;
Model:
  hon on female math read;

Estimator                                         ML
(output omitted...)
Link                                           LOGIT
Cholesky                                         OFF

Input data file(s)
  hsb2.dat
Input data format  FREE

SUMMARY OF CATEGORICAL DATA PROPORTIONS

    HON
      Category 1    0.755
      Category 2    0.245
THE MODEL ESTIMATION TERMINATED NORMALLY

TESTS OF MODEL FIT

Loglikelihood
    H0 Value                     -78.085

Information Criteria
    Number of Free Parameters          4
    Akaike (AIC)                 164.170
    Bayesian (BIC)               177.363
    Sample-Size Adjusted BIC     164.690
      (n* = (n + 2) / 24)

MODEL RESULTS
                   Estimates     S.E.  Est./S.E.
 HON      ON
    FEMALE             0.980     0.422      2.324
    MATH               0.123     0.031      3.931
    READ               0.059     0.027      2.224

 Thresholds
    HON$1             11.770     1.711      6.880

LOGISTIC REGRESSION ODDS RATIO RESULTS
 HON      ON
    FEMALE             2.664
    MATH               1.131
    READ               1.061
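The odds ratios here are simply the exponentiated logit coefficients; for example, for female:

\[
\mathrm{OR} = e^{\hat\beta}, \qquad e^{0.980} \approx 2.664 .
\]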
Advanced examples
Example 1. Exploratory factor analysis
Exploratory factor analysis is often used to explore the structure underlying a set of variables, but most statistical software lacks sophisticated techniques for dealing with missing values or binary variables. Mplus, on the other hand, allows us to take care of both issues. Let’s start with a simple exploratory factor analysis. This example is taken from our Annotated SPSS Output Factor Analysis page. The data set has many variables, and we are only going to use item13 – item24, as they are all about instructors. The option type = efa 1 3 in the analysis command asks Mplus to extract one through three factors.
Data:
  File is factor.dat;
Variable:
  Names are facsex facethn facnat facrank employm salary yrsteach
    yrsut degree sample remind nstud studrank studsex grade gpa
    satisfy religion psd item13 item14 item15 item16 item17 item18
    item19 item20 item21 item22 item23 item24 item25 item26 item27
    item28 item29 item30 item31 item32 item33 item34 item35 item36
    item37 item38 item39 item40 item41 item42 item43 item44 item45
    item46 item47 item48 item49 item50 item51 item52 race sexism
    racism rpolicy casteman competen sensitiv cstatus;
  Missing are all (-9999);
  Usevariables are item13 - item24;
Analysis:
  estimator = ml;
  Type = efa 1 3;
INPUT READING TERMINATED NORMALLY
SUMMARY OF ANALYSIS
Number of groups                               1
Number of observations                      1428

Number of dependent variables                 12
Number of independent variables                0
Number of continuous latent variables          0

Observed dependent variables
  Continuous
   ITEM13   ITEM14   ITEM15   ITEM16   ITEM17   ITEM18
   ITEM19   ITEM20   ITEM21   ITEM22   ITEM23   ITEM24

Estimator                                         ML
Information matrix                          EXPECTED
Maximum number of iterations                    1000
Convergence criterion                      0.500D-04
Maximum number of steepest descent iterations     20

Input data file(s)
  factor.dat
Input data format  FREE
RESULTS FOR EXPLORATORY FACTOR ANALYSIS
EIGENVALUES FOR SAMPLE CORRELATION MATRIX
         1        2        3        4        5
     6.073    1.223    0.735    0.648    0.572

         6        7        8        9       10
     0.539    0.485    0.429    0.383    0.334

        11       12
     0.311    0.267
(output omitted...)
EXPLORATORY ANALYSIS WITH 3 FACTOR(S):

CHI-SQUARE VALUE                 147.541
DEGREES OF FREEDOM                    33
PROBABILITY VALUE                 0.0000

RMSEA (ROOT MEAN SQUARE ERROR OF APPROXIMATION):
  ESTIMATE (90 PERCENT C.I.) IS  0.049  (0.041  0.058)
  PROBABILITY RMSEA LE 0.05 IS   0.540

ROOT MEAN SQUARE RESIDUAL IS     0.0175
VARIMAX ROTATED LOADINGS
               1        2        3
ITEM13     0.744    0.158    0.236
ITEM14     0.753    0.197    0.213
ITEM15     0.650    0.303    0.258
ITEM16     0.581    0.292    0.177
ITEM17     0.532    0.468    0.300
ITEM18     0.277    0.731    0.240
ITEM19     0.158    0.745    0.130
ITEM20     0.243    0.470    0.187
ITEM21     0.350    0.504    0.383
ITEM22     0.189    0.531    0.319
ITEM23     0.409    0.365    0.724
ITEM24     0.321    0.309    0.604

PROMAX ROTATED LOADINGS
               1        2        3
ITEM13     0.820   -0.098    0.050
ITEM14     0.828   -0.037    0.001
ITEM15     0.645    0.110    0.063
ITEM16     0.591    0.152   -0.029
ITEM17     0.424    0.342    0.105
ITEM18     0.035    0.790    0.009
ITEM19    -0.079    0.890   -0.116
ITEM20     0.093    0.475    0.032
ITEM21     0.144    0.402    0.268
ITEM22    -0.048    0.510    0.218
ITEM23     0.128    0.044    0.786
ITEM24     0.079    0.048    0.662

PROMAX FACTOR CORRELATIONS
               1        2        3
   1       1.000
   2       0.611    1.000
   3       0.658    0.685    1.000

ESTIMATED RESIDUAL VARIANCES
    ITEM13   ITEM14   ITEM15   ITEM16   ITEM17
     0.367    0.349    0.418    0.545    0.408

    ITEM18   ITEM19   ITEM20   ITEM21   ITEM22
     0.331    0.404    0.685    0.477    0.581

    ITEM23   ITEM24
     0.176    0.436
Example 2. Exploratory factor analysis with binary variables
For the purpose of illustration, we dichotomized the variables item13 – item24 from the previous example and will run the same exploratory factor analysis with the binary variables. Factor analysis with binary variables is based on the tetrachoric correlation structure and requires a much larger sample size than factor analysis of continuous variables.
Data:
  File is cat_factor.dat;
Variable:
  Names are item13 item14 item15 item16 item17 item18 item19
    item20 item21 item22 item23 item24 cat_13 - cat_24;
  Missing are all (-9999);
  usevariables are cat_13 - cat_24;
  categorical are cat_13 - cat_24;
Analysis:
  Type = efa 1 3;

SUMMARY OF ANALYSIS

Number of groups                               1
Number of observations                      1428

Number of dependent variables                 12
Number of independent variables                0
Number of continuous latent variables          0

Observed dependent variables
  Binary and ordered categorical (ordinal)
   CAT_13   CAT_14   CAT_15   CAT_16   CAT_17   CAT_18
   CAT_19   CAT_20   CAT_21   CAT_22   CAT_23   CAT_24

Estimator                                        ULS
Maximum number of iterations                    1000
Convergence criterion                      0.500D-04
Maximum number of steepest descent iterations     20
(output omitted...)

RESULTS FOR EXPLORATORY FACTOR ANALYSIS

EIGENVALUES FOR SAMPLE CORRELATION MATRIX
         1        2        3        4        5
     7.208    1.280    0.768    0.622    0.451

         6        7        8        9       10
     0.424    0.374    0.259    0.180    0.174

        11       12
     0.157    0.104
(output omitted...)
EXPLORATORY ANALYSIS WITH 3 FACTOR(S):

ROOT MEAN SQUARE RESIDUAL IS     0.0199

VARIMAX ROTATED LOADINGS
               1        2        3
CAT_13     0.813    0.260    0.297
CAT_14     0.806    0.228    0.306
CAT_15     0.824    0.293    0.262
CAT_16     0.758    0.307    0.141
CAT_17     0.724    0.502    0.226
CAT_18     0.254    0.794    0.241
CAT_19     0.363    0.728    0.154
CAT_20     0.223    0.484    0.153
CAT_21     0.271    0.574    0.414
CAT_22     0.177    0.592    0.320
CAT_23     0.412    0.413    0.812
CAT_24     0.337    0.368    0.613

PROMAX ROTATED LOADINGS
               1        2        3
CAT_13     0.832   -0.005    0.120
CAT_14     0.832   -0.049    0.144
CAT_15     0.844    0.050    0.062
CAT_16     0.791    0.136   -0.087
CAT_17     0.657    0.369   -0.028
CAT_18    -0.032    0.882    0.008
CAT_19     0.151    0.799   -0.112
CAT_20     0.064    0.516   -0.003
CAT_21     0.017    0.517    0.300
CAT_22    -0.079    0.604    0.193
CAT_23     0.137    0.102    0.842
CAT_24     0.115    0.144    0.612

PROMAX FACTOR CORRELATIONS
               1        2        3
   1       1.000
   2       0.606    1.000
   3       0.574    0.645    1.000

ESTIMATED RESIDUAL VARIANCES
    CAT_13   CAT_14   CAT_15   CAT_16   CAT_17
     0.183    0.205    0.166    0.312    0.172

    CAT_18   CAT_19   CAT_20   CAT_21   CAT_22
     0.247    0.315    0.693    0.426    0.516

    CAT_23   CAT_24
     0.001    0.376
Example 3. Exploratory factor analysis on continuous outcome variables with missing data
For the purpose of illustration again, we have created another version of the data set, based on the data set in Example 1 of this Advanced Examples section. We have created a lot of missing values, with values missing completely at random. For the same analysis, we add the missing option to the type statement to tell Mplus to perform the analysis without deleting any cases. In general, Mplus offers ML estimation under the assumptions of MCAR and MAR. The output labeled “PROPORTION OF DATA PRESENT” shows that many variables have a good amount of missing data.
Data:
  File is factor_missing.dat;
Variable:
  Names are item13 item14 item15 item16 item17 item18
    item19 item20 item21 item22 item23 item24;
  Missing are all (-9999);
Analysis:
  Type = efa 1 3 missing;
INPUT READING TERMINATED NORMALLY
SUMMARY OF ANALYSIS
Number of groups                               1
Number of observations                      1428

Number of dependent variables                 12
Number of independent variables                0
Number of continuous latent variables          0

Observed dependent variables
  Continuous
   ITEM13   ITEM14   ITEM15   ITEM16   ITEM17   ITEM18
   ITEM19   ITEM20   ITEM21   ITEM22   ITEM23   ITEM24

Estimator                                         ML
Information matrix                          OBSERVED
Maximum number of iterations                    1000
Convergence criterion                      0.500D-04
Maximum number of steepest descent iterations     20

Input data file(s)
  factor_missing.dat
Input data format  FREE
SUMMARY OF DATA
  Number of patterns    940

COVARIANCE COVERAGE OF DATA
  Minimum covariance coverage value    0.100

PROPORTION OF DATA PRESENT

Covariance Coverage
            ITEM13   ITEM14   ITEM15   ITEM16   ITEM17
ITEM13       0.492
ITEM14       0.209    0.436
ITEM15       0.216    0.183    0.433
ITEM16       0.266    0.235    0.225    0.513
ITEM17       0.277    0.235    0.227    0.280    0.544
ITEM18       0.257    0.228    0.237    0.264    0.282
ITEM19       0.245    0.218    0.218    0.263    0.275
ITEM20       0.271    0.232    0.214    0.288    0.293
ITEM21       0.305    0.277    0.272    0.319    0.343
ITEM22       0.349    0.298    0.305    0.370    0.379
ITEM23       0.422    0.377    0.371    0.443    0.477
ITEM24       0.410    0.368    0.368    0.436    0.466

            ITEM18   ITEM19   ITEM20   ITEM21   ITEM22
ITEM18       0.520
ITEM19       0.258    0.508
ITEM20       0.272    0.272    0.533
ITEM21       0.327    0.318    0.333    0.625
ITEM22       0.370    0.361    0.382    0.438    0.704
ITEM23       0.449    0.440    0.453    0.543    0.606
ITEM24       0.431    0.428    0.451    0.539    0.590

            ITEM23   ITEM24
ITEM23       0.867
ITEM24       0.732    0.848
RESULTS FOR EXPLORATORY FACTOR ANALYSIS

EIGENVALUES FOR SAMPLE CORRELATION MATRIX
         1        2        3        4        5
     6.043    1.257    0.736    0.658    0.627

         6        7        8        9       10
     0.551    0.454    0.439    0.422    0.331

        11       12
     0.267    0.213
(output omitted...)
EXPLORATORY ANALYSIS WITH 3 FACTOR(S):

CHI-SQUARE VALUE                  90.822
DEGREES OF FREEDOM                    33
PROBABILITY VALUE                 0.0000

RMSEA (ROOT MEAN SQUARE ERROR OF APPROXIMATION):
  ESTIMATE (90 PERCENT C.I.) IS  0.035  (0.027  0.044)
  PROBABILITY RMSEA LE 0.05 IS   0.998

ROOT MEAN SQUARE RESIDUAL IS     0.0286

VARIMAX ROTATED LOADINGS
               1        2        3
ITEM13     0.789    0.168    0.151
ITEM14     0.742    0.216    0.176
ITEM15     0.598    0.347    0.312
ITEM16     0.549    0.176    0.345
ITEM17     0.535    0.264    0.483
ITEM18     0.233    0.231    0.750
ITEM19     0.142    0.183    0.700
ITEM20     0.278    0.151    0.510
ITEM21     0.337    0.362    0.546
ITEM22     0.169    0.310    0.520
ITEM23     0.358    0.768    0.392
ITEM24     0.315    0.554    0.348

PROMAX ROTATED LOADINGS
               1        2        3
ITEM13     0.897   -0.041   -0.091
ITEM14     0.812    0.031   -0.066
ITEM15     0.540    0.203    0.101
ITEM16     0.526   -0.034    0.235
ITEM17     0.433    0.041    0.388
ITEM18    -0.025   -0.021    0.848
ITEM19    -0.109   -0.043    0.827
ITEM20     0.137   -0.055    0.546
ITEM21     0.127    0.210    0.484
ITEM22    -0.061    0.194    0.519
ITEM23     0.062    0.829    0.089
ITEM24     0.096    0.558    0.137

PROMAX FACTOR CORRELATIONS
               1        2        3
   1       1.000
   2       0.628    1.000
   3       0.614    0.686    1.000

ESTIMATED RESIDUAL VARIANCES
    ITEM13   ITEM14   ITEM15   ITEM16   ITEM17
     0.327    0.373    0.424    0.549    0.411

    ITEM18   ITEM19   ITEM20   ITEM21   ITEM22
     0.329    0.456    0.639    0.457    0.605

    ITEM23   ITEM24
     0.128    0.473
Example 4. Path analysis with indirect and direct effects
We have created a fake data set on school performance. We hypothesize that school performance (pfrm) is related to a student’s IQ, ambition and socioeconomic status (ses); in addition, the student’s IQ might itself be related to ses. In the hypothesized path diagram, ses predicts iq, and iq, ambition and ses all predict pfrm, so ses affects school performance both directly and indirectly through iq.
Mplus offers a very straightforward way to display all the possible direct and indirect effects by using the model indirect statement.
Data:
  File is path_anlaysis.dat;
Variable:
  Names are pfrm ses ambition iq;
  Missing are all (-9999);
Model:
  pfrm on iq ambition ses;
  iq on ses;
Model indirect:
  pfrm ind ses;

TESTS OF MODEL FIT

Chi-Square Test of Model Fit
    Value                          0.060
    Degrees of Freedom                 1
    P-Value                       0.8066

Chi-Square Test of Model Fit for the Baseline Model
    Value                        135.440
    Degrees of Freedom                 5
    P-Value                       0.0000

CFI/TLI
    CFI                            1.000
    TLI                            1.036

Loglikelihood
    H0 Value                   -1775.747
    H1 Value                   -1775.717

Information Criteria
    Number of Free Parameters          6
    Akaike (AIC)                3563.494
    Bayesian (BIC)              3583.283
    Sample-Size Adjusted BIC    3564.275
      (n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)
    Estimate                       0.000
    90 Percent C.I.                0.000  0.117
    Probability RMSEA <= .05       0.849

SRMR (Standardized Root Mean Square Residual)
    Value                          0.006

MODEL RESULTS
                   Estimates     S.E.  Est./S.E.
 PFRM     ON
    IQ                 0.547     0.051     10.728
    AMBITION           5.635     1.009      5.584
    SES                0.930     0.727      1.279

 IQ       ON
    SES                4.152     0.957      4.339

 Residual Variances
    PFRM              49.706     4.971     10.000
    IQ                95.599     9.560     10.000

TOTAL, TOTAL INDIRECT, SPECIFIC INDIRECT, AND DIRECT EFFECTS
                   Estimates     S.E.  Est./S.E.
 Effects from SES to PFRM
    Total              3.201     0.870      3.677
    Total indirect     2.271     0.565      4.022

  Specific indirect
    PFRM
    IQ
    SES                2.271     0.565      4.022

  Direct
    PFRM
    SES                0.930     0.727      1.279
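As a check on how these effects fit together (our arithmetic, not part of the Mplus output): the indirect effect of ses is the product of the ses → iq and iq → pfrm paths, and the total effect adds the direct path:

\[
4.152 \times 0.547 \approx 2.271, \qquad 2.271 + 0.930 = 3.201 .
\]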
Example 5. Growth curve modeling with the long format approach
We have chosen a simple example to show how Mplus handles growth curve modeling. Unlike most statistical software, Mplus does growth curve modeling in both long and wide format; the two approaches offer different ways of looking at the same model and suggest alternative models to one another. The example here is taken from Chapter 7 of Singer and Willett’s Applied Longitudinal Data Analysis. The outcome variable is the response time on a timed cognitive task called “opposites naming”, measured at four time points. We start with the long format approach, in which each subject has up to four rows of observations on the dependent variable and other covariates. In other words, this is the univariate approach, which is also the standard hierarchical linear model approach.
Data:
  File is opposites_pp.dat;
Variable:
  Names are id time opp cog ccog wave;
  Missing are all (-9999);
  Usevariables are time opp ccog;
  Cluster = id;
  Within are time;
  Between are ccog;
Analysis:
  type = random twolevel;
Model:
  %within%
  s | opp on time;
  %between%
  opp s on ccog;
  opp with s;

SUMMARY OF ANALYSIS

Number of groups                               1
Number of observations                       140

Number of dependent variables                  1
Number of independent variables                2
Number of continuous latent variables          1

Observed dependent variables
  Continuous
   OPP

Observed independent variables
   TIME     CCOG

Continuous latent variables
   S

Variables with special functions
  Cluster variable     ID
  Within variables     TIME
  Between variables    CCOG

Estimator                                        MLR
Information matrix                          OBSERVED
Maximum number of iterations                    1000
Convergence criterion                      0.100D-05
Maximum number of EM iterations                  500
Convergence criteria for the EM algorithm
  Loglikelihood change                     0.100D-02
  Relative loglikelihood change            0.100D-05
  Derivative                               0.100D-02
Minimum variance                           0.100D-03
Maximum number of steepest descent iterations     20
Maximum number of iterations for H1             2000
Convergence criterion for H1               0.100D-03
Optimization algorithm                           EMA

Input data file(s)
  opposites_pp.dat
Input data format  FREE

SUMMARY OF DATA
  Number of clusters    35

  Size (s)   Cluster ID with Size s
    4        1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
             21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

  Average cluster size   4.000

  Estimated Intraclass Correlations for the Y Variables
    Variable   Intraclass Correlation
    OPP        0.406

THE MODEL ESTIMATION TERMINATED NORMALLY

TESTS OF MODEL FIT

Loglikelihood
    H0 Value                    -633.451
    H0 Scaling Correction Factor   0.793
      for MLR

Information Criteria
    Number of Free Parameters          8
    Akaike (AIC)                1282.901
    Bayesian (BIC)              1306.434
    Sample-Size Adjusted BIC    1281.123
      (n* = (n + 2) / 24)

MODEL RESULTS
                   Estimates     S.E.  Est./S.E.
Within Level
 Residual Variances
    OPP              159.727    23.491      6.800

Between Level
 S        ON
    CCOG               0.433     0.121      3.566

 OPP      ON
    CCOG              -0.114     0.416     -0.274

 OPP      WITH
    S               -165.185    67.783     -2.437

 Intercepts
    OPP              164.384     6.024     27.286
    S                 26.954     1.936     13.923

 Residual Variances
    OPP             1158.985   278.161      4.167
    S                 99.238    23.369      4.247
Example 6a. Growth curve modeling with the wide format approach
Now let’s move to growth curve modeling with the wide format approach. The data are now in wide format: each subject has only one row of data, with four dependent variables corresponding to the four time points. In other words, this is the multivariate approach. To this end, we restructured the data from long to wide (in another statistical package). In order to match the results from the long format approach, we constrain the residual variances at the four time points to be equal to each other. This also hints that the residual variances don’t always have to be equal, which leads to more flexible models.
Data:
  File is opposites_wide.dat;
Variable:
  Names are id opp1 opp2 opp3 opp4 cog ccog;
  Missing are all (-9999);
  usev = opp1-opp4 ccog;
Analysis:
  Type = meanstructure;
Model:
  i s | opp1@0 opp2@1 opp3@2 opp4@3;
  i s on ccog;
  [i s];
  [opp1-opp4@0];    ! constraining the mean to be zero at all time points
  opp1 - opp4 (1);  ! constraining the residual variance to be equal
                    ! at all time points
INPUT READING TERMINATED NORMALLY
SUMMARY OF ANALYSIS
Number of groups                               1
Number of observations                        35

Number of dependent variables                  4
Number of independent variables                1
Number of continuous latent variables          2

Observed dependent variables
  Continuous
   OPP1     OPP2     OPP3     OPP4

Observed independent variables
   CCOG

Continuous latent variables
   I        S

Estimator                                         ML
Information matrix                          EXPECTED
Maximum number of iterations                    1000
Convergence criterion                      0.500D-04
Maximum number of steepest descent iterations     20

Input data file(s)
  opposites_wide.dat
Input data format  FREE
THE MODEL ESTIMATION TERMINATED NORMALLY
TESTS OF MODEL FIT
Chi-Square Test of Model Fit
    Value                          6.899
    Degrees of Freedom                10
    P-Value                       0.7350

Chi-Square Test of Model Fit for the Baseline Model
    Value                        134.996
    Degrees of Freedom                10
    P-Value                       0.0000

CFI/TLI
    CFI                            1.000
    TLI                            1.025

Loglikelihood
    H0 Value                    -770.987
    H1 Value                    -767.538

Information Criteria
    Number of Free Parameters          8
    Akaike (AIC)                1557.975
    Bayesian (BIC)              1570.418
    Sample-Size Adjusted BIC    1545.438
      (n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)
    Estimate                       0.000
    90 Percent C.I.                0.000  0.134
    Probability RMSEA <= .05       0.787

SRMR (Standardized Root Mean Square Residual)
    Value                          0.043
MODEL RESULTS
                   Estimates     S.E.  Est./S.E.
 I        |
    OPP1               1.000     0.000      0.000
    OPP2               1.000     0.000      0.000
    OPP3               1.000     0.000      0.000
    OPP4               1.000     0.000      0.000

 S        |
    OPP1               0.000     0.000      0.000
    OPP2               1.000     0.000      0.000
    OPP3               2.000     0.000      0.000
    OPP4               3.000     0.000      0.000

 I        ON
    CCOG              -0.114     0.489     -0.232

 S        ON
    CCOG               0.433     0.157      2.753

 S        WITH
    I               -165.303    78.279     -2.112

 Intercepts
    OPP1               0.000     0.000      0.000
    OPP2               0.000     0.000      0.000
    OPP3               0.000     0.000      0.000
    OPP4               0.000     0.000      0.000
    I                164.374     6.026     27.277
    S                 26.960     1.936     13.925

 Residual Variances
    OPP1             159.475    26.956      5.916
    OPP2             159.475    26.956      5.916
    OPP3             159.475    26.956      5.916
    OPP4             159.475    26.956      5.916
    I               1159.354   304.409      3.809
    S                 99.298    31.821      3.121
Example 6b. Growth curve modeling with the wide format approach (different parameterization)
In the previous models we assumed that the random intercept and the random slope are correlated with each other. With the wide format approach, we can instead model this relationship as a regression of the slope on the intercept. This simply reparameterizes the model, but it lets us describe the relationship between the intercept and the slope in terms of how initial status predicts change.
Data:
  File is opposites_wide.dat;
Variable:
  Names are id opp1 opp2 opp3 opp4 cog ccog;
  Missing are all (-9999);
  usev = opp1-opp4 ccog;
Analysis:
  Type = meanstructure;
Model:
  i s | opp1@0 opp2@1 opp3@2 opp4@3;
  i s on ccog;
  [i s];
  s on i;           ! different parameterization happens here
  [opp1-opp4@0];    ! constraining the mean to be zero at all time points
  opp1 - opp4 (1);  ! constraining the residual variance to be equal
                    ! at all time points

TESTS OF MODEL FIT

Chi-Square Test of Model Fit
    Value                          6.899
    Degrees of Freedom                10
    P-Value                       0.7350

Chi-Square Test of Model Fit for the Baseline Model
    Value                        134.996
    Degrees of Freedom                10
    P-Value                       0.0000

CFI/TLI
    CFI                            1.000
    TLI                            1.025

Loglikelihood
    H0 Value                    -770.987
    H1 Value                    -767.538

Information Criteria
    Number of Free Parameters          8
    Akaike (AIC)                1557.975
    Bayesian (BIC)              1570.418
    Sample-Size Adjusted BIC    1545.438
      (n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)
    Estimate                       0.000
    90 Percent C.I.                0.000  0.134
    Probability RMSEA <= .05       0.787

SRMR (Standardized Root Mean Square Residual)
    Value                          0.043

MODEL RESULTS
                   Estimates     S.E.  Est./S.E.
 I        |
    OPP1               1.000     0.000      0.000
    OPP2               1.000     0.000      0.000
    OPP3               1.000     0.000      0.000
    OPP4               1.000     0.000      0.000

 S        |
    OPP1               0.000     0.000      0.000
    OPP2               1.000     0.000      0.000
    OPP3               2.000     0.000      0.000
    OPP4               3.000     0.000      0.000

 S        ON
    I                 -0.143     0.051     -2.773

 I        ON
    CCOG              -0.114     0.489     -0.232

 S        ON
    CCOG               0.417     0.135      3.091

 Intercepts
    OPP1               0.000     0.000      0.000
    OPP2               0.000     0.000      0.000
    OPP3               0.000     0.000      0.000
    OPP4               0.000     0.000      0.000
    I                164.374     6.026     27.277
    S                 50.398     8.613      5.852

 Residual Variances
    OPP1             159.477    26.957      5.916
    OPP2             159.477    26.957      5.916
    OPP3             159.477    26.957      5.916
    OPP4             159.477    26.957      5.916
    I               1159.380   304.416      3.809
    S                 75.726    23.268      3.255
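The two parameterizations agree, as a bit of arithmetic on the estimates (ours, not Mplus output) confirms: the slope-on-intercept coefficient equals the covariance of i and s from Example 6a divided by the variance of i, and the residual variance of s shrinks by the part explained by i (up to rounding):

\[
\frac{-165.303}{1159.354} \approx -0.143, \qquad 99.298 - (-0.143)^2 \times 1159.354 \approx 75.7 .
\]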
Example 7a. Latent class analysis
This example uses the hsb2 data set, which has test scores for the students in the sample as well as demographic variables. We want to see if we can classify students based on their test scores and how the class membership relates to other variables. This example is strictly for the purpose of illustration and does not reflect any real theory. Notice that we have taken the default syntax to perform this analysis. We are looking for a two-class solution based on the scores on read, write, math, science and social studies (socst); the class membership is then regressed on the variables female and ses. Our model runs “successfully”, but Mplus gives us warning messages: by default, Mplus assumes that all the variables are uncorrelated within each latent class. Can we accept this assumption? Maybe not, but for the time being let’s look at the rest of the output. We get the average scores for each of the two latent classes: the first class has lower means on all the variables and the second has higher means, so these two classes make sense to us. Also, the class membership is highly related to ses.
Data:
  File is hsb2.dat;
Variable:
  Names are id female race ses schtyp prog
    read write math science socst;
  Usevariables are read write math science socst female ses;
  classes = grp(2);
Analysis:
  type = mixture;
Model:
  %overall%
  grp#1 on female ses;

*** WARNING in Model command
  Variable is uncorrelated with all other variables within class:  READ
*** WARNING in Model command
  Variable is uncorrelated with all other variables within class:  WRITE
*** WARNING in Model command
  Variable is uncorrelated with all other variables within class:  MATH
*** WARNING in Model command
  Variable is uncorrelated with all other variables within class:  SCIENCE
*** WARNING in Model command
  Variable is uncorrelated with all other variables within class:  SOCST
*** WARNING in Model command
  At least one variable is uncorrelated with all other variables within class.
  Check that this is what is intended.
   6 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

Latent Class Analysis with Graphs

SUMMARY OF ANALYSIS

Number of groups                               1
Number of observations                       200

Number of dependent variables                  5
Number of independent variables                2
Number of continuous latent variables          0
Number of categorical latent variables         1

Observed dependent variables
  Continuous
   READ     WRITE    MATH     SCIENCE  SOCST

Observed independent variables
   FEMALE   SES

Categorical latent variables
   GRP

Estimator                                        MLR
(output omitted...)

TESTS OF MODEL FIT

Loglikelihood
    H0 Value                   -3510.499
    H0 Scaling Correction Factor   1.126
      for MLR

Information Criteria
    Number of Free Parameters         18
    Akaike (AIC)                7056.999
    Bayesian (BIC)              7116.369
    Sample-Size Adjusted BIC    7059.343
      (n* = (n + 2) / 24)

Entropy                            0.852

FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES
BASED ON THE ESTIMATED MODEL

  Latent Classes
    1        96.61160    0.48306
    2       103.38840    0.51694

FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASS PATTERNS
BASED ON ESTIMATED POSTERIOR PROBABILITIES

  Latent Classes
    1        96.61161    0.48306
    2       103.38839    0.51694

CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY
LATENT CLASS MEMBERSHIP

  Class Counts and Proportions
  Latent Classes
    1        95          0.47500
    2       105          0.52500

Average Latent Class Probabilities for Most Likely Latent Class
Membership (Row) by Latent Class (Column)
            1        2
   1    0.963    0.037
   2    0.049    0.951

MODEL RESULTS
                   Estimates     S.E.  Est./S.E.
Latent Class 1

 Means
    READ              44.645     1.107     40.336
    WRITE             45.822     1.197     38.269
    MATH              45.766     0.806     56.784
    SCIENCE           45.189     1.405     32.153
    SOCST             45.785     1.375     33.288

 Variances
    READ              50.830     5.261      9.662
    WRITE             44.222     5.109      8.656
    MATH              43.108     4.842      8.903
    SCIENCE           56.073     7.406      7.572
    SOCST             73.733     7.395      9.970

Latent Class 2

 Means
    READ              59.318     1.168     50.791
    WRITE             59.272     0.913     64.939
    MATH              59.073     1.256     47.018
    SCIENCE           58.075     0.836     69.495
    SOCST             58.591     1.041     56.288

 Variances
    READ              50.830     5.261      9.662
    WRITE             44.222     5.109      8.656
    MATH              43.108     4.842      8.903
    SCIENCE           56.073     7.406      7.572
    SOCST             73.733     7.395      9.970

Categorical Latent Variables

 GRP#1    ON
    FEMALE            -0.173     0.344     -0.502
    SES               -0.779     0.222     -3.506

 Intercepts
    GRP#1              1.622     0.556      2.917

LOGISTIC REGRESSION ODDS RATIO RESULTS

 Categorical Latent Variables
 GRP#1    ON
    FEMALE             0.841
    SES                0.459

ALTERNATIVE PARAMETERIZATIONS FOR THE CATEGORICAL LATENT VARIABLE REGRESSION

 Parameterization using Reference Class 1
 GRP#2    ON
    FEMALE             0.173     0.344      0.502
    SES                0.779     0.222      3.506

 Intercepts
    GRP#2             -1.622     0.556     -2.917
Example 7b. Latent class analysis with graphics
Now, let’s take up the issue of the correlation of variables within latent classes. We will also request some plots. Should we allow all the test scores to be correlated with each other? Maybe not. In this example, we allow reading scores to be correlated with all the other test scores, writing scores to be correlated with social studies scores, and math scores to be correlated with science scores. Comparing AIC values (6958.3 here versus 7057.0 before), we conclude that this is a better fitting model than the previous one.
Data:
  File is hsb2.dat;
Variable:
  Names are id female race ses schtyp prog
    read write math science socst;
  Usevariables are read write math science socst female ses;
  classes = grp(2);
Analysis:
  type = mixture;
Model:
  %overall%
  read with write;
  read with math;
  read with science;
  read with socst;
  write with socst;
  math with science;
  grp#1 on female ses;
Plot:
  type is plot3;
  series is read (1) write (2) math (3) science (4) socst (5);

(output omitted...)

TESTS OF MODEL FIT

Loglikelihood
    H0 Value                   -3455.156
    H0 Scaling Correction Factor   1.068
      for MLR

Information Criteria
    Number of Free Parameters         24
    Akaike (AIC)                6958.313
    Bayesian (BIC)              7037.472
    Sample-Size Adjusted BIC    6961.438
      (n* = (n + 2) / 24)

Entropy                            0.838

FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES
BASED ON THE ESTIMATED MODEL

  Latent Classes
    1        77.82126    0.38911
    2       122.17874    0.61089

FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASS PATTERNS
BASED ON ESTIMATED POSTERIOR PROBABILITIES

  Latent Classes
    1        77.82125    0.38911
    2       122.17875    0.61089

CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY
LATENT CLASS MEMBERSHIP

  Class Counts and Proportions
  Latent Classes
    1        76          0.38000
    2       124          0.62000

Average Latent Class Probabilities for Most Likely Latent Class
Membership (Row) by Latent Class (Column)
            1        2
   1    0.956    0.044
   2    0.042    0.958

MODEL RESULTS
                   Estimates     S.E.  Est./S.E.
Latent Class 1

 READ     WITH
    WRITE              9.024     3.276      2.755
    MATH              24.570     5.285      4.649
    SCIENCE           27.390     5.820      4.706
    SOCST             25.783     5.457      4.724

 WRITE    WITH
    SOCST             18.927     3.559      5.319

 MATH     WITH
    SCIENCE           27.609     6.718      4.109

 Means
    READ              45.417     0.942     48.209
    WRITE             42.995     1.347     31.917
    MATH              45.527     0.722     63.091
    SCIENCE           45.100     1.172     38.487
    SOCST             45.613     1.261     36.185

 Variances
    READ              66.360     5.860     11.324
    WRITE             28.467     4.359      6.530
    MATH              55.061     6.780      8.121
    SCIENCE           68.513     9.495      7.216
    SOCST             85.301     8.522     10.010

Latent Class 2

 READ     WITH
    WRITE              9.024     3.276      2.755
    MATH              24.570     5.285      4.649
    SCIENCE           27.390     5.820      4.706
    SOCST             25.783     5.457      4.724

 WRITE    WITH
    SOCST             18.927     3.559      5.319

 MATH     WITH
    SCIENCE           27.609     6.718      4.109

 Means
    READ              56.570     1.153     49.054
    WRITE             59.005     0.580    101.768
    MATH              57.179     1.072     53.347
    SCIENCE           56.150     1.018     55.171
    SOCST             56.731     1.024     55.428

 Variances
    READ              66.360     5.860     11.324
    WRITE             28.467     4.359      6.530
    MATH              55.061     6.780      8.121
    SCIENCE           68.513     9.495      7.216
    SOCST             85.301     8.522     10.010

Categorical Latent Variables

 GRP#1    ON
    FEMALE            -1.166     0.419     -2.780
    SES               -1.069     0.278     -3.842

 Intercepts
    GRP#1              2.297     0.665      3.456

LOGISTIC REGRESSION ODDS RATIO RESULTS

 Categorical Latent Variables
 GRP#1    ON
    FEMALE             0.312
    SES                0.343

ALTERNATIVE PARAMETERIZATIONS FOR THE CATEGORICAL LATENT VARIABLE REGRESSION

 Parameterization using Reference Class 1
 GRP#2    ON
    FEMALE             1.166     0.419      2.780
    SES                1.069     0.278      3.842

 Intercepts
    GRP#2             -2.297     0.665     -3.456