What is Mplus?
Mplus is a highly flexible, powerful statistical analysis software program that can fit an extensive variety of statistical models using one of many estimators available. Perhaps its greatest strengths are in its capabilities to model latent variables, both continuous and categorical, which underlie its flexibility. Among the many models Mplus can fit are:
- Regression models (linear, logistic, Poisson, Cox proportional hazards, etc.)
- Factor analysis, exploratory and confirmatory
- Structural equation models
- Latent growth models
- Mixture models (latent class, latent profile, etc.)
- Longitudinal analysis (latent transition analysis, growth mixture models, etc.)
- Multilevel models
- Bayesian analysis
Additionally, Mplus can fit most of the models above to complex survey data, to data that contain missing values, and to multiply imputed data. Mplus also has extensive Monte Carlo simulation capabilities to generate data from statistical models and to perform power analyses.
In this seminar, we will learn some basic Mplus syntax which will empower you to use Mplus on your own.
To run an analysis in Mplus, 2 files are needed:
- A data file (often using a .dat extension)
- An input file containing a set of commands to analyze the data file (usually .inp extension)
Mplus creates an output file for each input file that is run. This opens by default after the analysis has been run, and it has the same name as the input file (but has an .out extension).
1.0 The data file
Important requirements for any Mplus data file:
- must be a text file
- no variable names at the top of the file; first row should be data
- all data must be numeric
- . and * may be used for missing values
By default, Mplus expects data files in “free format”, where the values for each of the variables are separated by a delimiter, which must be a comma, space or tab. Missing values cannot be represented by blank spaces in free format. Most data files will be in this format.
The example below contains the first 20 lines from a file called hsb.dat. Here you can see the variables are separated by commas, and the variable names are not on the first line. The variables in the file are id, female, race, ses, schtyp, prog, read, write, math, science and socst.
70,0,4,1,1,1,57,52,41,47,57
121,1,4,2,1,3,68,59,53,63,61
86,0,4,3,1,1,44,33,54,58,31
141,0,4,3,1,3,63,44,47,53,56
172,0,4,2,1,2,47,52,57,53,61
113,0,4,2,1,2,44,52,51,63,61
50,0,3,2,1,1,50,59,42,53,61
11,0,1,2,1,2,34,46,45,39,36
84,0,4,2,1,1,63,57,54,58,51
48,0,3,2,1,2,57,55,52,50,51
75,0,4,2,1,3,60,46,51,53,61
60,0,4,2,1,2,57,65,51,63,61
95,0,4,3,1,2,73,60,71,61,71
104,0,4,3,1,2,54,63,57,55,46
38,0,3,1,1,2,45,57,50,31,56
115,0,4,1,1,1,42,49,43,50,56
76,0,4,3,1,2,47,52,51,50,56
195,0,4,2,2,1,57,57,60,58,56
114,0,4,3,1,2,68,65,62,55,61
85,0,4,2,1,1,55,39,57,53,46
A second option for formatting data files is fixed format, where variables occupy fixed positions in the data file (e.g. variable 1 occupies the first 2 columns, variable 2 occupies column 3, variable 3 occupies columns 4 through 6, etc.). Files formatted in this way were more commonly encountered in the past.
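The file fixed.dat contains ten observations with the data in fixed-width columns. Each row is 12 characters wide; single-digit ids are preceded by a space, and blank columns represent missing values. The codebook for the data is given below the data.
 195  094951
 26386161941
 38780081841
 479700  870
 56878163690
 66487182960
 786  069  0
 88194193921
 98979090781
107868180801
|variable name|column number|
|id|1-2|
|a1|3-4|
|t1|5-6|
|gender|7|
|a2|8-9|
|t2|10-11|
|tgender|12|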
2.0 The input file
The Mplus input file contains all of the commands to read the data file properly, run the statistical analysis, and produce any graphs or additional output. Here are the commonly used commands (DATA and VARIABLE are required):
- TITLE – title of analysis
- DATA – location and formatting of data file; this is the only command that will differ between free-formatted and fixed-formatted files
- VARIABLE – information about variables in data file, including their names
- DEFINE – used to generate new variables not found in the data file (e.g. creating dummy variables for a categorical variable)
- ANALYSIS – technical details of the analysis (estimator, algorithm)
- MODEL – statistical model to be fit
- OUTPUT – any additional output not produced by default by running the statistical model
Less commonly used commands are:
- SAVEDATA – save analysis data and some analysis results
- PLOT – generate graphics of data or analysis results
- MONTECARLO – for Monte Carlo simulation
Place a colon (:) after the name of the command in the input file so Mplus will recognize it as a command. After the command and colon, we specify code and options for that command. Each command option specification is separated by a semicolon (;). Command and option names can be shortened to their first four letters.
Mplus input file syntax
- Mplus is not case sensitive. However, in many examples of Mplus code, the Mplus commands and options are in capital letters to identify them as being part of the Mplus code.
- All statements must end with a semicolon. The TITLE command is the only command that does not have to end in a semicolon.
- The maximum length of any line in an Mplus input file is 90 characters (80 characters in older versions of Mplus).
- If a statement needs more than 90 characters, break the statement up into multiple lines, ending the statement (not each line) with a semicolon.
- Note: very long file path specifications can be problematic; you may need to save your files to a location that has a shorter file path.
- The symbol “=” and keywords “IS” and “ARE” can be used interchangeably in most commands (not in DEFINE, MODEL TEST or MODEL CONSTRAINT)
- Comments can be added to the Mplus syntax by starting the line with an exclamation point (!). A comment line does not need to end with a semicolon. Each line of a comment must start with an exclamation point.
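The syntax rules above can be sketched in one short input file. This is only an illustration built from the hsb.dat example used in this seminar; it shows comment lines, a statement continued over two lines, and “=”, “is” and “are” used interchangeably.

```mplus
title: Syntax rules sketch using hsb.dat
data:
  ! a comment line starts with an exclamation point and needs no semicolon
  file = hsb.dat;
variable:
  ! one statement spread over two lines, ended by a single semicolon
  names are id female race ses schtyp prog
    read write math science socst;
analysis:
  ! "type is basic" and "type = basic" are read the same way
  type is basic;
```

Note that the title line has no semicolon, while every other statement does.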
The TITLE command is optional and specifies a title used for the output file. Titles can contain any combination of characters and numbers (except for the name of an input file section with a colon, for example “DATA:”), and do not need to terminate in a semicolon.
Here is a TITLE section for the freely formatted file hsb.dat above:
title: Entering data example free format using hsb.dat
The DATA command is required and contains the location of the data file and information about how it is formatted. By default, Mplus expects a free-formatted data file.
For most free-formatted files, the entirety of the DATA command will be the location of the data file. After “DATA:”, specify “file is” (or “file = “) and then the name of the file. Mplus will look for the data file in the directory where the input file is saved, but you can keep the two files in different directories by specifying a full path for the data file.
Here is a DATA command for the freely formatted file hsb.dat above:
data: file is hsb.dat;
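If the data file is stored somewhere other than the input file's directory, a full path can be given instead. The sketch below uses a made-up Windows path (C:\mplus seminar\ is hypothetical); to our understanding, quotation marks are needed when the path contains spaces, and the whole line still has to fit within the line-length limit.

```mplus
data:
  ! hypothetical location; replace with the folder that holds your data file
  file is "C:\mplus seminar\hsb.dat";
```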
Fixed-formatted data file
Fixed format data are handled using a Fortran-type format statement in the data command block.
Here is a DATA command for the fixed formatted file fixed.dat above:
data:
  file is fixed.dat;
  format is (3F2.0, F1.0, 2F2.0, F1.0);
On the format statement, 3F2.0 indicates that the file begins with three variables each of length two. These are followed by one variable of length one (F1.0), then two of length 2 and one of length 1 (2F2.0, F1.0). This matches what we see in the codebook.
In the VARIABLE command, which is required, we specify the names of the variables and any information about them that Mplus needs to know to run the statistical analysis.
For every analysis, Mplus requires that the names of the variables be specified in the order that they appear in the data file. List the variable names after “names are” (or “names = “).
Here is the VARIABLE command for the free-formatted file hsb.dat:
variable: names are id female race ses schtyp prog read write math science socst;
Other options we can specify in the VARIABLE command:
- USEVARIABLES (often shortened to usevars) to select a subset of the variables to use in the analysis. By default, Mplus will use all of the variables in the data set.
- Note that for certain models if you specify variables under USEVARIABLES and don’t include them in the model, you will get a warning that the “Variable is uncorrelated with all other variables”.
- USEOBSERVATIONS to select a subset of observations to use
- MISSING to specify values that signify missing (e.g. MISSING ARE .;)
- CENSORED, NOMINAL, CATEGORICAL, and COUNT to specify dependent variables that fit one of those types
- STRATIFICATION, CLUSTER, and WEIGHT to specify variables reflecting complex or clustered sampling
- GROUPING to specify a grouping variable for multi-group analyses
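As a rough sketch of how several of these options combine, the VARIABLE command below restricts the analysis to three test scores and to the observations with female equal to 1. The missing-value code -9999 is an assumption made for illustration; hsb.dat itself may not use it.

```mplus
variable:
  names are id female race ses schtyp prog read write math science socst;
  ! analyze only these three variables
  usevariables are read write math;
  ! keep only observations where female equals 1
  useobservations are female eq 1;
  ! assumed missing-value code, for illustration only
  missing are all (-9999);
```

The NAMES list still describes every column in the data file; USEVARIABLES and USEOBSERVATIONS only narrow what is analyzed.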
Further advice for using the VARIABLE command
- Mplus cannot handle string variables; such variables should be removed from the data file or converted to numeric before converting the data set to Mplus.
- Variable names can be no longer than 8 characters; if your variable names are longer than 8 characters, they will be truncated to 8 characters. Variable names must start with an alphabet character (i.e., a letter of the alphabet). Variable names can contain numbers and/or the underscore character (_).
- Dummy variables must be created for any categorical predictor variables. You can either do this in your preferred general-use statistical software package (e.g., SAS, Stata, SPSS, R, etc.) or in Mplus in a DEFINE command block.
The ANALYSIS command specifies the technical details of the statistical analysis, such as the type of analysis, the estimator and the algorithm used. The ANALYSIS command is optional; if the default settings for the options are appropriate for the analysis (see the Mplus User’s Guide for defaults), it can be skipped. Explanation of most of the ANALYSIS options is beyond the scope of this introductory seminar, but we will use some of the options in our model examples later.
The TYPE option for the ANALYSIS command is set to “general” by default, which is appropriate for a large variety of models which estimate relationships between observed variables and continuous latent variables (e.g. regression models, path analysis, CFA, SEM and latent growth models with continuous latent variables). Other settings for TYPE include TYPE=MIXTURE for categorical latent variable models, and TYPE=TWOLEVEL or TYPE=THREELEVEL for multilevel models.
For our first Mplus syntax file, we will be using TYPE=BASIC, which estimates descriptives such as means, variances, and correlations. No statistical model is fit. Here is such an ANALYSIS command:
analysis: type = basic;
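Beyond TYPE, the ANALYSIS command is also where options such as the estimator are set. The sketch below is only an illustration: it assumes a model with continuous outcomes for which robust maximum likelihood (MLR) is wanted, and it would need an accompanying MODEL command to have any effect.

```mplus
analysis:
  type = general;
  ! robust maximum likelihood, chosen here purely as an example
  estimator = mlr;
```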
Full input file for basic analysis of free-formatted file hsb.dat
The full set of Mplus commands to read hsb.dat and estimate descriptives is shown below. These are the commands that you can enter into a blank Mplus text file and save as an input file (.inp). The DATA and VARIABLE command blocks are required. The ANALYSIS command block is included so that we can check the data.
title: Entering data example free format using hsb.dat
data:
  file is hsb.dat;
variable:
  names are id female race ses schtyp prog read write math science socst;
analysis:
  type = basic;
After saving and running the .inp file, you can look in the output file for “INPUT READING TERMINATED NORMALLY” appearing below the entered code. This is a good first check that your data were read in successfully. We will discuss further checks in the next section.
Full input file for basic analysis of fixed-formatted file fixed.dat
The Mplus commands are shown below:
title: Entering data example fixed format using fixed.dat
data:
  file is fixed.dat;
  format is (3F2.0, F1.0, 2F2.0, F1.0);
variable:
  names are id a1 t1 gender a2 t2 tgender;
  missing are blank;
analysis:
  type = basic;
Again, after saving and running this input, you can check the output to see if “INPUT READING TERMINATED NORMALLY” appears.
We did not use the DEFINE, MODEL, or OUTPUT commands for our first Mplus file, but below is some basic information about each of them:
The DEFINE command is used to generate new variables that are not found in the data set. Mplus provides several mathematical and logical operators, as well as options to transform variables in many ways. Variables generated in the DEFINE command must be listed in the USEVARIABLES option of the VARIABLE command, and they must appear after the original variables in that list (and so after the variable that was transformed to create them).
Here is an example of using the DEFINE command to create a new variable “highmath” that is a dichotomized version of variable math, and the accompanying VARIABLE command with the USEVARIABLES option:
variable:
  names are id female race ses schtyp prog read write math science socst;
  usevariables are math highmath;
define:
  highmath = (math > 50);
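DEFINE can also be used to build the dummy variables that Mplus needs for categorical predictors. The sketch below is one possible way to do this for ses, using IF ... THEN statements; the names ses2 and ses3 are made up for illustration. It sets each dummy to 0 first, so it assumes ses has no missing values (an observation with missing ses would otherwise be coded 0 on both dummies).

```mplus
variable:
  names are id female race ses schtyp prog read write math science socst;
  ! new variables go at the end of the USEVARIABLES list
  usevariables are math ses2 ses3;
define:
  ! start both dummies at 0 (ses = 1, low, is the reference category)
  ses2 = 0;
  ses3 = 0;
  if (ses eq 2) then ses2 = 1;
  if (ses eq 3) then ses3 = 1;
```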
The MODEL command specifies the statistical model to be estimated. We will be exploring several different MODEL commands to specify different classes of models throughout the seminar. Three important keywords (options) are used in the MODEL command to specify relationships among variables:
- BY is used to indicate indicators for latent variables
- ON is used for regressions
- WITH is used for correlations
For example, if we wanted to define a latent variable representing academic prowess that is measured by 5 test score variables, we could specify the following (TYPE=GENERAL is the default, so the ANALYSIS command is shown only for completeness):
analysis:
  type = general;
model:
  academic by read write math science socst;
The MODEL command is technically optional, but almost always specified unless we only want descriptive statistics (ANALYSIS: TYPE=basic;).
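The ON and WITH keywords work in the same spirit as BY. As a minimal sketch using the hsb.dat variables (the particular model is chosen only for illustration, not as a recommended analysis), the MODEL command below regresses two test scores on two others and lets the two outcomes' residuals covary:

```mplus
variable:
  names are id female race ses schtyp prog read write math science socst;
  usevariables are read write math science;
model:
  ! regress math and science on read and write
  math on read write;
  science on read write;
  ! residual covariance between the two outcomes
  math with science;
```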
The OUTPUT command is used to request additional output not normally produced by the analysis specified in ANALYSIS and MODEL. Some options for additional output:
- SAMPSTAT – sample statistics, including means, variances, skewness, kurtosis, minima and maxima, median and percentiles, and covariances and correlations
- STD, STDYX, STDY – for standardized coefficients
- RESIDUAL – residual estimates
- MODINDICES – modification indices
- CINTERVAL – confidence intervals for model parameters
- TECH1 through TECH16 – the 16 TECH options output some of the details of the estimation procedure, such as starting values, covariance matrices of model parameters, and optimization (model fitting) history
For example, to request all of the sample statistics available, we can specify this OUTPUT command:
output: sampstat;
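Several OUTPUT options can be listed in a single statement, separated by spaces. A small sketch, assuming a model has been specified so that standardized coefficients and confidence intervals are meaningful:

```mplus
output:
  ! sample statistics, standardized coefficients, confidence intervals
  sampstat stdyx cinterval;
```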
3.0 Importing data from other statistical software
Preparing data and input files using Stata
If you are a Stata user, a user-written command, stata2mplus, will convert a Stata dataset to an Mplus ASCII data file plus the necessary commands (in an Mplus input file) to read in the data. You can get the stata2mplus ado file by typing search stata2mplus in the Stata command window and following the directions that are given.
Here are the Stata commands to load the Stata dataset hsb2.dta and convert it to Mplus. A .dat file containing the dataset and the input file needed to read the dataset into Mplus are created. Both are stored in Stata's current working directory (use the command pwd to display the path) as hsb2.dat and hsb2.inp.
use https://stats.idre.ucla.edu/stat/stata/notes/hsb2.dta, clear
stata2mplus using hsb2
Looks like this was a success.
To convert the file to mplus, start mplus and run the file hsb2.inp
The code from the input file created appears below. The Mplus .inp file is saved in the current working directory, which is listed in the lower left-hand corner of the Stata window. To change it, you can use Stata’s cd command.
The .inp file contains more detail about the data file than our earlier examples; however, all of the same command blocks are present. Again, the analysis type = basic statement is included to allow you to run descriptive statistics in order to ensure that the data were input correctly.
Title:
  Stata2Mplus conversion for hsb2.dta
  List of variables converted shown below
  id :
  female :
    0: male
    1: female
  race :
    1: hispanic
    2: asian
    3: african-amer
    4: white
  ses :
    1: low
    2: middle
    3: high
  schtyp : type of school
    1: public
    2: private
  prog : type of program
    1: general
    2: academic
    3: vocation
  read : reading score
  write : writing score
  math : math score
  science : science score
  socst : social studies score
Data:
  File is hsb2.dat;
Variable:
  Names are
    id female race ses schtyp prog read write math science socst;
  Missing are all (-9999);
  Usevariables are
    id female race ses schtyp prog read write math science socst;
Analysis:
  Type = basic;
The program stata2mplus can also convert missing values in Stata to missing value codes in the Mplus data file (e.g. -9999). Use the missing option of stata2mplus to specify a missing value code. This code will appear in the MISSING option of the VARIABLE command of the input file created by stata2mplus.
use https://stats.idre.ucla.edu/stat/data/hsbmis.dta, clear
stata2mplus using hsbmis, missing(-9999)
Looks like this was a success.
To convert the file to mplus, start mplus and run the file hsbmis.inp
The input file for this example is identical to the previous example except for the file name.
Preparing data for Mplus from SPSS
If you are an SPSS user, you can prepare your data to be read into Mplus with a few steps detailed in SPSS FAQ: How can I move my data from SPSS to Mplus?. Starting from the hsb2.sav dataset, once you have created a .csv file, hsb2.csv, without variable names, the code below can read in your data.
title: Entering data from SPSS
data:
  file is hsb2.csv;
variable:
  names are id female race ses schtyp prog read write math science socst;
analysis:
  type = basic;
If your SPSS data file contains missing data, complete the same steps you would for SPSS data without missing values, but note the values used for missing values. For example, if -999 is the value used in coding missing values, then the previous example’s code would be amended with a Missing statement in the Variable: block indicating this. Below, we use hsbmis.csv.
title: Missing data from SPSS
data:
  file is hsbmis.csv;
variable:
  names are id female race ses schtyp prog read write math science socst;
  missing are all (-999);
analysis:
  type = basic;
4.0 Mplus User’s Guide
The Mplus User’s Guide is the reference manual for Mplus. It contains detailed information about all of the input file commands, as well as numerous examples of a huge variety of models, with code and explanation for each example.
The Mplus User’s Guide can be found on the Mplus website.