Mplus Class Notes: Analyzing Data

Mplus version 5.2 was used for these examples.

Mplus has a rich collection of regression models, including ordinary least squares (OLS) regression, probit regression, logistic regression, ordered probit and logit regressions, multinomial probit and logit regressions, Poisson regression, negative binomial regression, zero-inflated Poisson and negative binomial regressions, censored regression and censored inflated regression.

The keyword for regression models is on, as in response variable regressed on predictor1, predictor2, etc. In context, a regression command looks like this:

response_var on var1 var2;

For most of the examples we will be using the hsbdemo.dat dataset. It contains a collection of continuous, binary, ordered, categorical and count variables. You can download the data by clicking here. In each example, the line that specifies the regression analysis is the on statement under the Model command.

Ordinary least squares (OLS) regression

In our first example we will use a standardized test, write, as the response variable and the continuous variables read and math as predictors along with the binary predictor female. We begin by showing the input file which we called hsbreg.inp.

Title: 
  OLS regression
Data:
  File is hsbdemo.dat;
Variable:
  Names are 
     id female ses schtyp prog read write math science socst honors awards
     cid;
  Usevariables are
     write female read math;
Model: 
  write on female read math;

Next, we will take a look at the output file, hsbreg.out. Note that Mplus repeats all of the input code at the beginning of the output file.

Mplus VERSION 5.2
MUTHEN & MUTHEN
08/19/2009  11:12 AM

INPUT INSTRUCTIONS

  Title:
    OLS regression
  Data:
    File is hsbdemo.dat;
  Variable:
    Names are
       id female ses schtyp prog read write math science socst honors awards
       cid;
    Usevariables are
       write female read math;
  Model:
    write on female read math;


INPUT READING TERMINATED NORMALLY

OLS regression

SUMMARY OF ANALYSIS

Number of groups                                                 1
Number of observations                                         200

Number of dependent variables                                    1
Number of independent variables                                  3
Number of continuous latent variables                            0

Observed dependent variables

  Continuous
   WRITE

Observed independent variables
   FEMALE      READ        MATH


Estimator                                                       ML
Information matrix                                        OBSERVED
Maximum number of iterations                                  1000
Convergence criterion                                    0.500D-04
Maximum number of steepest descent iterations                   20

Input data file(s)
  hsbdemo.dat

Input data format  FREE

THE MODEL ESTIMATION TERMINATED NORMALLY

TESTS OF MODEL FIT

Chi-Square Test of Model Fit

          Value                              0.000
          Degrees of Freedom                     0
          P-Value                           0.0000

Chi-Square Test of Model Fit for the Baseline Model

          Value                            149.335
          Degrees of Freedom                     3
          P-Value                           0.0000

CFI/TLI

          CFI                                1.000
          TLI                                1.000

Loglikelihood

          H0 Value                       -2224.303
          H1 Value                       -2224.303

Information Criteria

          Number of Free Parameters              5
          Akaike (AIC)                    4458.607
          Bayesian (BIC)                  4475.098
          Sample-Size Adjusted BIC        4459.258
            (n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)

          Estimate                           0.000
          90 Percent C.I.                    0.000  0.000
          Probability RMSEA <= .05           0.000

SRMR (Standardized Root Mean Square Residual)

          Value                              0.000

MODEL RESULTS

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

 WRITE    ON
    FEMALE             5.443      0.926      5.881      0.000
    READ               0.325      0.060      5.409      0.000
    MATH               0.397      0.066      6.047      0.000

 Intercepts
    WRITE             11.896      2.834      4.197      0.000

 Residual Variances
    WRITE             42.367      4.237     10.000      0.000
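
The estimates in the MODEL RESULTS table define a fitted equation, so a predicted write score can be computed by hand. The following is a minimal Python sketch of that arithmetic, not part of the Mplus run. The coefficients are copied from the table above; the student profile (female, read = 50, math = 50) is a hypothetical example, and the function name predict_write is ours.

```python
# Coefficients copied from the MODEL RESULTS table of the OLS example.
INTERCEPT = 11.896
B_FEMALE = 5.443
B_READ = 0.325
B_MATH = 0.397


def predict_write(female: int, read_score: float, math_score: float) -> float:
    """Return the fitted write score for one student."""
    return (INTERCEPT
            + B_FEMALE * female
            + B_READ * read_score
            + B_MATH * math_score)


# Hypothetical student (an assumption for illustration, not a row of the data).
predicted = predict_write(female=1, read_score=50, math_score=50)
print(f"predicted write score: {predicted:.3f}")

# Holding read and math fixed, the female/male gap equals the FEMALE estimate.
gap = predict_write(1, 50, 50) - predict_write(0, 50, 50)
print(f"female minus male: {gap:.3f}")
```

The second print illustrates the usual reading of a regression coefficient: with read and math held constant, the predicted difference between females and males is the FEMALE estimate itself.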

Regression with missing data

For our next example we will use a dataset, hsbmis2.dat, that has observations with missing data. You can download the dataset by clicking here. Starting with Mplus 5, the default analysis type allows for analysis of missing data by full information maximum likelihood (FIML). The FIML approach uses all of the available information in the data and yields unbiased parameter estimates as long as the data are missing at random (MAR) or missing completely at random (MCAR).

It is worth noting that this missing data approach is available for all of the different regression models, not just for the OLS regression.

Since the Mplus output includes the input instructions we will just include the output file. Apart from the data file and the variable names, the only difference between this analysis and the previous one is the Missing statement in the Variable command block, Missing are all (-9999), which declares -9999 to be the missing-value code for all variables.

Mplus VERSION 5.2
MUTHEN & MUTHEN
09/14/2009   1:45 PM

INPUT INSTRUCTIONS

  Title:
    multiple regression with missing data
  Data:
    File is hsbmis2.dat ;
  Variable:
    Names are
    id female race ses hises prog academic read write math science socst hon;
    usevariables are write read female math;
    Missing are all (-9999) ;
  Model:
    write on female read math;


INPUT READING TERMINATED NORMALLY


multiple regression with missing data

SUMMARY OF ANALYSIS

Number of groups                                                 1
Number of observations                                         200

Number of dependent variables                                    1
Number of independent variables                                  3
Number of continuous latent variables                            0

Observed dependent variables

  Continuous
   WRITE

Observed independent variables
   READ        FEMALE      MATH


Estimator                                                       ML
Information matrix                                        OBSERVED
Maximum number of iterations                                  1000
Convergence criterion                                    0.500D-04
Maximum number of steepest descent iterations                   20
Maximum number of iterations for H1                           2000
Convergence criterion for H1                             0.100D-03

Input data file(s)
  hsbmis2.dat

Input data format  FREE


SUMMARY OF DATA

     Number of missing data patterns             8


COVARIANCE COVERAGE OF DATA

Minimum covariance coverage value   0.100


     PROPORTION OF DATA PRESENT

           Covariance Coverage
              WRITE         READ          FEMALE        MATH
              ________      ________      ________      ________
 WRITE          1.000
 READ           0.815         0.815
 FEMALE         0.675         0.550         0.675
 MATH           0.715         0.575         0.475         0.715


THE MODEL ESTIMATION TERMINATED NORMALLY


TESTS OF MODEL FIT

Chi-Square Test of Model Fit

          Value                              0.000
          Degrees of Freedom                     0
          P-Value                           0.0000

Chi-Square Test of Model Fit for the Baseline Model

          Value                            125.057
          Degrees of Freedom                     3
          P-Value                           0.0000

CFI/TLI

          CFI                                1.000
          TLI                                1.000

Loglikelihood

          H0 Value                       -1871.900
          H1 Value                       -1871.900

Information Criteria

          Number of Free Parameters              5
          Akaike (AIC)                    3753.800
          Bayesian (BIC)                  3770.292
          Sample-Size Adjusted BIC        3754.451
            (n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)

          Estimate                           0.000
          90 Percent C.I.                    0.000  0.000
          Probability RMSEA <= .05           0.000

SRMR (Standardized Root Mean Square Residual)

          Value                              0.000


MODEL RESULTS

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

 WRITE    ON
    FEMALE             5.435      1.121      4.847      0.000
    READ               0.298      0.072      4.168      0.000
    MATH               0.401      0.077      5.236      0.000

 Intercepts
    WRITE             12.950      2.951      4.388      0.000

 Residual Variances
    WRITE             41.622      4.716      8.825      0.000

Near the beginning of the output, the covariance coverage table shows the proportion of data present for each variable (on the diagonal) and for each pair of variables (off the diagonal). The model results near the bottom show estimates and standard errors that are close to those from the first model, which used complete data.
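
The information criteria in the output can be reproduced from the loglikelihood, which is a useful check when comparing models. The sketch below assumes the standard definitions AIC = -2LL + 2k and BIC = -2LL + k ln(n), with the sample-size adjusted BIC replacing n by n* = (n + 2) / 24 as printed in the output. The input numbers are copied from the missing-data output above; the function name is ours.

```python
import math

LOGLIK_H0 = -1871.900  # H0 loglikelihood from the missing-data output
N_PARAMS = 5           # number of free parameters
N_OBS = 200            # number of observations


def information_criteria(loglik: float, k: int, n: int):
    """Return (AIC, BIC, sample-size adjusted BIC)."""
    deviance = -2.0 * loglik
    aic = deviance + 2 * k
    bic = deviance + k * math.log(n)
    n_star = (n + 2) / 24  # adjustment shown in the Mplus output
    adjusted_bic = deviance + k * math.log(n_star)
    return aic, bic, adjusted_bic


aic, bic, adjusted_bic = information_criteria(LOGLIK_H0, N_PARAMS, N_OBS)
print(f"AIC          {aic:.3f}")
print(f"BIC          {bic:.3f}")
print(f"Adjusted BIC {adjusted_bic:.3f}")
```

Small discrepancies in the third decimal place are possible because the loglikelihood is itself printed rounded to three decimals.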

Probit and logit models

In this section we will cover a number of models for categorical responses: binary, ordered and multinomial. From this point forward, the Mplus commands to note are those that differ from the ones found in the OLS models.

Probit regression

We will begin with a probit regression model. Mplus treats this as a probit model because we declare honors, a binary variable, to be categorical and leave the estimator at its default.

Note that Mplus uses the WLSMV estimator (as indicated in the output below), that is, weighted least squares with a mean- and variance-adjusted chi-square test statistic.

Mplus VERSION 5.2
MUTHEN & MUTHEN
08/19/2009  11:14 AM

INPUT INSTRUCTIONS

  Title:
    probit regression
  Data:
    File is hsbdemo.dat;
  Variable:
    Names are
       id female ses schtyp prog read write math science socst honors awards
       cid;
    Usevariables are
       honors female read math;
    Categorical = honors;
  Model:
    honors on female read math;


INPUT READING TERMINATED NORMALLY


probit regression

SUMMARY OF ANALYSIS

Number of groups                                                 1
Number of observations                                         200

Number of dependent variables                                    1
Number of independent variables                                  3
Number of continuous latent variables                            0

Observed dependent variables

  Binary and ordered categorical (ordinal)
   HONORS

Observed independent variables
   FEMALE      READ        MATH

Estimator                                                    WLSMV
Maximum number of iterations                                  1000
Convergence criterion                                    0.500D-04
Maximum number of steepest descent iterations                   20
Parameterization                                             DELTA

Input data file(s)
  hsbdemo.dat

Input data format  FREE

SUMMARY OF CATEGORICAL DATA PROPORTIONS

    HONORS
      Category 1    0.735
      Category 2    0.265

THE MODEL ESTIMATION TERMINATED NORMALLY

TESTS OF MODEL FIT

Chi-Square Test of Model Fit

          Value                              0.000*
          Degrees of Freedom                     0**
          P-Value                           0.0000

*   The chi-square value for MLM, MLMV, MLR, ULSMV, WLSM and WLSMV cannot be used
    for chi-square difference tests.  MLM, MLR and WLSM chi-square difference
    testing is described in the Mplus Technical Appendices at www.statmodel.com.
    See chi-square difference testing in the index of the Mplus User's Guide.

**  The degrees of freedom for MLMV, ULSMV and WLSMV are estimated according to
    a formula given in the Mplus Technical Appendices at www.statmodel.com.
    See degrees of freedom in the index of the Mplus User's Guide.

Chi-Square Test of Model Fit for the Baseline Model

          Value                             35.149
          Degrees of Freedom                     3
          P-Value                           0.0000

CFI/TLI

          CFI                                1.000
          TLI                                1.000

Number of Free Parameters                        4

RMSEA (Root Mean Square Error Of Approximation)

          Estimate                           0.000

WRMR (Weighted Root Mean Square Residual)

          Value                              0.000

MODEL RESULTS

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

 HONORS   ON
    FEMALE             0.682      0.256      2.661      0.008
    READ               0.047      0.017      2.745      0.006
    MATH               0.074      0.016      4.532      0.000

 Thresholds
    HONORS$1           7.663      1.149      6.671      0.000


R-SQUARE

    Observed                   Residual
    Variable        Estimate   Variance

    HONORS             0.553      1.000

For information on interpreting the results of probit models, please visit Annotated Output: Probit Regression.
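
To see what the probit estimates imply, one can convert them to a predicted probability. The sketch below assumes the usual threshold form of the probit model, P(honors in category 2) = Phi(b'x - threshold), which is consistent with the residual variance of 1.000 reported in the R-SQUARE table. It also assumes that Category 2 is the higher code of honors. The estimates are copied from the output above; the student profile is hypothetical.

```python
from statistics import NormalDist

# Estimates copied from the probit MODEL RESULTS table.
THRESHOLD = 7.663  # HONORS$1
B_FEMALE = 0.682
B_READ = 0.047
B_MATH = 0.074


def prob_honors(female: int, read_score: float, math_score: float) -> float:
    """Probability of the higher honors category under the assumed probit form."""
    linear = B_FEMALE * female + B_READ * read_score + B_MATH * math_score
    # Standard normal CDF evaluated at (linear predictor - threshold).
    return NormalDist().cdf(linear - THRESHOLD)


# Hypothetical students with read = 60 and math = 60.
p_female = prob_honors(1, 60, 60)
p_male = prob_honors(0, 60, 60)
print(f"female: {p_female:.3f}")
print(f"male:   {p_male:.3f}")
```

Because the threshold enters with a negative sign, a large threshold such as 7.663 simply means that read and math must be fairly high before the probability of honors rises above one half.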

Logistic regression

Next we have a logistic regression model. The difference between this model and the probit model is that we specify maximum likelihood as the estimator, for which the default link is logit (the commented-out link = logit line would request it explicitly). For the rest of this section we will present only the input files for each of the models.

Title: 
  logistic regression
Data:
  File is hsbdemo.dat;
Variable:
  Names are 
     id female ses schtyp prog read write math science socst honors awards
     cid;
  Usevariables are
     honors female read math;
  Categorical = honors;
Analysis:
  Estimator = ML;
! link = logit;
Model: 
  honors on female read math;

For information on interpreting the results of logistic models, please visit Annotated Output: Logit Regression.
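
The probit and logit models differ only in the function that maps the linear predictor to a probability. The short Python sketch below compares the two inverse links on a few illustrative values of the linear predictor. These values are arbitrary and are not estimates from any model in these notes.

```python
import math
from statistics import NormalDist


def inv_logit(eta: float) -> float:
    """Inverse logit link: logistic CDF."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta: float) -> float:
    """Inverse probit link: standard normal CDF."""
    return NormalDist().cdf(eta)


# Arbitrary linear predictor values, chosen only for illustration.
for eta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"eta={eta:5.1f}  logit={inv_logit(eta):.3f}  probit={inv_probit(eta):.3f}")
```

Both links give 0.5 at zero, but the normal CDF rises more steeply, which is why logit coefficients are typically larger in magnitude than probit coefficients fitted to the same data.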

Ordered probit regression

For this next model we use an ordered response variable, ses, which takes on the values 1, 2 and 3. Other than the ordered variable itself, the setup is identical to the binary probit model.

Title: 
  ordered probit regression
Data:
  File is hsbdemo.dat;
Variable:
  Names are 
     id female ses schtyp prog read write math science socst honors awards
     cid;
  Usevariables are
     ses female read math;
  Categorical = ses;
Model: 
  ses on female read math;
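
With a three-category outcome, an ordered probit model has two thresholds, and each category probability is the difference of two cumulative normal probabilities. The sketch below shows that calculation. Since no output is shown for this model, the thresholds and the linear predictor are hypothetical values chosen for illustration, not estimates from hsbdemo.dat.

```python
from statistics import NormalDist


def category_probs(eta: float, thresholds: list[float]) -> list[float]:
    """Category probabilities for an ordered probit model.

    eta is the linear predictor b'x; thresholds must be in increasing order.
    """
    cdf = NormalDist().cdf
    # Cumulative probabilities P(y <= k), ending with 1.0 for the top category.
    cumulative = [cdf(t - eta) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for value in cumulative:
        probs.append(value - previous)
        previous = value
    return probs


# Hypothetical thresholds and linear predictor (assumptions, not Mplus output).
HYPOTHETICAL_THRESHOLDS = [-0.5, 1.0]
probs = category_probs(eta=0.3, thresholds=HYPOTHETICAL_THRESHOLDS)
for level, p in enumerate(probs, start=1):
    print(f"P(ses = {level}) = {p:.3f}")
```

The same structure applies to the ordered logit model, with the logistic CDF in place of the normal CDF.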

Ordered logistic regression

For the ordered logit model we again use the maximum likelihood estimator.

Title: 
  ordered logistic regression
Data:
  File is hsbdemo.dat;
Variable:
  Names are 
     id female ses schtyp prog read write math science socst honors awards
     cid;
  Usevariables are
     ses female read math;
  Categorical = ses;
Analysis:
  Estimator = ML;
Model: 
  ses on female read math;

Multinomial logistic regression

For the multinomial logit model we use the variable prog, which indicates the type of high school program, where 1 is general, 2 is academic and 3 is vocational. We again use the maximum likelihood estimator but declare prog to be a nominal variable.

Title: 
  multinomial logistic regression
Data:
  File is hsbdemo.dat;
Variable:
  Names are 
     id female ses schtyp prog read write math science socst honors awards
     cid;
  Usevariables are
     prog female read math;
  Nominal = prog;
Analysis:
  Estimator = ML;
Model: 
  prog on female read math;

For information on interpreting the results of multinomial logistic models, please visit Annotated Output: Multinomial Logistic Regression.

Count regression models

In this final section we will cover four count models: Poisson, negative binomial, zero-inflated Poisson and zero-inflated negative binomial.

Poisson regression

The first model in this section is a Poisson regression model using awards as the count response variable. Notice the (p) for Poisson on the Count line.

Title: 
  poisson regression
Data:
  File is hsbdemo.dat;
Variable:
  Names are 
     id female ses schtyp prog read write math science socst honors awards
     cid;
  Usevariables are
     awards female read math;
  Count = awards (p);
Model: 
  awards on female read math;

For information on interpreting the results of Poisson models, please visit Annotated Output: Poisson Regression.

Negative binomial regression

The next model in this section is a negative binomial regression model. Negative binomial models are useful when there is overdispersion in the data, that is, when the variance of the counts exceeds their mean. Notice the (nb) for negative binomial on the Count line.

Title: 
  negative binomial regression
Data:
  File is hsbdemo.dat;
Variable:
  Names are 
     id female ses schtyp prog read write math science socst honors awards
     cid;
  Usevariables are
     awards female read math;
  Count = awards (nb);
Model: 
  awards on female read math;

Zero-inflated Poisson regression

The next model is a zero-inflated Poisson regression model. Zero-inflated models are useful when there are many more zeros than would be expected from the count model alone. Notice the (pi) for zero-inflated Poisson on the Count line.

The zero-inflated models are examples of multiple equation models. In this case, there is one equation for the count model, awards on female read math, and a second equation for estimating the excess zeros, awards#1 on female read math. Although we are using the same predictors in both equations, this is not necessary. The output contains a set of parameter estimates for each equation. Thus, the estimate for female of 0.214 is for the count equation and the estimate of -4.029 is for the excess zero equation.

Mplus VERSION 5.2
MUTHEN & MUTHEN
08/19/2009  11:29 AM

INPUT INSTRUCTIONS

  Title:
    zero inflated poisson regression
  Data:
    File is hsbdemo.dat;
  Variable:
    Names are
       id female ses schtyp prog read write math science socst honors awards
       cid;
    Usevariables are
       awards female read math;
    Count = awards (pi);
  Model:
    awards on female read math;
    awards#1 on female read math;


INPUT READING TERMINATED NORMALLY

zero inflated poisson regression

SUMMARY OF ANALYSIS

Number of groups                                                 1
Number of observations                                         200

Number of dependent variables                                    1
Number of independent variables                                  3
Number of continuous latent variables                            0

Observed dependent variables

  Count
   AWARDS

Observed independent variables
   FEMALE      READ        MATH


Estimator                                                      MLR
Information matrix                                        OBSERVED
Optimization Specifications for the Quasi-Newton Algorithm for
Continuous Outcomes
  Maximum number of iterations                                 100
  Convergence criterion                                  0.100D-05
Optimization Specifications for the EM Algorithm
  Maximum number of iterations                                 500
  Convergence criteria
    Loglikelihood change                                 0.100D-02
    Relative loglikelihood change                        0.100D-05
    Derivative                                           0.100D-02
Optimization Specifications for the M step of the EM Algorithm for
Categorical Latent variables
  Number of M step iterations                                    1
  M step convergence criterion                           0.100D-02
  Basis for M step termination                           ITERATION
Optimization Specifications for the M step of the EM Algorithm for
Censored, Binary or Ordered Categorical (Ordinal), Unordered
Categorical (Nominal) and Count Outcomes
  Number of M step iterations                                    1
  M step convergence criterion                           0.100D-02
  Basis for M step termination                           ITERATION
  Maximum value for logit thresholds                            15
  Minimum value for logit thresholds                           -15
  Minimum expected cell size for chi-square              0.100D-01
Optimization algorithm                                         EMA
Integration Specifications
  Type                                                    STANDARD
  Number of integration points                                  15
  Dimensions of numerical integration                            0
  Adaptive quadrature                                           ON
Cholesky                                                       OFF

Input data file(s)
  hsbdemo.dat
Input data format  FREE

THE MODEL ESTIMATION TERMINATED NORMALLY

TESTS OF MODEL FIT

Loglikelihood

          H0 Value                        -277.742
          H0 Scaling Correction Factor       0.975
            for MLR

Information Criteria

          Number of Free Parameters              8
          Akaike (AIC)                     571.483
          Bayesian (BIC)                   597.870
          Sample-Size Adjusted BIC         572.525
            (n* = (n + 2) / 24)

MODEL RESULTS

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

 AWARDS     ON
    FEMALE             0.214      0.133      1.611      0.107
    READ               0.023      0.008      2.901      0.004
    MATH               0.033      0.009      3.446      0.001

 AWARDS#1   ON
    FEMALE            -4.029      1.475     -2.731      0.006
    READ              -0.178      0.074     -2.400      0.016
    MATH              -0.203      0.063     -3.207      0.001

 Intercepts
    AWARDS#1          19.096      4.909      3.890      0.000
    AWARDS            -2.485      0.483     -5.141      0.000

For information on interpreting the results of zero-inflated Poisson models, please visit Annotated Output: Zero-inflated Poisson Regression.
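
The two sets of estimates can be combined to get an expected number of awards. The sketch below assumes the usual zero-inflated Poisson form: the count equation gives the log of the Poisson mean, the awards#1 equation gives the logit of the probability of being an excess zero, and the expected count is (1 - p) times the Poisson mean. The estimates are copied from the output above; the student profile and the function names are hypothetical.

```python
import math

# Estimates copied from the zero-inflated Poisson MODEL RESULTS table.
COUNT_INTERCEPT = -2.485
COUNT_B = {"female": 0.214, "read": 0.023, "math": 0.033}
ZERO_INTERCEPT = 19.096
ZERO_B = {"female": -4.029, "read": -0.178, "math": -0.203}


def linear_predictor(intercept: float, coefs: dict, x: dict) -> float:
    """Intercept plus the sum of coefficient * predictor value."""
    return intercept + sum(coefs[name] * x[name] for name in coefs)


def zip_summary(x: dict):
    """Return (Poisson mean, P(excess zero), expected count, P(count = 0))."""
    mu = math.exp(linear_predictor(COUNT_INTERCEPT, COUNT_B, x))
    p_excess = 1.0 / (1.0 + math.exp(-linear_predictor(ZERO_INTERCEPT, ZERO_B, x)))
    expected = (1.0 - p_excess) * mu
    p_zero = p_excess + (1.0 - p_excess) * math.exp(-mu)
    return mu, p_excess, expected, p_zero


# Hypothetical student (an assumption for illustration).
student = {"female": 1, "read": 50, "math": 50}
mu, p_excess, expected, p_zero = zip_summary(student)
print(f"Poisson mean      {mu:.3f}")
print(f"P(excess zero)    {p_excess:.3f}")
print(f"expected awards   {expected:.3f}")
print(f"P(awards = 0)     {p_zero:.3f}")
```

The negative coefficients in the awards#1 equation mean that higher read and math scores lower the probability of being an excess zero, so the zero-inflation part matters mostly for students with low scores.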

Zero-inflated negative binomial regression

The final model in this section is a zero-inflated negative binomial regression model. The setup for this model parallels that of the zero-inflated Poisson model above. Notice the (nbi) for zero-inflated negative binomial on the Count line.

Title: 
  zero inflated negative binomial regression
Data:
  File is hsbdemo.dat;
Variable:
  Names are 
     id female ses schtyp prog read write math science socst honors awards
     cid;
  Usevariables are
     awards female read math;
  Count = awards (nbi);
Model: 
  awards on female read math;
  awards#1 on female read math;