A Latent Class Example | Mplus Code Fragments

These code fragments are examples that we are using to try and understand these techniques using Mplus. We ask that you treat them as works in progress that explore these techniques, rather than definitive answers as to how to analyze any particular kind of data.

Consider the Stata file hsb6, which has 600 observations with information about students, such as their reading, writing, math and other achievement scores. For each of the variables locus, concept, mot, and read through ss, we make a binary variable called hi___ (for example, hiread) that is 1 if the score is over the median, and 0 if below the median. These will be useful when we need binary variables. The Mplus-ready dataset is hsb6.csv.

Example 1

A latent class analysis with 2 classes, and continuous indicators

Here is the input file

Data:
  File is hsb6.csv;
Variable:
  Names are 
          id gender race ses sch prog locus concept mot career read write math 
          sci ss hilocus hiconcept himot hiread hiwrite himath hisci hiss;
  Usevariables are 
     read write math sci ss ;
  classes = c(2);
Analysis: 
  Type=mixture;
SAVEDATA:
  file is lca_ex1.txt ;
  save is cprob;
  format is free;

Here is the output.

Section 1

------------------------------------------------------------------------------
FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES
BASED ON THE ESTIMATED MODEL

    Latent
   Classes

       1        274.09081          0.45682
       2        325.90919          0.54318

One way to view the second column (the proportions) is as the average probability of falling into Class 1 and Class 2. The first column (the counts) is then the average probability times 600.
A second way to view the counts is as the sum of each person's probability of falling into a class. If person #6 has a .8 estimated probability of being in Class 1, and .2 of being in Class 2, then that person contributes .8 to Class 1 and .2 to Class 2. This is why the counts are fractional.
A third way of viewing this is that there is an underlying continuum of the latent variable with a threshold for being categorized as Class 1 or Class 2, and that threshold can be used to compute the probabilities of being in the classes; see Section 5.
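The first two views can be checked with a little arithmetic. The sketch below is in Python rather than Mplus or Stata, and the five probabilities in it are made-up values for illustration only; they are not taken from hsb6.

```python
# Hypothetical posterior probabilities of Class 1 for five people
# (illustrative values only, not taken from hsb6).
p_class1 = [0.8, 0.1, 0.95, 0.3, 0.6]
p_class2 = [1 - p for p in p_class1]

n = len(p_class1)

# Summing the probabilities gives the (fractional) estimated class counts.
count1 = sum(p_class1)
count2 = sum(p_class2)

# Averaging the probabilities gives the estimated class proportions,
# so each count equals its proportion times the number of people.
prop1 = count1 / n
prop2 = count2 / n

print(round(count1, 2), round(count2, 2))
print(round(prop1, 2), round(prop2, 2))
```

The two counts add up to the number of people, just as 274.09081 and 325.90919 add up to 600 in the output above.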

Section 2

------------------------------------------------------------------------------
FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES
BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP

Class Counts and Proportions

    Latent
   Classes

       1              272          0.45333
       2              328          0.54667

This shows the count of people who fall into each class by taking their probability of membership in each class and assigning them to the class which they have the highest probability of falling into. Note the counts are exact whole numbers.
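As a rough illustration of this assignment rule, the Python sketch below (Python is our choice here, not part of the Mplus or Stata workflow) uses the eleven pairs of probabilities that appear in the Stata listing of observations 200 to 210 further down. It only shows the rule; it does not reproduce the full counts of 272 and 328.

```python
# (cprob1, cprob2) for observations 200-210, copied from the Stata
# listing of lca_ex1.txt shown later on this page.
cprob = [
    (0.944, 0.056), (0.900, 0.100), (0.983, 0.017), (1.000, 0.000),
    (0.016, 0.984), (0.998, 0.002), (1.000, 0.000), (0.827, 0.173),
    (0.934, 0.066), (0.995, 0.005), (0.170, 0.830),
]

# Assign each person to the class with the highest probability
# (classes are numbered from 1).
assigned = [probs.index(max(probs)) + 1 for probs in cprob]

# Counting the assignments gives whole numbers, unlike Section 1.
counts = {k: assigned.count(k) for k in (1, 2)}

print(assigned)
print(counts)
```

The assignments agree with the class column of that listing: nine of these eleven people fall in Class 1 and two in Class 2.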

Section 3

------------------------------------------------------------------------------
Classification Probabilities for the Most Likely Latent Class Membership (Column)
by Latent Class (Row)

           1        2

    1   0.950    0.050
    2   0.036    0.964

This is related to the output in Section 1, but takes the probabilities of class membership and averages them by class; see the Stata portion below for more on this.

Section 4

------------------------------------------------------------------------------
MODEL RESULTS

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

Latent Class 1

 Means
    READ              43.783      0.642     68.151      0.000
    WRITE             45.068      0.730     61.737      0.000
    MATH              44.794      0.469     95.540      0.000
    SCI               44.446      0.740     60.050      0.000
    SS                45.574      0.658     69.237      0.000

 Variances
    READ              46.463      2.785     16.681      0.000
    WRITE             49.428      3.011     16.415      0.000
    MATH              46.634      3.133     14.884      0.000
    SCI               49.022      3.388     14.470      0.000
    SS                62.216      4.109     15.141      0.000

Latent Class 2

 Means
    READ              58.730      0.605     96.999      0.000
    WRITE             58.538      0.497    117.763      0.000
    MATH              57.782      0.687     84.119      0.000
    SCI               57.917      0.499    116.079      0.000
    SS                57.488      0.589     97.628      0.000

 Variances
    READ              46.463      2.785     16.681      0.000
    WRITE             49.428      3.011     16.415      0.000
    MATH              46.634      3.133     14.884      0.000
    SCI               49.022      3.388     14.470      0.000
    SS                62.216      4.109     15.141      0.000

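This shows the mean of each score for the two classes. Class 1 is a low performing group, and Class 2 is a high performing group. The variances are identical in the two classes because they are held equal across classes in this model.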

Section 5

------------------------------------------------------------------------------
Categorical Latent Variables
                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value
 Means
    C#1               -0.173      0.133     -1.298      0.194

This is the threshold for dividing the two classes. If you are below the threshold, you are in Class 1; if you are above it, you are in Class 2. We see the threshold is -0.173. We can convert this threshold to a probability like this, letting $t_1$ be threshold 1:

$$P(\mbox{Class}=1) = \dfrac{1}{1 + \exp(-t_1)} = \dfrac{1}{1 + \exp(0.173)} = 0.4569$$

(compare to Section 1 above; the small difference reflects rounding of the threshold).

$$P(\mbox{Class}=2) = 1 - \dfrac{1}{1 + \exp(-t_1)} = 1 - \dfrac{1}{1 + \exp(0.173)} = 0.5431$$

(compare to Section 1 above).
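The conversion can be verified numerically. This is a minimal Python sketch (our choice of language, not part of the original workflow); the only input is the C#1 estimate reported above.

```python
import math

# C#1 estimate from Section 5 of the Mplus output.
t1 = -0.173

# Inverse-logit conversion of the threshold to class probabilities.
p_class1 = 1 / (1 + math.exp(-t1))
p_class2 = 1 - p_class1

print(round(p_class1, 4), round(p_class2, 4))
```

The results agree with the Section 1 proportions (0.45682 and 0.54318) to about three decimal places; the remaining difference comes from the threshold being reported to only three decimals.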

------------------------------------------------------------------------------

We now read the saved data file into Stata for comparison to the Mplus output.

infile read write math sci ss cprob1 cprob2 class using lca_ex1.txt

Below we show observations from the middle of this file (observations 200 to 210). Note that cprob1 is the probability of being in Class 1, cprob2 is the probability of being in Class 2, and class is the class membership based on the class with the highest probability.

list in 200/210

     +-------------------------------------------------------------+
     | read   write   math    sci     ss   cprob1   cprob2   class |
     |-------------------------------------------------------------|
200. | 46.9    52.1   42.5   47.7   60.5     .944     .056       1 |
201. | 46.9    51.5     57   49.8   40.6       .9       .1       1 |
202. | 46.9    52.8   49.3   53.1   35.6     .983     .017       1 |
203. | 46.9    43.7   41.9   41.7   35.6        1        0       1 |
204. | 46.9    61.9     53   52.6   60.5     .016     .984       2 |
     |-------------------------------------------------------------|
205. | 46.9    41.1   45.3   47.1   55.6     .998     .002       1 |
206. | 46.9    38.5   47.1   41.7   25.7        1        0       1 |
207. | 46.9    54.1   46.4   49.8   55.6     .827     .173       1 |
208. | 46.9    51.5   48.5   49.8   50.6     .934     .066       1 |
209. | 46.9    41.1   53.6   41.7   55.6     .995     .005       1 |
     |-------------------------------------------------------------|
210. | 46.9    61.9   46.2   60.7   45.6      .17      .83       2 |
     +-------------------------------------------------------------+

Note that if we tabulate class we see where the values from Section 2 of the output came from.

tab class

      class |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        272       45.33       45.33
          2 |        328       54.67      100.00
------------+-----------------------------------
      Total |        600      100.00

Note that if we take the average of cprob1 and cprob2, we can relate these values to Column 2 of Section 1 of the output.

summ cprob1 cprob2

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      cprob1 |       600    .4568233    .4664192          0          1
      cprob2 |       600    .5431767    .4664192          0          1

If we sum the probabilities, we can relate these to Column 1 of Section 1 of the output.

tabstat cprob1 cprob2, stat(sum)

   stats |    cprob1    cprob2
---------+--------------------
     sum |   274.094   325.906
------------------------------

If we average the probabilities by class, we can relate these values to Section 3 of the output.

tabstat cprob1 cprob2, by(class)

Summary statistics: mean
  by categories of: class 

   class |    cprob1    cprob2
---------+--------------------
       1 |  .9570699  .0429301
       2 |  .0419848  .9580152
---------+--------------------
   Total |  .4568233  .5431767
------------------------------
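To make the by-class averaging concrete, here is a small Python sketch (Python is an assumption of ours; the page itself uses Stata) applied only to the eleven observations listed above. Because it uses a handful of rows, its averages will differ from the full-sample values in the table.

```python
# (cprob1, cprob2, class) for observations 200-210, copied from the
# Stata listing above.
rows = [
    (0.944, 0.056, 1), (0.900, 0.100, 1), (0.983, 0.017, 1),
    (1.000, 0.000, 1), (0.016, 0.984, 2), (0.998, 0.002, 1),
    (1.000, 0.000, 1), (0.827, 0.173, 1), (0.934, 0.066, 1),
    (0.995, 0.005, 1), (0.170, 0.830, 2),
]


def mean(values):
    """Plain arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Average each probability within the most likely class, which is
# what tabstat ..., by(class) reports.
by_class = {}
for k in (1, 2):
    members = [r for r in rows if r[2] == k]
    by_class[k] = (mean([r[0] for r in members]),
                   mean([r[1] for r in members]))

for k, (m1, m2) in by_class.items():
    print(k, round(m1, 3), round(m2, 3))
```

Within each class the two averages sum to 1, and the average probability of a person's own assigned class is high, which is the pattern seen on the diagonal of the table.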

Say that we get the mean of the reading, writing, math, science and social science scores and weight them by the probability of being in Class 1 and then again weighting by the probability of being in Class 2. Note the correspondence between these means and the means from Section 4 of the output.

tabstat read write math sci ss [aw=cprob1], stat(mean) 

   stats |      read     write      math       sci        ss
---------+--------------------------------------------------
    mean |  43.78268  45.06829  44.79421  44.44601   45.5743
------------------------------------------------------------

tabstat read write math sci ss [aw=cprob2], stat(mean) 

   stats |      read     write      math       sci        ss
---------+--------------------------------------------------
    mean |  58.73021  58.53821  57.78224  57.91736  57.48822
------------------------------------------------------------
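The weighting used by [aw=cprob1] and [aw=cprob2] amounts to a probability-weighted mean. The Python sketch below (our choice of language, with a helper name, weighted_mean, that is purely illustrative) applies it to the write scores of the eleven observations listed earlier, so its results will not match the full-sample means above.

```python
# write scores and class probabilities for observations 200-210,
# copied from the Stata listing of lca_ex1.txt above.
write = [52.1, 51.5, 52.8, 43.7, 61.9, 41.1, 38.5, 54.1, 51.5, 41.1, 61.9]
cprob1 = [0.944, 0.900, 0.983, 1.000, 0.016, 0.998,
          1.000, 0.827, 0.934, 0.995, 0.170]
cprob2 = [0.056, 0.100, 0.017, 0.000, 0.984, 0.002,
          0.000, 0.173, 0.066, 0.005, 0.830]


def weighted_mean(x, w):
    """Mean of x with weights w (hypothetical helper for this sketch)."""
    return sum(xi * wi for xi, wi in zip(x, w)) / sum(w)


# Each person contributes to a class mean in proportion to their
# probability of belonging to that class.
mean_class1 = weighted_mean(write, cprob1)
mean_class2 = weighted_mean(write, cprob2)

print(round(mean_class1, 2), round(mean_class2, 2))
```

Even in this small subset the Class 1 weighted mean is lower than the Class 2 weighted mean, in line with the low and high performing groups in Section 4.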

Example 2

A latent class analysis with 3 classes, and continuous indicators

Here is the input file

Data:
  File is hsb6.csv;
Variable:
  Names are 
          id gender race ses sch prog locus concept mot career read write math 
          sci ss hilocus hiconcept himot hiread hiwrite himath hisci hiss;
  Usevariables are 
     read write math sci ss ;
  classes = c(3);
Analysis: 
  Type=mixture;
SAVEDATA:
  file is lca_ex2.txt ;
  save is cprob;
  format is free;

Here is the output

Section 1

------------------------------------------------------------------------------
FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES
BASED ON THE ESTIMATED MODEL

    Latent
   Classes

       1        194.55393          0.32426
       2        153.04798          0.25508
       3        252.39809          0.42066

One way to view the second column (the proportions) is as the average probability of falling into Class 1, Class 2, and Class 3. The first column (the counts) is then the average probability times 600 (see Stata example below for comparison).
A second way to view the counts is as the sum of each person's probability of falling into a class. If person #6 has a .8 estimated probability of being in Class 1, and .2 of being in Class 2, then that person contributes .8 to Class 1 and .2 to Class 2. This is why the counts are fractional (see Stata example below for comparison).
A third way of viewing this is that there is an underlying continuum of the latent variable, and there are thresholds for being categorized as Class 1, Class 2, or Class 3. With three classes there are two thresholds, C#1 and C#2, and they can be converted to the probabilities of being in the classes; see Section 5.

Section 2

------------------------------------------------------------------------------
FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES
BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP

Class Counts and Proportions

    Latent
   Classes

       1              197          0.32833
       2              154          0.25667
       3              249          0.41500

Section 3

------------------------------------------------------------------------------
Classification Probabilities for the Most Likely Latent Class Membership (Column)
by Latent Class (Row)

           1        2        3

    1   0.952    0.000    0.048
    2   0.000    0.919    0.081
    3   0.047    0.053    0.900

This is related to the output in Section 1, but takes the probabilities of class membership and averages them by class, see Stata portion below for more on this.

Section 4

------------------------------------------------------------------------------
MODEL RESULTS

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

Latent Class 1

 Means
    READ              41.735      0.477     87.540      0.000
    WRITE             42.703      0.962     44.390      0.000
    MATH              43.178      0.516     83.648      0.000
    SCI               42.160      0.663     63.625      0.000
    SS                43.848      0.695     63.097      0.000

 Variances
    READ              32.997      2.820     11.699      0.000
    WRITE             42.369      3.775     11.223      0.000
    MATH              34.562      2.422     14.269      0.000
    SCI               38.395      2.714     14.146      0.000
    SS                53.884      3.850     13.996      0.000

Latent Class 2

 Means
    READ              63.644      0.948     67.117      0.000
    WRITE             61.193      0.453    135.170      0.000
    MATH              62.610      0.865     72.404      0.000
    SCI               61.648      0.667     92.451      0.000
    SS                61.232      0.758     80.759      0.000

 Variances
    READ              32.997      2.820     11.699      0.000
    WRITE             42.369      3.775     11.223      0.000
    MATH              34.562      2.422     14.269      0.000
    SCI               38.395      2.714     14.146      0.000
    SS                53.884      3.850     13.996      0.000

Latent Class 3

 Means
    READ              52.618      0.925     56.867      0.000
    WRITE             54.507      0.727     74.938      0.000
    MATH              52.008      0.835     62.319      0.000
    SCI               53.172      0.835     63.681      0.000
    SS                52.794      0.808     65.325      0.000

 Variances
    READ              32.997      2.820     11.699      0.000
    WRITE             42.369      3.775     11.223      0.000
    MATH              34.562      2.422     14.269      0.000
    SCI               38.395      2.714     14.146      0.000
    SS                53.884      3.850     13.996      0.000

This shows the mean of each score for the three classes. Class 1 is a low performing group, Class 2 is a high performing group, and Class 3 is a medium performing group.

Section 5

------------------------------------------------------------------------------
Categorical Latent Variables

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value
 Means
    C#1               -0.260      0.130     -2.010      0.044
    C#2               -0.500      0.181     -2.766      0.006

These are the thresholds for dividing the three classes. Note that this is now like a multinomial logistic regression, where the thresholds divide three multinomial categories, with Class 3 being the reference category: C#1 is the threshold for being in Class 1 as compared to Class 3, and C#2 is the threshold for being in Class 2 as compared to Class 3. For the comparison group, Class 3, the probability of being in that class is computed as below, letting $t_1$ be threshold 1 (-0.260) and $t_2$ be threshold 2 (-0.500).

$$P(\mbox{Class}=3) = \dfrac{1}{1 + \exp(t_1) + \exp(t_2)} = \dfrac{1}{1 + \exp(-0.260) + \exp(-0.500)} = 0.421$$

For Classes 1 and 2, the formula is a bit different since these are not the comparison class. For Class 1, the formula is

$$P(\mbox{Class}=1) = \dfrac{\exp(t_1)}{1 + \exp(t_1) + \exp(t_2)} = \dfrac{\exp(-0.260)}{1 + \exp(-0.260) + \exp(-0.500)} = 0.324$$

For Class 2, the formula is

$$P(\mbox{Class}=2) = \dfrac{\exp(t_2)}{1 + \exp(t_1) + \exp(t_2)} = \dfrac{\exp(-0.500)}{1 + \exp(-0.260) + \exp(-0.500)} = 0.255$$

These values match the proportions in Section 1 above (0.32426, 0.25508 and 0.42066).
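As a numerical check, the Python sketch below (Python is our choice, not part of the original workflow) treats the C#1 and C#2 estimates as multinomial logits with Class 3 as the reference class, whose logit is fixed at 0. Under that assumption it reproduces the class proportions reported in Section 1.

```python
import math

# C#1 and C#2 estimates from Section 5; Class 3 is the reference
# class, so its logit is fixed at 0.
logits = [-0.260, -0.500, 0.0]

# Multinomial logit: exponentiate each logit and normalize.
denom = sum(math.exp(v) for v in logits)
probs = [math.exp(v) / denom for v in logits]

for k, p in enumerate(probs, start=1):
    print(f"Class {k}: {p:.3f}")
```

The three probabilities come out close to 0.324, 0.255 and 0.421, matching the Section 1 proportions of 0.32426, 0.25508 and 0.42066 up to rounding of the thresholds.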

------------------------------------------------------------------------------

We now read the saved data file into Stata for comparison to the Mplus output.

infile read write math sci ss cprob1 cprob2 cprob3 class using lca_ex2.txt

Below we show observations from the middle of this file. Note that cprob1 is the probability of being in Class 1 and cprob2 is the probability of being in Class 2, cprob3 is the probability of being in Class 3, and class is the class membership based on the class with the highest probability. Note that we don’t see any folks in Class 3 here, but there are members of Class 3.

list in 200/210

     +----------------------------------------------------------------------+
     | read   write   math    sci     ss   cprob1   cprob2   cprob3   class |
     |----------------------------------------------------------------------|
200. | 46.9    52.1   42.5   47.7   60.5     .133     .867        0       2 |
201. | 46.9    51.5     57   49.8   40.6     .062     .938        0       2 |
202. | 46.9    52.8   49.3   53.1   35.6     .228     .772        0       2 |
203. | 46.9    43.7   41.9   41.7   35.6     .998     .002        0       1 |
204. | 46.9    61.9     53   52.6   60.5        0     .996     .004       2 |
     |----------------------------------------------------------------------|
205. | 46.9    41.1   45.3   47.1   55.6     .812     .188        0       1 |
206. | 46.9    38.5   47.1   41.7   25.7        1        0        0       1 |
207. | 46.9    54.1   46.4   49.8   55.6     .039     .961        0       2 |
208. | 46.9    51.5   48.5   49.8   50.6       .1       .9        0       2 |
209. | 46.9    41.1   53.6   41.7   55.6     .709     .291        0       1 |
     |----------------------------------------------------------------------|
210. | 46.9    61.9   46.2   60.7   45.6     .001     .999        0       2 |
     +----------------------------------------------------------------------+

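Note that if we tabulate class we see where the values from Section 2 of the output came from. Classes 2 and 3 appear in the opposite order in the saved file: class 2 here has 249 members and class 3 has 154, which correspond to Class 3 and Class 2 of the Mplus output shown above.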

tab class

      class |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        197       32.83       32.83
          2 |        249       41.50       74.33
          3 |        154       25.67      100.00
------------+-----------------------------------
      Total |        600      100.00

Note that if we take the average of cprob1, cprob2, and cprob3, we can relate these values to Column 2 of Section 1 of the output.

summ cprob1 cprob2 cprob3

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      cprob1 |       600    .3242633     .440132          0          1
      cprob2 |       600    .4206317    .4326395          0       .999
      cprob3 |       600    .2550817    .3989861          0          1

If we sum the probabilities, we can relate these to Column 1 of Section 1 of the output.

tabstat cprob1 cprob2 cprob3, stat(sum)

   stats |    cprob1    cprob2    cprob3
---------+------------------------------
     sum |   194.558   252.379   153.049
----------------------------------------

If we average the probabilities by class, we can relate these values to Section 3 of the output.

tabstat cprob1 cprob2 cprob3, by(class)

Summary statistics: mean
  by categories of: class 

   class |    cprob1    cprob2    cprob3
---------+------------------------------
       1 |  .9401117  .0598883         0
       2 |  .0375743  .9123735   .049996
       3 |         0   .087013   .912987
---------+------------------------------
   Total |  .3242633  .4206317  .2550817
----------------------------------------

Say that we get the mean of the reading, writing, math, science and social science scores and weight them by the probability of being in Class 1 and then again weighting by the probability of being in Class 2, and likewise for Class 3. Note the correspondence between these means and the means from Section 4 of the output.

tabstat read write math sci ss [aw=cprob1], stat(mean) 

   stats |      read     write      math       sci        ss
---------+--------------------------------------------------
    mean |  41.73485  42.70297  43.17746  42.16013  43.84801
------------------------------------------------------------

tabstat read write math sci ss [aw=cprob2], stat(mean) 

   stats |      read     write      math       sci        ss
---------+--------------------------------------------------
    mean |  52.61804  54.50678  52.00815  53.17197  52.79395
------------------------------------------------------------

tabstat read write math sci ss [aw=cprob3], stat(mean) 

   stats |      read     write      math       sci        ss
---------+--------------------------------------------------
    mean |  63.64527  61.19303  62.61002   61.6482   61.2325
------------------------------------------------------------