These code fragments are examples that we are using to try to understand these techniques using Mplus. We ask that you treat them as works in progress that explore these techniques, rather than as definitive answers as to how to analyze any particular kind of data.
This picks up after Examples 1 and 2, and considers adding categorical variables to the model.
We now have the input file //mplus/code/hsb6.inp and the data file it reads, hsb6.dat , which you can download from https://stats.idre.ucla.edu/wp-content/uploads/2016/02/hsb6.dat .
Example 3: A latent class analysis with 2 classes, continuous indicators, and a two-level categorical variable.
Here is the input file
Data:
File is I:\mplus\hsb6.dat ;
Variable:
Names are
id gender race ses sch prog locus concept mot career read write math
sci ss hilocus hiconcep himot hiread hiwrite himath hisci hiss academic;
Usevariables are
hiread write math sci ss ;
categorical = hiread;
classes = c(2);
Analysis:
Type=mixture;
MODEL:
%C#1%
[hiread$1 *-1 math sci ss write *30 ];
%C#2%
[hiread$1 *+1 math sci ss write *45];
OUTPUT:
TECH8;
SAVEDATA:
file is lca_ex3.txt ;
save is cprob;
format is free;
Here is the output
------------------------------------------------------------------------------
FINAL CLASS COUNTS AND PROPORTIONS OF TOTAL SAMPLE SIZE BASED ON ESTIMATED
POSTERIOR PROBABILITIES

Class 1        263.13557          0.43856
Class 2        336.86443          0.56144
#1. These are much the same as in Example #1; see that example for more details.
------------------------------------------------------------------------------
CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY CLASS MEMBERSHIP

Class Counts and Proportions

Class 1        258          0.43000
Class 2        342          0.57000
#2. These are much the same as in Example #1; see that example for more details.
------------------------------------------------------------------------------
Average Class Probabilities by Class
1 2
Class 1 0.962 0.038
Class 2 0.044 0.956
#3. These are much the same as in Example #1; see that example for more details.
------------------------------------------------------------------------------
MODEL RESULTS
Estimates S.E. Est./S.E.
CLASS 1
Means
WRITE 44.944 0.695 64.645
MATH 44.580 0.461 96.655
SCI 44.084 0.729 60.473
SS 45.392 0.629 72.158
Variances
WRITE 51.204 3.158 16.216
MATH 47.212 3.055 15.453
SCI 47.984 3.468 13.835
SS 62.855 4.235 14.843
CLASS 2
Means
WRITE 58.197 0.495 117.599
MATH 57.527 0.624 92.239
SCI 57.762 0.464 124.497
SS 57.243 0.582 98.361
Variances
WRITE 51.204 3.158 16.216
MATH 47.212 3.055 15.453
SCI 47.984 3.468 13.835
SS 62.855 4.235 14.843
#4. This shows the average scores for the two classes on the continuous variables. Class 1 is a low performing group, and class 2 is a high performing group.
------------------------------------------------------------------------------
LATENT CLASS INDICATOR MODEL PART
Class 1
Thresholds
HIREAD$1 2.276 0.381 5.970
Class 2
Thresholds
HIREAD$1 -1.835 0.215 -8.523
#5. For categorical variables, we do not estimate means; instead we estimate thresholds. You can think of this like a logistic regression predicting being a "bad reader" (0 on hiread) for each class. So, for class 1, we have an empty model (no predictors), and the threshold (cut point) is 2.276. We can exponentiate this to convert it into an odds, exp(2.276) = 9.7376518, so if you are in class 1, the odds (not odds ratio) are almost 10 to 1 that you will be a "bad reader". We can convert this into a probability like this: exp(2.276) / (1 + exp(2.276)) = .90686977, so if you are in class 1, there is a .9 probability that you will be a bad reader.
For class 2, we do the same tricks. The odds of being a bad reader in class 2 are exp(-1.835) = .1596135, and the probability of being a bad reader in class 2 is exp(-1.835) / (1 + exp(-1.835)) = .13764371. Note that Mplus shows these probabilities in section #7 of the output below.
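The threshold-to-odds-to-probability arithmetic above can be checked with a short script (a sketch; the two numbers are the hiread$1 thresholds from the output):

```python
import math

def threshold_to_prob(tau):
    """Convert an Mplus threshold to the probability of the lower
    category (here, being a "bad reader", hiread = 0)."""
    odds = math.exp(tau)       # odds of the lower category
    return odds / (1 + odds)   # equivalently 1 / (1 + exp(-tau))

# hiread$1 thresholds from section #5 of the output
print(threshold_to_prob(2.276))   # class 1: about .907
print(threshold_to_prob(-1.835))  # class 2: about .138
```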
------------------------------------------------------------------------------
LATENT CLASS REGRESSION MODEL PART
Means
C#1 -0.247 0.126 -1.961
#6. This is the mean of the latent class logit that governs the relative sizes of the two classes (the last class, class 2, is the reference class). We see the mean for C#1 is -0.247. We can convert this value into the class probabilities like this.
Prob(class 1) = 1/(1 + exp(-mean)) = 1 / ( 1 + exp( 0.247)) = .43856204 (compare to section #1 above).
Prob(class 2) = 1 - Prob(class 1) = 1 - 1 / ( 1 + exp( 0.247)) = .56143796 (compare to section #1 above).
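As a check on this arithmetic, here is the same conversion in a short script (a sketch; -0.247 is the estimated mean for C#1 from the output):

```python
import math

# Mean of the latent class logit for C#1 (class 2 is the reference)
m = -0.247

p_class1 = math.exp(m) / (1 + math.exp(m))  # = 1 / (1 + exp(0.247))
p_class2 = 1 - p_class1

print(p_class1)  # about .4386, matching section #1
print(p_class2)  # about .5614
```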
------------------------------------------------------------------------------
LATENT CLASS INDICATOR MODEL PART IN PROBABILITY SCALE
Class 1
HIREAD
Category 1 0.907 0.032 28.161
Category 2 0.093 0.032 2.893
Class 2
HIREAD
Category 1 0.138 0.026 5.387
Category 2 0.862 0.026 33.744
#7. These take the thresholds from section #5 of the output and convert them into probabilities for your convenience. Section #5 shows how you could manually convert the thresholds from that section into the probabilities shown here.
------------------------------------------------------------------------------
We now read the saved data file into Stata for comparison to the Mplus output.
. infile hiread write math sci ss cprob1 cprob2 class using lca_ex3.txt
Below we show some observations from the middle of this file. Note that cprob1 is the probability of being in class 1 and cprob2 is the probability of being in class 2, and class is the class membership based on the class with the highest probability.
. list in 200/210
+---------------------------------------------------------------+
| hiread write math sci ss cprob1 cprob2 class |
|---------------------------------------------------------------|
200. | 0 52.1 42.5 47.7 60.5 .954 .046 1 |
201. | 0 51.5 57 49.8 40.6 .914 .086 1 |
202. | 0 52.8 49.3 53.1 35.6 .984 .016 1 |
203. | 0 43.7 41.9 41.7 35.6 1 0 1 |
204. | 0 61.9 53 52.6 60.5 .022 .978 2 |
|---------------------------------------------------------------|
205. | 0 41.1 45.3 47.1 55.6 .998 .002 1 |
206. | 0 38.5 47.1 41.7 25.7 1 0 1 |
207. | 0 54.1 46.4 49.8 55.6 .855 .145 1 |
208. | 0 51.5 48.5 49.8 50.6 .943 .057 1 |
209. | 0 41.1 53.6 41.7 55.6 .996 .004 1 |
|---------------------------------------------------------------|
210. | 0 61.9 46.2 60.7 45.6 .196 .804 2 |
+---------------------------------------------------------------+
Say that we get the mean of the writing, math, science and social science scores and weight them by the probability of being in class 1 and then again weighting by the probability of being in class 2. Note the correspondence between these means and the means from section 4 of the output.
. tabstat write math sci ss [aw=cprob1], stat(mean)
stats | write math sci ss
---------+----------------------------------------
mean | 44.94414 44.57938 44.08309 45.39155
--------------------------------------------------
. tabstat write math sci ss [aw=cprob2], stat(mean)
stats | write math sci ss
---------+----------------------------------------
mean | 58.19674 57.52728 57.76235 57.24318
--------------------------------------------------
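The probability-weighted means being computed here can be sketched in a few lines (the scores and weights below are toy numbers, not the hsb6 data; the point is just the formula sum(w*x) / sum(w)):

```python
def weighted_mean(values, weights):
    """Posterior-probability-weighted mean: sum(w*x) / sum(w)."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Hypothetical write scores and posterior probabilities of class 1
write  = [52.1, 43.7, 61.9, 41.1]
cprob1 = [0.95, 1.00, 0.02, 0.99]
cprob2 = [1 - p for p in cprob1]

print(weighted_mean(write, cprob1))  # dominated by likely class 1 cases
print(weighted_mean(write, cprob2))  # dominated by likely class 2 cases
```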
Another way to view this is to run a regression predicting, say, write, estimating just the intercept and weighting the cases as we have done above, for example.
. regress write [aw=cprob1]
------------------------------------------------------------------------------
write | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | 44.94414 .3987896 112.70 0.000 44.16027 45.72802
------------------------------------------------------------------------------
. regress write [aw=cprob2]
------------------------------------------------------------------------------
write | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | 58.19674 .2886021 201.65 0.000 57.62963 58.76385
------------------------------------------------------------------------------
We can do the same kind of analysis predicting loread (hiread reversed, so loread is 1 for "bad readers"), weighting the cases by the probability of being in class 1 and then by the probability of being in class 2, as shown below. You can relate the coefficients here to the thresholds in section #5 of the Mplus output.
. logit loread [aw=cprob1]
------------------------------------------------------------------------------
loread | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | 2.276237 .1679188 13.56 0.000 1.947122 2.605352
------------------------------------------------------------------------------
. logit loread [aw=cprob2]
------------------------------------------------------------------------------
loread | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | -1.834929 .1337386 -13.72 0.000 -2.097052 -1.572806
------------------------------------------------------------------------------
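For an intercept-only logit model like the ones above, the weighted estimate of the intercept has a simple closed form: it is the logit of the weighted proportion of ones. A small sketch with hypothetical outcomes and weights (not the hsb6 data) shows why the weighted logit intercepts line up with the thresholds:

```python
import math

def weighted_logit_intercept(y, w):
    """Intercept of an empty (intercept-only) logit model fit with
    weights w: the logit of the weighted mean of y."""
    p = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return math.log(p / (1 - p))

# Hypothetical 0/1 outcomes, weighted by class 1 posterior probabilities
y = [1, 1, 0, 1]
w = [0.9, 1.0, 0.1, 0.8]
print(weighted_logit_intercept(y, w))  # log(2.7 / 0.1) = log(27), about 3.296
```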
Example 4: A latent class analysis with 2 classes, continuous indicators, and one 3-level categorical indicator.
Here is the input file
Data:
File is g:\mplus\hsb6.dat ;
Variable:
Names are
id gender race ses sch prog locus concept mot career read write math
sci ss hilocus hiconcep himot hiread hiwrite himath hisci hiss academic;
Usevariables are
read write math sci ss ses;
categorical = ses;
classes = c(2);
Analysis:
Type=mixture;
MODEL:
%C#1%
[read math sci ss write *30 ses$1 *-1 ses$2 *1];
%C#2%
[read math sci ss write *45 ses$1 *-1 ses$2 *1];
OUTPUT:
TECH8;
SAVEDATA:
file is lca_ex4.txt ;
save is cprob;
format is free;
Here is the output
------------------------------------------------------------------------------
FINAL CLASS COUNTS AND PROPORTIONS OF TOTAL SAMPLE SIZE BASED ON ESTIMATED
POSTERIOR PROBABILITIES

Class 1        274.88163          0.45814
Class 2        325.11837          0.54186
#1. These are much the same as in Example #1; see that example for more details.
------------------------------------------------------------------------------
CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY CLASS MEMBERSHIP

Class Counts and Proportions

Class 1        271          0.45167
Class 2        329          0.54833
#2. These are much the same as in Example #1; see that example for more details.
------------------------------------------------------------------------------
Average Class Probabilities by Class
1 2
Class 1 0.958 0.042
Class 2 0.046 0.954
#3. These are much the same as in Example #1; see that example for more details.
------------------------------------------------------------------------------
MODEL RESULTS
Estimates S.E. Est./S.E.
CLASS 1
Means
READ 43.837 0.697 62.911
WRITE 45.065 0.766 58.813
MATH 44.800 0.494 90.743
SCI 44.477 0.791 56.221
SS 45.669 0.721 63.303
Variances
READ 46.915 2.816 16.659
WRITE 49.141 3.039 16.171
MATH 46.484 3.244 14.329
SCI 49.167 3.431 14.329
SS 63.054 4.209 14.981
CLASS 2
Means
READ 58.720 0.643 91.283
WRITE 58.574 0.535 109.565
MATH 57.808 0.737 78.444
SCI 57.924 0.525 110.367
SS 57.437 0.594 96.667
Variances
READ 46.915 2.816 16.659
WRITE 49.141 3.039 16.171
MATH 46.484 3.244 14.329
SCI 49.167 3.431 14.329
SS 63.054 4.209 14.981
#4. These are much the same as in Example #1; see that example for more details.
------------------------------------------------------------------------------
LATENT CLASS INDICATOR MODEL PART
Class 1
Thresholds
SES$1 -0.553 0.134 -4.131
SES$2 1.684 0.185 9.123
Class 2
Thresholds
SES$1 -2.005 0.202 -9.914
SES$2 0.550 0.124 4.428
#5. For categorical variables, we do not estimate means; instead we estimate thresholds. In the prior example we imagined this to be like a binary logistic regression, but here the indicator is a 3-level ordinal variable, so we think of the thresholds (cut points) as in an ordered logistic regression: two cut points dividing the three categories.
------------------------------------------------------------------------------
LATENT CLASS REGRESSION MODEL PART
Means
C#1 -0.168 0.145 -1.160
#6. These are much the same as in Example #1; see that example for more details.
------------------------------------------------------------------------------
LATENT CLASS INDICATOR MODEL PART IN PROBABILITY SCALE
Class 1
SES
Category 1 0.365 0.031 11.777
Category 2 0.478 0.031 15.263
Category 3 0.157 0.024 6.424
Class 2
SES
Category 1 0.119 0.021 5.612
Category 2 0.515 0.029 17.997
Category 3 0.366 0.029 12.699
#7. This takes the thresholds from section #5 of the output and converts them into probabilities. So, if you are in class 1, your probability of being low SES (category 1) is .365, but if you are in class 2, your probability of being low SES (category 1) is .119.
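With an ordinal indicator, each threshold gives a cumulative probability (via the logistic function) and the category probabilities are the differences between adjacent cumulative probabilities. A sketch of that conversion, using the ses thresholds from the output:

```python
import math

def logistic(x):
    return 1 / (1 + math.exp(-x))

def category_probs(thresholds):
    """Convert ordered thresholds into category probabilities:
    P(Y <= k) = logistic(tau_k); categories are successive differences."""
    cum = [logistic(t) for t in thresholds] + [1.0]
    probs, prev = [], 0.0
    for c in cum:
        probs.append(c - prev)
        prev = c
    return probs

# ses thresholds from section #5 of the output
print(category_probs([-0.553, 1.684]))  # class 1: about [.365, .478, .157]
print(category_probs([-2.005, 0.550]))  # class 2: about [.119, .515, .366]
```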
------------------------------------------------------------------------------
We now read the saved data file into Stata for comparison to the Mplus output.
. infile ses read write math sci ss cprob1 cprob2 class using lca_ex4.txt
Below we show observations from the middle of this file. Note that cprob1 is the probability of being in class 1, cprob2 is the probability of being in class 2, and class is the class membership based on the class with the highest probability.
. list in 200/210
+-------------------------------------------------------------------+
| ses read write math sci ss cprob1 cprob2 class |
|-------------------------------------------------------------------|
200. | 2 46.9 52.1 42.5 47.7 60.5 .886 .114 1 |
201. | 0 46.9 51.5 57 49.8 40.6 .963 .037 1 |
202. | 2 46.9 52.8 49.3 53.1 35.6 .958 .042 1 |
203. | 1 46.9 43.7 41.9 41.7 35.6 1 0 1 |
204. | 0 46.9 61.9 53 52.6 60.5 .05 .95 2 |
|-------------------------------------------------------------------|
205. | 1 46.9 41.1 45.3 47.1 55.6 .998 .002 1 |
206. | 0 46.9 38.5 47.1 41.7 25.7 1 0 1 |
207. | 0 46.9 54.1 46.4 49.8 55.6 .938 .062 1 |
208. | 1 46.9 51.5 48.5 49.8 50.6 .93 .07 1 |
209. | 2 46.9 41.1 53.6 41.7 55.6 .989 .011 1 |
|-------------------------------------------------------------------|
210. | 0 46.9 61.9 46.2 60.7 45.6 .381 .619 2 |
+-------------------------------------------------------------------+
Say that we get the mean of the reading, writing, math, science and social science scores and weight them by the probability of being in class 1 and then again weighting by the probability of being in class 2. Note the correspondence between these means and the means from section #4 of the output.
. tabstat read write math sci ss [aw=cprob1], stat(mean)
stats | read write math sci ss
---------+--------------------------------------------------
mean | 43.8374 45.06437 44.80045 44.47711 45.66875
------------------------------------------------------------
. tabstat read write math sci ss [aw=cprob2], stat(mean)
stats | read write math sci ss
---------+--------------------------------------------------
mean | 58.7205 58.57446 57.80871 57.92401 57.4375
------------------------------------------------------------
Likewise, consider these ologit commands predicting ses using an empty model, but weighting the cases according to their class membership (class 1 then class 2). Note the correspondence between the cut points below and the thresholds in section 5 of the output.
. ologit ses [aw=cprob1]
------------------------------------------------------------------------------
ses | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
-------------+----------------------------------------------------------------
_cut1 | -.5526053 .1013389 (Ancillary parameters)
_cut2 | 1.683901 .1342721
------------------------------------------------------------------------------
. ologit ses [aw=cprob2]
------------------------------------------------------------------------------
ses | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
-------------+----------------------------------------------------------------
_cut1 | -2.004701 .1460688 (Ancillary parameters)
_cut2 | .5498481 .0980846
------------------------------------------------------------------------------
