A Latent Class Example, Examples 7 and 8

These code fragments are examples that we are using to understand these techniques in Mplus. Please treat them as works in progress that explore the techniques, rather than as definitive answers on how to analyze any particular kind of data.

This picks up after Examples 1 and 2, but considers models with only categorical indicators.

We now have the input file //mplus/code/hsb6.inp and the data file it reads called https://stats.idre.ucla.edu/wp-content/uploads/2016/02/hsb6.dat .

Example 7: A latent class analysis with 2 classes, all categorical (binary) indicators.

Here is the input file.

Data:
  File is h:\mplus\hsb6.dat ;
Variable:
  Names are 
   id gender race ses sch prog locus concept mot career read write math
   sci ss hilocus hiconcep himot hiread hiwrite himath hisci hiss academic;
  Usevariables are 
     hiread hiwrite himath hisci hiss ;
  categorical = hiread hiwrite himath hisci hiss;
  classes = c(2);
Analysis: 
  Type=mixture;
MODEL:
  %C#1%
  [hiread$1 *2 himath$1 *2 hisci$1 *2 hiss$1 *2 hiwrite$1  *2 ];
  %C#2%
  [hiread$1  *-2 himath$1  *-2 hisci$1  *-2 hiss$1  *-2 hiwrite$1  *-2 ];

OUTPUT:
  TECH8;
SAVEDATA:
  file is lca_ex7.txt ;
  save is cprob;
  format is free;

Much of the output is similar to what you have seen in Example #1. Here is a first section of the output about the class counts and probabilities.

------------------------------------------------------------------------------
FINAL CLASS COUNTS AND PROPORTIONS OF TOTAL SAMPLE SIZE
BASED ON ESTIMATED POSTERIOR PROBABILITIES

  Class 1        268.38749          0.44731
  Class 2        331.61251          0.55269


CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY CLASS MEMBERSHIP

Class Counts and Proportions

  Class 1              266          0.44333
  Class 2              334          0.55667


Average Class Probabilities by Class

                 1        2

  Class 1     0.956    0.044
  Class 2     0.043    0.957

#1. This section shows the class counts and proportions, much as in prior examples. Based on the estimated posterior probabilities, about 45% of the sample is in class 1 and about 55% is in class 2.
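
As a quick check on this section, the proportions are simply the counts divided by the sample size. The short Python sketch below is our own illustration, not part of the Mplus run. It takes the sample size of 600 from the sum of the class counts and recomputes the proportions for both tables.

```python
# Counts copied from the Mplus output above.
posterior_counts = {1: 268.38749, 2: 331.61251}  # based on posterior probabilities
modal_counts = {1: 266, 2: 334}                   # based on most likely class

# The sample size is the sum of the class counts.
n = sum(modal_counts.values())

# Proportion = count / sample size, for each table.
posterior_props = {k: v / n for k, v in posterior_counts.items()}
modal_props = {k: v / n for k, v in modal_counts.items()}

for k in sorted(posterior_props):
    print(f"Class {k}: posterior {posterior_props[k]:.5f}, "
          f"most likely {modal_props[k]:.5f}")
```

The first set of counts need not be whole numbers because each person contributes their posterior probability of membership to every class, whereas the second set assigns each person entirely to their most likely class.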

------------------------------------------------------------------------------
MODEL RESULTS
                   Estimates     S.E.  Est./S.E.
CLASS 1
CLASS 2

LATENT CLASS INDICATOR MODEL PART
 Class 1
 Thresholds
    HIREAD$1           2.201    0.296      7.430
    HIWRITE$1          1.434    0.181      7.901
    HIMATH$1           1.888    0.224      8.436
    HISCI$1            1.738    0.242      7.181
    HISS$1             0.574    0.145      3.953
 Class 2
 Thresholds
    HIREAD$1          -1.894    0.215     -8.811
    HIWRITE$1         -1.525    0.182     -8.365
    HIMATH$1          -1.339    0.169     -7.940
    HISCI$1           -1.599    0.168     -9.507
    HISS$1            -2.006    0.199    -10.054

#2. These are the thresholds for being a "high reader" in class 1 and in class 2, and likewise for the other indicators. The thresholds are on a logit scale and can be tricky to interpret, so we can look at a later part of the output that shows the results in terms of probabilities.
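
The link between these thresholds and the probabilities can be sketched in a few lines of Python. This is our own illustration, not part of the Mplus run, and the function name `threshold_to_probs` is hypothetical. It assumes the logit parameterization for binary indicators, in which the probability of category 1 is 1 / (1 + exp(-threshold)). Under that assumption the values agree, to three decimals, with the probability-scale output shown later.

```python
import math

def threshold_to_probs(tau):
    """Return (P(category 1), P(category 2)) for a binary indicator,
    assuming a logit threshold tau."""
    p_cat1 = 1.0 / (1.0 + math.exp(-tau))
    return p_cat1, 1.0 - p_cat1

# Thresholds copied from the MODEL RESULTS section above.
thresholds = {
    1: {"HIREAD": 2.201, "HIWRITE": 1.434, "HIMATH": 1.888,
        "HISCI": 1.738, "HISS": 0.574},
    2: {"HIREAD": -1.894, "HIWRITE": -1.525, "HIMATH": -1.339,
        "HISCI": -1.599, "HISS": -2.006},
}

for cls, items in thresholds.items():
    print(f"Class {cls}")
    for name, tau in items.items():
        p1, p2 = threshold_to_probs(tau)
        print(f"  {name:8s} Category 1 {p1:.3f}   Category 2 {p2:.3f}")
```

A large positive threshold means category 1 (not high) is very likely, and a large negative threshold means category 2 (high) is very likely.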

------------------------------------------------------------------------------

LATENT CLASS REGRESSION MODEL PART

 Means
    C#1               -0.212    0.108     -1.954

#3. This part is like what we have seen in prior examples.
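
As a rough check, the mean for C#1 can be converted into a class proportion. The Python sketch below is our own illustration. It assumes the mean is a multinomial logit with the last class (class 2) as the reference class, whose logit is fixed at 0.

```python
import math

# Mean of C#1 copied from the output above (logit scale).
logit_c1 = -0.212
logit_c2 = 0.0  # assumption: the last class is the reference class

# Softmax over the class logits gives the class proportions.
denom = math.exp(logit_c1) + math.exp(logit_c2)
p_class1 = math.exp(logit_c1) / denom
p_class2 = math.exp(logit_c2) / denom

print(f"P(class 1) = {p_class1:.3f}")
print(f"P(class 2) = {p_class2:.3f}")
```

Under this assumption the result is close to the proportions of 0.44731 and 0.55269 reported with the final class counts, allowing for rounding of the printed mean.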

------------------------------------------------------------------------------

LATENT CLASS INDICATOR MODEL PART IN PROBABILITY SCALE

 Class 1

 HIREAD
    Category 1         0.900    0.027     33.877
    Category 2         0.100    0.027      3.749
 HIWRITE
    Category 1         0.807    0.028     28.624
    Category 2         0.193    0.028      6.824
 HIMATH
    Category 1         0.869    0.026     33.993
    Category 2         0.131    0.026      5.143
 HISCI
    Category 1         0.850    0.031     27.621
    Category 2         0.150    0.031      4.860
 HISS
    Category 1         0.640    0.033     19.118
    Category 2         0.360    0.033     10.772

 Class 2

 HIREAD
    Category 1         0.131    0.024      5.350
    Category 2         0.869    0.024     35.574
 HIWRITE
    Category 1         0.179    0.027      6.680
    Category 2         0.821    0.027     30.689
 HIMATH
    Category 1         0.208    0.028      7.486
    Category 2         0.792    0.028     28.554
 HISCI
    Category 1         0.168    0.024      7.150
    Category 2         0.832    0.024     35.363
 HISS
    Category 1         0.119    0.021      5.687
    Category 2         0.881    0.021     42.262

#4. We have seen this kind of output in prior examples. This part shows the conditional probabilities of being high (above the median) on each of these achievement measures, given class membership. We can see that the first class is a poorly performing class. For example, given that you are in class 1, the probability of being a high reader (hiread being 1 vs. 0) is .100. By contrast, given that you are in class 2, the probability of being a high reader is .869.

Example 8: A latent class analysis with 3 classes, and categorical indicators.

Here is the input file.

Data:
  File is h:\mplus\hsb6.dat ;
Variable:
  Names are 
   id gender race ses sch prog locus concept mot career read write math
   sci ss hilocus hiconcep himot hiread hiwrite himath hisci hiss academic;
  Usevariables are 
     hiread hiwrite himath hisci hiss ;
  categorical = hiread hiwrite himath hisci hiss;
  classes = c(3);
Analysis: 
  Type=mixture;
MODEL:
  %C#1%
  [hiread$1 *2 himath$1 *2 hisci$1 *2 hiss$1 *2 hiwrite$1  *2 ];
  %C#2%
  [hiread$1  *0 himath$1  *0 hisci$1  *0 hiss$1  *0 hiwrite$1  *0 ];
  %C#3%
  [hiread$1  *-2 himath$1  *-2 hisci$1  *-2 hiss$1  *-2 hiwrite$1  *-2 ];

OUTPUT:
  TECH8;
SAVEDATA:
  file is lca_ex8.txt ;
  save is cprob;
  format is free;

Here is the output.

------------------------------------------------------------------------------

FINAL CLASS COUNTS AND PROPORTIONS OF TOTAL SAMPLE SIZE
BASED ON ESTIMATED POSTERIOR PROBABILITIES

  Class 1        220.03953          0.36673
  Class 2        197.52263          0.32920
  Class 3        182.43784          0.30406


CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY CLASS MEMBERSHIP

Class Counts and Proportions

  Class 1              229          0.38167
  Class 2              175          0.29167
  Class 3              196          0.32667


Average Class Probabilities by Class

                 1        2        3

  Class 1     0.905    0.094    0.001
  Class 2     0.072    0.818    0.110
  Class 3     0.000    0.168    0.832

You have seen this kind of output before. It shows that there are about 37% in class 1, 33% in class 2, and about 30% in class 3.

------------------------------------------------------------------------------
     IN THE OPTIMIZATION, ONE OR MORE LOGIT THRESHOLDS APPROACHED AND WERE SET
     AT THE EXTREME VALUES.  EXTREME VALUES ARE -15.000 AND 15.000.
     THE FOLLOWING THRESHOLDS WERE SET AT THESE VALUES:
     * THRESHOLD 1 OF CLASS INDICATOR HIWRITE FOR CLASS C#3 AT ITERATION 65

We get this note in our output. It refers to a result that appears later in the output; see below.

LATENT CLASS INDICATOR MODEL PART IN PROBABILITY SCALE
 Class 1
 HIREAD
    Category 1         0.954    0.024     39.239
    Category 2         0.046    0.024      1.893
 HIWRITE
    Category 1         0.831    0.038     21.940
    Category 2         0.169    0.038      4.449
 HIMATH
    Category 1         0.900    0.029     31.321
    Category 2         0.100    0.029      3.465
 HISCI
    Category 1         0.930    0.032     29.083
    Category 2         0.070    0.032      2.203
 HISS
    Category 1         0.680    0.042     16.146
    Category 2         0.320    0.042      7.600

 Class 2
 HIREAD
    Category 1         0.342    0.068      5.005
    Category 2         0.658    0.068      9.646
 HIWRITE
    Category 1         0.471    0.125      3.755
    Category 2         0.529    0.125      4.216
 HIMATH
    Category 1         0.474    0.068      6.938
    Category 2         0.526    0.068      7.708
 HISCI
    Category 1         0.282    0.064      4.375
    Category 2         0.718    0.064     11.160
 HISS
    Category 1         0.259    0.065      3.971
    Category 2         0.741    0.065     11.335

 Class 3
 HIREAD
    Category 1         0.042    0.053      0.781
    Category 2         0.958    0.053     17.940
 HIWRITE
    Category 1         0.000    0.000      0.000
    Category 2         1.000    0.000      0.000
 HIMATH
    Category 1         0.057    0.071      0.791
    Category 2         0.943    0.071     13.202
 HISCI
    Category 1         0.131    0.055      2.357
    Category 2         0.869    0.055     15.686
 HISS
    Category 1         0.056    0.033      1.694
    Category 2         0.944    0.033     28.795
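
The probabilities of 0.000 and 1.000 for HIWRITE in class 3 correspond to the threshold that was set at an extreme value. The Python sketch below is our own illustration. It assumes the logit parameterization, in which the probability of category 1 is 1 / (1 + exp(-threshold)). It also assumes the threshold was fixed at -15 rather than 15. The note does not say which extreme was used, but -15 is the one consistent with a category 1 probability of 0.000.

```python
import math

# Assumption: threshold 1 of HIWRITE in class 3 was fixed at -15.
tau = -15.0

# Probability of each category under a logit threshold.
p_cat1 = 1.0 / (1.0 + math.exp(-tau))  # not a high writer
p_cat2 = 1.0 - p_cat1                  # high writer

print(f"Category 1: {p_cat1:.3f}")
print(f"Category 2: {p_cat2:.3f}")
```

At three decimals these probabilities are indistinguishable from 0 and 1.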

Above we see the probabilities of being a "high reader" given that you are in class 1, class 2, and class 3. If we take "high writer", the probability of being a "high writer" given that you are in class 1 is .169, given that you are in class 2 is .529, and given that you are in class 3 is 1.0. We can relate this to the file that Mplus saves (lca_ex8.txt); below we read that file into Stata.

. infile hiread hiwrite himath hisci hiss cprob1 cprob2 cprob3 class using lca_ex8.txt

Now let’s look at a tabulation of class membership by hiwrite. Given that you are in class 3, there are 196 high writers and 0 non-high writers, so the probability of being a high writer in class 3 is indeed 1.

. tab class hiwrite

           |        hiwrite
     class |         0          1 |     Total
-----------+----------------------+----------
         1 |       190         39 |       229 
         2 |        86         89 |       175 
         3 |         0        196 |       196 
-----------+----------------------+----------
     Total |       276        324 |       600
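
The same check can be made for every class by turning each row of the table into a proportion. The Python sketch below is our own illustration, with the counts copied from the Stata table above.

```python
# Rows of the Stata table: class -> (hiwrite == 0, hiwrite == 1).
table = {
    1: (190, 39),
    2: (86, 89),
    3: (0, 196),
}

# Proportion of high writers among those assigned to each class.
prop_high = {cls: hi / (lo + hi) for cls, (lo, hi) in table.items()}

for cls in sorted(prop_high):
    print(f"Class {cls}: proportion of high writers = {prop_high[cls]:.3f}")
```

The proportions for classes 1 and 2 are close to, but not the same as, the model estimates of .169 and .529. A likely reason is that the table assigns each person entirely to their most likely class, whereas the model estimates take account of each person's posterior probability of membership in every class.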