These code fragments are examples that we are using to try to understand these techniques in Mplus. Please treat them as works in progress that explore these techniques, rather than as definitive answers about how to analyze any particular kind of data.
This picks up after Examples 1 and 2, but considers models with only categorical indicators.
The input file is //mplus/code/hsb6.inp, and the data file it reads is https://stats.idre.ucla.edu/wp-content/uploads/2016/02/hsb6.dat .
Example 7: A latent class analysis with 2 classes, all categorical (binary) indicators.
Here is the input file:
Data:
File is h:\mplus\hsb6.dat ;
Variable:
Names are
id gender race ses sch prog locus concept mot career read write math
sci ss hilocus hiconcep himot hiread hiwrite himath hisci hiss academic;
Usevariables are
hiread hiwrite himath hisci hiss ;
categorical = hiread hiwrite himath hisci hiss;
classes = c(2);
Analysis:
Type=mixture;
MODEL:
%C#1%
[hiread$1 *2 himath$1 *2 hisci$1 *2 hiss$1 *2 hiwrite$1 *2 ];
%C#2%
[hiread$1 *-2 himath$1 *-2 hisci$1 *-2 hiss$1 *-2 hiwrite$1 *-2 ];
OUTPUT:
TECH8;
SAVEDATA:
file is lca_ex7.txt ;
save is cprob;
format is free;
Much of the output is similar to what you have seen in Example #1. Here is a first section of the output about the class counts and probabilities.
------------------------------------------------------------------------------
FINAL CLASS COUNTS AND PROPORTIONS OF TOTAL SAMPLE SIZE
BASED ON ESTIMATED POSTERIOR PROBABILITIES
Class 1 268.38749 0.44731
Class 2 331.61251 0.55269
CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY CLASS MEMBERSHIP
Class Counts and Proportions
Class 1 266 0.44333
Class 2 334 0.55667
Average Class Probabilities by Class
1 2
Class 1 0.956 0.044
Class 2 0.043 0.957
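#1. This section shows the class counts and proportions, much as in the prior examples. About 45% of the sample is estimated to be in class 1 and about 55% in class 2. The average class probabilities (.956 and .957 on the diagonal) indicate that individuals are assigned to their most likely class with little ambiguity.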
------------------------------------------------------------------------------
MODEL RESULTS
Estimates S.E. Est./S.E.
CLASS 1
CLASS 2
LATENT CLASS INDICATOR MODEL PART
Class 1
Thresholds
HIREAD$1 2.201 0.296 7.430
HIWRITE$1 1.434 0.181 7.901
HIMATH$1 1.888 0.224 8.436
HISCI$1 1.738 0.242 7.181
HISS$1 0.574 0.145 3.953
Class 2
Thresholds
HIREAD$1 -1.894 0.215 -8.811
HIWRITE$1 -1.525 0.182 -8.365
HIMATH$1 -1.339 0.169 -7.940
HISCI$1 -1.599 0.168 -9.507
HISS$1 -2.006 0.199 -10.054
#2. These are the thresholds, on the logit scale, for each indicator in class 1 and class 2, beginning with the threshold for being a "high reader". Thresholds can be tricky to interpret, so we will look at a later part of the output that shows the results in terms of probabilities.
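As a rough check on how a threshold relates to a probability, the sketch below assumes the usual logit parameterization for a binary indicator, in which the probability of category 1 is 1 / (1 + exp(-threshold)). The function name and the use of Python are our own choices for illustration and are not part of the Mplus output.

```python
import math

def category1_probability(threshold):
    """Probability of the first (lowest) category of a binary indicator,
    assuming a logit threshold: P(category 1) = 1 / (1 + exp(-threshold))."""
    return 1.0 / (1.0 + math.exp(-threshold))

# Thresholds for HIREAD$1 copied from the MODEL RESULTS section above.
hiread_thresholds = {"class 1": 2.201, "class 2": -1.894}

for label, tau in hiread_thresholds.items():
    p_low = category1_probability(tau)   # not a high reader (category 1)
    p_high = 1.0 - p_low                 # high reader (category 2)
    print(f"{label}: P(not high reader) = {p_low:.3f}, P(high reader) = {p_high:.3f}")
```

Under this assumption, the class 1 threshold of 2.201 corresponds to a probability of about .900 for category 1, and the class 2 threshold of -1.894 to about .131, which agrees with the probability-scale results reported later in the output.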
------------------------------------------------------------------------------
LATENT CLASS REGRESSION MODEL PART
Means
C#1 -0.212 0.108 -1.954
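#3. As in the earlier examples, this is the mean of the latent class variable for class 1, expressed as a logit relative to the last class (here, class 2).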
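To see how this mean relates to the class proportions, the sketch below assumes a multinomial logit parameterization in which the last class is the reference and has its logit fixed at 0. The function name is hypothetical and chosen only for illustration.

```python
import math

def class_proportions(logits):
    """Convert latent class logits (reference class logit = 0) into
    class proportions with the multinomial logit (softmax) formula."""
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# C#1 mean copied from the output above; class 2 is the reference (logit 0).
props = class_proportions([-0.212, 0.0])
print([round(p, 3) for p in props])
```

Under this assumption the proportions come out close to the .44731 and .55269 reported in the first section of the output; small differences are due to rounding in the printed estimate.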
------------------------------------------------------------------------------
LATENT CLASS INDICATOR MODEL PART IN PROBABILITY SCALE
Class 1
HIREAD
Category 1 0.900 0.027 33.877
Category 2 0.100 0.027 3.749
HIWRITE
Category 1 0.807 0.028 28.624
Category 2 0.193 0.028 6.824
HIMATH
Category 1 0.869 0.026 33.993
Category 2 0.131 0.026 5.143
HISCI
Category 1 0.850 0.031 27.621
Category 2 0.150 0.031 4.860
HISS
Category 1 0.640 0.033 19.118
Category 2 0.360 0.033 10.772
Class 2
HIREAD
Category 1 0.131 0.024 5.350
Category 2 0.869 0.024 35.574
HIWRITE
Category 1 0.179 0.027 6.680
Category 2 0.821 0.027 30.689
HIMATH
Category 1 0.208 0.028 7.486
Category 2 0.792 0.028 28.554
HISCI
Category 1 0.168 0.024 7.150
Category 2 0.832 0.024 35.363
HISS
Category 1 0.119 0.021 5.687
Category 2 0.881 0.021 42.262
#4. We have seen this kind of output in prior examples. This part shows the conditional probabilities of being high (above the median) on each of these achievement measures, given class membership. Category 2 corresponds to being high on the measure. We can see that class 1 is a poorly performing class. For example, given that you are in class 1, the probability of being a high reader (hiread being 1 vs. 0) is .100. By contrast, given that you are in class 2, the probability of being a high reader is .869.
Example 8: A latent class analysis with 3 classes, and categorical indicators.
Here is the input file:
Data:
File is h:\mplus\hsb6.dat ;
Variable:
Names are
gender race ses sch prog locus concept mot career read write math
sci ss hilocus hiconcep himot hiread hiwrite himath hisci hiss academic;
Usevariables are
hiread hiwrite himath hisci hiss ;
categorical = hiread hiwrite himath hisci hiss;
classes = c(3);
Analysis:
Type=mixture;
MODEL:
%C#1%
[hiread$1 *2 himath$1 *2 hisci$1 *2 hiss$1 *2 hiwrite$1 *2 ];
%C#2%
[hiread$1 *0 himath$1 *0 hisci$1 *0 hiss$1 *0 hiwrite$1 *0 ];
%C#3%
[hiread$1 *-2 himath$1 *-2 hisci$1 *-2 hiss$1 *-2 hiwrite$1 *-2 ];
OUTPUT:
TECH8;
SAVEDATA:
file is lca_ex8.txt ;
save is cprob;
format is free;
Here is the output:
------------------------------------------------------------------------------
FINAL CLASS COUNTS AND PROPORTIONS OF TOTAL SAMPLE SIZE
BASED ON ESTIMATED POSTERIOR PROBABILITIES
Class 1 220.03953 0.36673
Class 2 197.52263 0.32920
Class 3 182.43784 0.30406
CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY CLASS MEMBERSHIP
Class Counts and Proportions
Class 1 229 0.38167
Class 2 175 0.29167
Class 3 196 0.32667
Average Class Probabilities by Class
1 2 3
Class 1 0.905 0.094 0.001
Class 2 0.072 0.818 0.110
Class 3 0.000 0.168 0.832
You have seen this kind of output before. It shows that there are about 37% in class 1, 33% in class 2, and about 30% in class 3.
------------------------------------------------------------------------------
IN THE OPTIMIZATION, ONE OR MORE LOGIT THRESHOLDS APPROACHED AND WERE SET
AT THE EXTREME VALUES. EXTREME VALUES ARE -15.000 AND 15.000.
THE FOLLOWING THRESHOLDS WERE SET AT THESE VALUES:
* THRESHOLD 1 OF CLASS INDICATOR HIWRITE FOR CLASS C#3 AT ITERATION 65
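Mplus prints this note in the output. It concerns the threshold for hiwrite in class 3, which was fixed at an extreme value during estimation. The probability-scale results that appear later in the output, shown below, make clear what this means.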
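To illustrate what an extreme threshold implies, the sketch below converts a logit threshold of -15 to a probability, assuming P(category 1) = 1 / (1 + exp(-threshold)). That the value is -15 rather than 15 is our inference from the probability of .000 reported for category 1 of HIWRITE in class 3; the note itself does not say which extreme was used.

```python
import math

def category1_probability(threshold):
    """P(category 1) for a binary indicator under a logit threshold."""
    return 1.0 / (1.0 + math.exp(-threshold))

# Assumed extreme value for THRESHOLD 1 of HIWRITE in class 3.
p_low = category1_probability(-15.0)
p_high = 1.0 - p_low

print(f"P(category 1) = {p_low:.7f}")
print(f"P(category 2) = {p_high:.3f}")
```

At this value the probability of category 1 is effectively 0 and the probability of category 2 is effectively 1, so moving the threshold any further would no longer change the fit in a meaningful way.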
LATENT CLASS INDICATOR MODEL PART IN PROBABILITY SCALE
Class 1
HIREAD
Category 1 0.954 0.024 39.239
Category 2 0.046 0.024 1.893
HIWRITE
Category 1 0.831 0.038 21.940
Category 2 0.169 0.038 4.449
HIMATH
Category 1 0.900 0.029 31.321
Category 2 0.100 0.029 3.465
HISCI
Category 1 0.930 0.032 29.083
Category 2 0.070 0.032 2.203
HISS
Category 1 0.680 0.042 16.146
Category 2 0.320 0.042 7.600
Class 2
HIREAD
Category 1 0.342 0.068 5.005
Category 2 0.658 0.068 9.646
HIWRITE
Category 1 0.471 0.125 3.755
Category 2 0.529 0.125 4.216
HIMATH
Category 1 0.474 0.068 6.938
Category 2 0.526 0.068 7.708
HISCI
Category 1 0.282 0.064 4.375
Category 2 0.718 0.064 11.160
HISS
Category 1 0.259 0.065 3.971
Category 2 0.741 0.065 11.335
Class 3
HIREAD
Category 1 0.042 0.053 0.781
Category 2 0.958 0.053 17.940
HIWRITE
Category 1 0.000 0.000 0.000
Category 2 1.000 0.000 0.000
HIMATH
Category 1 0.057 0.071 0.791
Category 2 0.943 0.071 13.202
HISCI
Category 1 0.131 0.055 2.357
Category 2 0.869 0.055 15.686
HISS
Category 1 0.056 0.033 1.694
Category 2 0.944 0.033 28.795
Above we see the probability of being high on each measure given membership in class 1, class 2, and class 3. Taking "high writer" as an example, the probability of being a high writer is .169 given class 1, .529 given class 2, and 1.000 given class 3. The last value corresponds to the threshold that was set at an extreme value. We can relate this to the file that Mplus saved (lca_ex8.txt) by reading that file into Stata, as shown below.
. infile hiread hiwrite himath hisci hiss cprob1 cprob2 cprob3 class using lca_ex8.txt
Now let’s look at a tabulation of class membership by hiwrite. Given that you are in class 3, there are 196 high writers and 0 non-high writers, so the probability of being a high writer in class 3 is indeed 1.
. tab class hiwrite
| hiwrite
class | 0 1 | Total
-----------+----------------------+----------
1 | 190 39 | 229
2 | 86 89 | 175
3 | 0 196 | 196
-----------+----------------------+----------
Total | 276 324 | 600
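The same calculation can be repeated for every class from the counts in the table. The sketch below hard-codes those counts; the dictionary and function names are our own. Note that the table is based on most likely class membership, so these observed proportions need not match the model-estimated probabilities exactly.

```python
# Counts of hiwrite (0, 1) by most likely class, copied from the Stata table above.
counts = {1: (190, 39), 2: (86, 89), 3: (0, 196)}

def high_writer_share(not_high, high):
    """Observed proportion of high writers within one class."""
    return high / (not_high + high)

shares = {cls: high_writer_share(*pair) for cls, pair in counts.items()}

for cls, share in shares.items():
    print(f"class {cls}: {share:.3f}")
```

The proportions are roughly .170, .509, and 1.000, compared with the model estimates of .169, .529, and 1.000. Some difference for classes 1 and 2 is expected, because the model estimates weight each case by its posterior class probabilities rather than assigning it wholly to one class.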
