These code fragments are examples that we are using to try to understand these techniques in Mplus. Please treat them as works in progress that explore these techniques, rather than as definitive answers on how to analyze any particular kind of data.
This picks up after Examples 1 and 2, but considers models with only categorical indicators.
The input file is //mplus/code/hsb6.inp, and the data file it reads is https://stats.idre.ucla.edu/wp-content/uploads/2016/02/hsb6.dat .
Example 7: A latent class analysis with 2 classes, all categorical (binary) indicators.
Here is the input file
Data:
  File is h:\mplus\hsb6.dat ;
Variable:
  Names are
    id gender race ses sch prog locus concept mot career read write math
    sci ss hilocus hiconcep himot hiread hiwrite himath hisci hiss academic;
  Usevariables are
    hiread hiwrite himath hisci hiss ;
  categorical = hiread hiwrite himath hisci hiss;
  classes = c(2);
Analysis:
  Type=mixture;
MODEL:
  %C#1%
  [hiread$1 *2 himath$1 *2 hisci$1 *2 hiss$1 *2 hiwrite$1 *2 ];
  %C#2%
  [hiread$1 *-2 himath$1 *-2 hisci$1 *-2 hiss$1 *-2 hiwrite$1 *-2 ];
OUTPUT:
  TECH8;
SAVEDATA:
  file is lca_ex7.txt ;
  save is cprob;
  format is free;
Much of the output is similar to what you have seen in Example #1. Here is a first section of the output about the class counts and probabilities.
------------------------------------------------------------------------------
FINAL CLASS COUNTS AND PROPORTIONS OF TOTAL SAMPLE SIZE
BASED ON ESTIMATED POSTERIOR PROBABILITIES

  Class 1        268.38749          0.44731
  Class 2        331.61251          0.55269

CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY CLASS MEMBERSHIP

Class Counts and Proportions

  Class 1              266          0.44333
  Class 2              334          0.55667

Average Class Probabilities by Class

               1        2

  Class 1   0.956    0.044
  Class 2   0.043    0.957
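#1. This section reports the class counts and proportions, much as in prior examples. Based on the estimated posterior probabilities, about 45% of the sample is in class 1 and about 55% is in class 2. Assigning each person to their most likely class gives nearly the same split (266 and 334).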
------------------------------------------------------------------------------
MODEL RESULTS

                   Estimates     S.E.  Est./S.E.

CLASS 1

CLASS 2

LATENT CLASS INDICATOR MODEL PART

Class 1

 Thresholds
    HIREAD$1           2.201    0.296      7.430
    HIWRITE$1          1.434    0.181      7.901
    HIMATH$1           1.888    0.224      8.436
    HISCI$1            1.738    0.242      7.181
    HISS$1             0.574    0.145      3.953

Class 2

 Thresholds
    HIREAD$1          -1.894    0.215     -8.811
    HIWRITE$1         -1.525    0.182     -8.365
    HIMATH$1          -1.339    0.169     -7.940
    HISCI$1           -1.599    0.168     -9.507
    HISS$1            -2.006    0.199    -10.054
#2. These are the thresholds for each indicator in class 1 and class 2, for example the threshold for being a "high reader". Thresholds are on the logit scale and can be tricky to interpret, so we will look at a later part of the output that shows the same results as probabilities.
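As a rough check on how the thresholds relate to the probabilities reported later, the following Python sketch converts a logit threshold into category probabilities. It assumes the usual logistic relationship for a binary indicator, P(category 1) = 1 / (1 + exp(-threshold)). The function name `category_probs` is ours for illustration and is not part of Mplus.

```python
import math

def category_probs(threshold):
    """Return (P(category 1), P(category 2)) for a binary indicator,
    assuming P(category 1) = 1 / (1 + exp(-threshold))."""
    p_cat1 = 1.0 / (1.0 + math.exp(-threshold))
    return p_cat1, 1.0 - p_cat1

# Thresholds copied from the MODEL RESULTS section of the Example 7 output.
thresholds = {
    "Class 1": {"HIREAD": 2.201, "HIWRITE": 1.434, "HIMATH": 1.888,
                "HISCI": 1.738, "HISS": 0.574},
    "Class 2": {"HIREAD": -1.894, "HIWRITE": -1.525, "HIMATH": -1.339,
                "HISCI": -1.599, "HISS": -2.006},
}

for cls, items in thresholds.items():
    for name, tau in items.items():
        p1, p2 = category_probs(tau)
        print(f"{cls} {name}: P(category 1) = {p1:.3f}, P(category 2) = {p2:.3f}")
```

A large positive threshold means category 1 (not high) is very likely, and a large negative threshold means category 2 (high) is very likely, which is why class 1 looks like the low-performing class.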
------------------------------------------------------------------------------
LATENT CLASS REGRESSION MODEL PART

 Means
    C#1               -0.212    0.108     -1.954
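#3. This part is like what we have seen before. The mean of C#1 is on the logit scale and reflects the size of class 1 relative to class 2, the last class.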
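To see how the mean of C#1 connects to the class proportions reported earlier, here is a small Python sketch. It assumes the multinomial logit parameterization in which the last class is the reference class with its logit fixed at 0. The helper `class_proportions` is a name we made up for this illustration.

```python
import math

def class_proportions(logits):
    """Convert logits for the first K-1 classes into K class proportions,
    assuming the last class is the reference class with logit 0."""
    exps = [math.exp(v) for v in logits] + [1.0]  # exp(0) = 1 for the last class
    total = sum(exps)
    return [e / total for e in exps]

# Mean of C#1 from the LATENT CLASS REGRESSION MODEL PART of Example 7.
props = class_proportions([-0.212])
for k, p in enumerate(props, start=1):
    print(f"Class {k}: {p:.3f}")
```

The result is about .447 for class 1 and .553 for class 2, which agrees with the proportions based on estimated posterior probabilities (0.44731 and 0.55269) up to the rounding of the logit.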
------------------------------------------------------------------------------
LATENT CLASS INDICATOR MODEL PART IN PROBABILITY SCALE

Class 1

 HIREAD
    Category 1         0.900    0.027     33.877
    Category 2         0.100    0.027      3.749
 HIWRITE
    Category 1         0.807    0.028     28.624
    Category 2         0.193    0.028      6.824
 HIMATH
    Category 1         0.869    0.026     33.993
    Category 2         0.131    0.026      5.143
 HISCI
    Category 1         0.850    0.031     27.621
    Category 2         0.150    0.031      4.860
 HISS
    Category 1         0.640    0.033     19.118
    Category 2         0.360    0.033     10.772

Class 2

 HIREAD
    Category 1         0.131    0.024      5.350
    Category 2         0.869    0.024     35.574
 HIWRITE
    Category 1         0.179    0.027      6.680
    Category 2         0.821    0.027     30.689
 HIMATH
    Category 1         0.208    0.028      7.486
    Category 2         0.792    0.028     28.554
 HISCI
    Category 1         0.168    0.024      7.150
    Category 2         0.832    0.024     35.363
 HISS
    Category 1         0.119    0.021      5.687
    Category 2         0.881    0.021     42.262
#4. We have seen this kind of output in prior examples. This part shows the conditional probabilities of being high (above the median) on each of these achievement measures, given class membership. Category 2 is the "high" category. We can see that the first class is a poorly performing class. For example, given that you are in class 1, the probability of being a high reader (hiread being 1 vs. 0) is .100. By contrast, given that you are in class 2, the probability of being a high reader is .869.
Example 8: A latent class analysis with 3 classes, and categorical indicators.
Here is the input file
Data:
  File is h:\mplus\hsb6.dat ;
Variable:
  Names are
    gender race ses sch prog locus concept mot career read write math
    sci ss hilocus hiconcep himot hiread hiwrite himath hisci hiss academic;
  Usevariables are
    hiread hiwrite himath hisci hiss ;
  categorical = hiread hiwrite himath hisci hiss;
  classes = c(3);
Analysis:
  Type=mixture;
MODEL:
  %C#1%
  [hiread$1 *2 himath$1 *2 hisci$1 *2 hiss$1 *2 hiwrite$1 *2 ];
  %C#2%
  [hiread$1 *0 himath$1 *0 hisci$1 *0 hiss$1 *0 hiwrite$1 *0 ];
  %C#3%
  [hiread$1 *-2 himath$1 *-2 hisci$1 *-2 hiss$1 *-2 hiwrite$1 *-2 ];
OUTPUT:
  TECH8;
SAVEDATA:
  file is lca_ex8.txt ;
  save is cprob;
  format is free;
Here is the output
------------------------------------------------------------------------------
FINAL CLASS COUNTS AND PROPORTIONS OF TOTAL SAMPLE SIZE
BASED ON ESTIMATED POSTERIOR PROBABILITIES

  Class 1        220.03953          0.36673
  Class 2        197.52263          0.32920
  Class 3        182.43784          0.30406

CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY CLASS MEMBERSHIP

Class Counts and Proportions

  Class 1              229          0.38167
  Class 2              175          0.29167
  Class 3              196          0.32667

Average Class Probabilities by Class

               1        2        3

  Class 1   0.905    0.094    0.001
  Class 2   0.072    0.818    0.110
  Class 3   0.000    0.168    0.832
You have seen this kind of output before. It shows that there are about 37% in class 1, 33% in class 2, and about 30% in class 3.
------------------------------------------------------------------------------
IN THE OPTIMIZATION, ONE OR MORE LOGIT THRESHOLDS APPROACHED AND WERE SET
AT THE EXTREME VALUES.  EXTREME VALUES ARE -15.000 AND 15.000.

THE FOLLOWING THRESHOLDS WERE SET AT THESE VALUES:
  * THRESHOLD 1 OF CLASS INDICATOR HIWRITE FOR CLASS C#3 AT ITERATION 65
We get this note in the output. It relates to a result that appears later in the output, shown below.
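To get a feel for what an extreme logit threshold means, this short Python sketch evaluates the probability of category 1 at the two extreme values named in the note. It assumes the usual logistic relationship P(category 1) = 1 / (1 + exp(-threshold)). The function name `prob_category1` is ours and is only for illustration.

```python
import math

def prob_category1(threshold):
    """P(category 1) for a binary indicator, assuming a logistic link."""
    return 1.0 / (1.0 + math.exp(-threshold))

# The two extreme values mentioned in the Mplus note.
for tau in (-15.0, 15.0):
    p1 = prob_category1(tau)
    print(f"threshold {tau:+.1f}: P(category 1) = {p1:.7f}, "
          f"P(category 2) = {1.0 - p1:.7f}")
```

At a threshold of -15 the probability of category 1 is essentially 0, so the probability of category 2 rounds to 1.000. In other words, the note indicates that one conditional probability for HIWRITE in class 3 is estimated at the boundary.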
LATENT CLASS INDICATOR MODEL PART IN PROBABILITY SCALE

Class 1

 HIREAD
    Category 1         0.954    0.024     39.239
    Category 2         0.046    0.024      1.893
 HIWRITE
    Category 1         0.831    0.038     21.940
    Category 2         0.169    0.038      4.449
 HIMATH
    Category 1         0.900    0.029     31.321
    Category 2         0.100    0.029      3.465
 HISCI
    Category 1         0.930    0.032     29.083
    Category 2         0.070    0.032      2.203
 HISS
    Category 1         0.680    0.042     16.146
    Category 2         0.320    0.042      7.600

Class 2

 HIREAD
    Category 1         0.342    0.068      5.005
    Category 2         0.658    0.068      9.646
 HIWRITE
    Category 1         0.471    0.125      3.755
    Category 2         0.529    0.125      4.216
 HIMATH
    Category 1         0.474    0.068      6.938
    Category 2         0.526    0.068      7.708
 HISCI
    Category 1         0.282    0.064      4.375
    Category 2         0.718    0.064     11.160
 HISS
    Category 1         0.259    0.065      3.971
    Category 2         0.741    0.065     11.335

Class 3

 HIREAD
    Category 1         0.042    0.053      0.781
    Category 2         0.958    0.053     17.940
 HIWRITE
    Category 1         0.000    0.000      0.000
    Category 2         1.000    0.000      0.000
 HIMATH
    Category 1         0.057    0.071      0.791
    Category 2         0.943    0.071     13.202
 HISCI
    Category 1         0.131    0.055      2.357
    Category 2         0.869    0.055     15.686
 HISS
    Category 1         0.056    0.033      1.694
    Category 2         0.944    0.033     28.795
Above we see the probabilities of being high on each measure given that you are in class 1, class 2, or class 3. Take "high writer" as an example: the probability of being a high writer is .169 given that you are in class 1, .529 given that you are in class 2, and 1.0 given that you are in class 3. The probability of 1.0 is the threshold that was set at an extreme value. We can relate this to the file that Mplus saved (lca_ex8.txt) by reading it into Stata.
. infile hiread hiwrite himath hisci hiss cprob1 cprob2 cprob3 class using lca_ex8.txt
Now let’s look at a tabulation of class membership by hiwrite. Given that you are in class 3, there are 196 high writers and 0 non-high writers, so the probability of being a high writer in class 3 is indeed 1.
. tab class hiwrite

           |        hiwrite
     class |         0          1 |     Total
-----------+----------------------+----------
         1 |       190         39 |       229
         2 |        86         89 |       175
         3 |         0        196 |       196
-----------+----------------------+----------
     Total |       276        324 |       600
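The row proportions in this table can be compared with the model-based probabilities from the Mplus output. The Python sketch below does that arithmetic with the counts typed in by hand from the tabulation. The names `counts`, `row_proportions`, and `model_based` are ours for illustration.

```python
# Counts from the Stata tabulation: class -> (hiwrite == 0, hiwrite == 1).
counts = {1: (190, 39), 2: (86, 89), 3: (0, 196)}

# Model-based P(high writer | class) from the Mplus probability-scale output.
model_based = {1: 0.169, 2: 0.529, 3: 1.000}

def row_proportions(table):
    """Proportion of high writers within each most-likely class."""
    return {cls: high / (low + high) for cls, (low, high) in table.items()}

observed = row_proportions(counts)
for cls in sorted(observed):
    print(f"class {cls}: observed = {observed[cls]:.3f}, "
          f"model-based = {model_based[cls]:.3f}")
```

The observed proportions are close to the model-based values but not identical. The tabulation assigns each person entirely to their most likely class, while the model estimates weight each person by their posterior class probabilities. For class 3 the two agree exactly at 1.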