These code fragments are examples that we are using to try to understand these techniques using Mplus. We ask that you treat them as works in progress that explore these techniques, rather than as definitive answers on how to analyze any particular kind of data.
This picks up after Examples 1 and 2, but considers adding categorical variables to the model.
We now have the input file hsb6.inp and the data file it reads, hsb6.dat, which can be downloaded from https://stats.idre.ucla.edu/wp-content/uploads/2016/02/hsb6.dat .
Example 3: A latent class analysis with 2 classes, continuous indicators, and one two-level categorical indicator.
Here is the input file
Data:
  File is i:\mplus\hsb6.dat ;
Variable:
  Names are id gender race ses sch prog locus concept mot career
    read write math sci ss hilocus hiconcep himot hiread hiwrite
    himath hisci hiss academic;
  Usevariables are hiread write math sci ss ;
  categorical = hiread;
  classes = c(2);
Analysis:
  Type = mixture;
Model:
  %C#1%
  [hiread$1*-1 math sci ss write*30];
  %C#2%
  [hiread$1*+1 math sci ss write*45];
Output:
  TECH8;
Savedata:
  file is lca_ex3.txt ;
  save is cprob;
  format is free;
Here is the output
------------------------------------------------------------------------------
FINAL CLASS COUNTS AND PROPORTIONS OF TOTAL SAMPLE SIZE
BASED ON ESTIMATED POSTERIOR PROBABILITIES

  Class 1        263.13557          0.43856
  Class 2        336.86443          0.56144
#1. These are much the same as with Example #1, see that example for more details.
------------------------------------------------------------------------------
CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY CLASS MEMBERSHIP

Class Counts and Proportions

  Class 1              258          0.43000
  Class 2              342          0.57000
#2. These are much the same as with Example #1, see that example for more details.
------------------------------------------------------------------------------
Average Class Probabilities by Class

                 1        2
  Class 1      0.962    0.038
  Class 2      0.044    0.956
#3. These are much the same as with Example #1, see that example for more details.
------------------------------------------------------------------------------
MODEL RESULTS

                 Estimates     S.E.  Est./S.E.
CLASS 1
 Means
    WRITE         44.944      0.695     64.645
    MATH          44.580      0.461     96.655
    SCI           44.084      0.729     60.473
    SS            45.392      0.629     72.158
 Variances
    WRITE         51.204      3.158     16.216
    MATH          47.212      3.055     15.453
    SCI           47.984      3.468     13.835
    SS            62.855      4.235     14.843

CLASS 2
 Means
    WRITE         58.197      0.495    117.599
    MATH          57.527      0.624     92.239
    SCI           57.762      0.464    124.497
    SS            57.243      0.582     98.361
 Variances
    WRITE         51.204      3.158     16.216
    MATH          47.212      3.055     15.453
    SCI           47.984      3.468     13.835
    SS            62.855      4.235     14.843
#4. This shows the averages on the scores for the two classes for the continuous variables. Class 1 is a low performing group, and class 2 is a high performing group.
------------------------------------------------------------------------------
LATENT CLASS INDICATOR MODEL PART

CLASS 1
 Thresholds
    HIREAD$1       2.276      0.381      5.970
CLASS 2
 Thresholds
    HIREAD$1      -1.835      0.215     -8.523
#5. For categorical variables, we do not estimate means; instead we estimate thresholds. You can think of this like a logistic regression predicting being a "bad reader" (0 on hiread) within each class. So, for class 1, we have an empty model (no predictors), and the threshold (cut point) is 2.276. We can exponentiate this to convert it into an odds, exp(2.276) = 9.7376518, so if you are in class 1, the odds (not odds ratio) are almost 10 to 1 that you will be a "bad reader". We can convert this into a probability like this: exp(2.276) / (1 + exp(2.276)) = .90686977, so if you are in class 1, there is a .9 probability that you will be a bad reader. For class 2, we do the same tricks. The odds of being a bad reader in class 2 are exp(-1.835) = .1596135, and the probability of being a bad reader in class 2 is exp(-1.835) / (1 + exp(-1.835)) = .13764371. Note that Mplus shows these probabilities in section 7 of the output below.
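To make the arithmetic concrete, here is a small Python sketch of these conversions (the threshold values are simply copied from section 5 of the output; this is not part of the Mplus run):

```python
import math

def threshold_to_prob(tau):
    # Probability of the lowest category (here, being a "bad reader")
    # implied by an Mplus threshold on the logit scale.
    return math.exp(tau) / (1 + math.exp(tau))

odds_class1 = math.exp(2.276)            # odds of being a bad reader in class 1, about 9.74
prob_class1 = threshold_to_prob(2.276)   # about .907
prob_class2 = threshold_to_prob(-1.835)  # about .138
print(odds_class1, prob_class1, prob_class2)
```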
------------------------------------------------------------------------------
LATENT CLASS REGRESSION MODEL PART

 Means
    C#1           -0.247      0.126     -1.961
#6. This is the mean of the logit for class 1, with class 2 as the reference class (Mplus labels it a mean, but it works like a threshold dividing the two classes and determines their relative sizes). We see the value is -0.247. We can convert it into the class proportions like this.
Prob(class 1) = 1/(1 + exp(-mean1)) = 1 / (1 + exp(0.247)) = .43856204 (compare to section 1 above).
Prob(class 2) = 1 - 1/(1 + exp(-mean1)) = 1 - 1 / (1 + exp(0.247)) = .56143796 (compare to section 1 above).
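The same conversion in Python (the -0.247 is copied from the latent class regression part of the output):

```python
import math

mean_c1 = -0.247  # mean of C#1 from the latent class regression part

prob_class1 = 1 / (1 + math.exp(-mean_c1))  # about .4386, matching section 1
prob_class2 = 1 - prob_class1               # about .5614
print(prob_class1, prob_class2)
```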
------------------------------------------------------------------------------
LATENT CLASS INDICATOR MODEL PART IN PROBABILITY SCALE

CLASS 1
 HIREAD
    Category 1     0.907      0.032     28.161
    Category 2     0.093      0.032      2.893
CLASS 2
 HIREAD
    Category 1     0.138      0.026      5.387
    Category 2     0.862      0.026     33.744
#7. These take the thresholds from section 5 of the output and convert them into probabilities for your convenience; the notes under section 5 show how you can do this conversion by hand.
------------------------------------------------------------------------------
We now read the saved data file into Stata for comparison to the Mplus output.
infile hiread write math sci ss cprob1 cprob2 class using lca_ex3.txt
Below we show some observations from the middle of this file. Note that cprob1 is the probability of being in class 1 and cprob2 is the probability of being in class 2, and class is the class membership based on the class with the highest probability.
. list in 200/210

     +---------------------------------------------------------------+
     | hiread   write   math    sci     ss   cprob1   cprob2   class |
     |---------------------------------------------------------------|
200. |      0    52.1   42.5   47.7   60.5     .954     .046       1 |
201. |      0    51.5     57   49.8   40.6     .914     .086       1 |
202. |      0    52.8   49.3   53.1   35.6     .984     .016       1 |
203. |      0    43.7   41.9   41.7   35.6        1        0       1 |
204. |      0    61.9     53   52.6   60.5     .022     .978       2 |
     |---------------------------------------------------------------|
205. |      0    41.1   45.3   47.1   55.6     .998     .002       1 |
206. |      0    38.5   47.1   41.7   25.7        1        0       1 |
207. |      0    54.1   46.4   49.8   55.6     .855     .145       1 |
208. |      0    51.5   48.5   49.8   50.6     .943     .057       1 |
209. |      0    41.1   53.6   41.7   55.6     .996     .004       1 |
     |---------------------------------------------------------------|
210. |      0    61.9   46.2   60.7   45.6     .196     .804       2 |
     +---------------------------------------------------------------+
Say that we get the mean of the writing, math, science and social science scores, weighting them by the probability of being in class 1, and then again weighting by the probability of being in class 2. Note the correspondence between these means and the means from section 4 of the output.
. tabstat write math sci ss [aw=cprob1], stat(mean)

   stats |     write      math       sci        ss
---------+----------------------------------------
    mean |  44.94414  44.57938  44.08309  45.39155
--------------------------------------------------

. tabstat write math sci ss [aw=cprob2], stat(mean)

   stats |     write      math       sci        ss
---------+----------------------------------------
    mean |  58.19674  57.52728  57.76235  57.24318
--------------------------------------------------
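The weighting idea itself can be sketched in a few lines of Python. The scores and probabilities below are just a handful of cases copied from the listing above, so the result illustrates the computation but will not match the full-sample means in section 4:

```python
# write scores and class 1 posterior probabilities for a few cases
# copied from the listing above (rows 200, 201, 204, 205)
write  = [52.1, 51.5, 61.9, 41.1]
cprob1 = [0.954, 0.914, 0.022, 0.998]

# Probability-weighted mean: each case counts in proportion to how
# likely it is to belong to class 1.
weighted_mean = sum(w * p for w, p in zip(write, cprob1)) / sum(cprob1)
print(weighted_mean)  # about 48.18 for this small subset
```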
Another way to view this is to do a regression predicting, say, write and estimating the intercept and weighting the cases as we have done above, for example.
. regress write [aw=cprob1]

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   44.94414   .3987896   112.70   0.000     44.16027    45.72802
------------------------------------------------------------------------------

. regress write [aw=cprob2]

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   58.19674   .2886021   201.65   0.000     57.62963    58.76385
------------------------------------------------------------------------------
We can do the same kind of analysis predicting loread (that is, hiread reverse coded, so that loread is 1 for the "bad readers"), weighting the cases by the probability of being in class 1 and then by the probability of being in class 2, as shown below. You can relate the coefficients here to the thresholds in section 5 of the Mplus output.
. logit loread [aw=cprob1]

------------------------------------------------------------------------------
      loread |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   2.276237   .1679188    13.56   0.000     1.947122    2.605352
------------------------------------------------------------------------------

. logit loread [aw=cprob2]

------------------------------------------------------------------------------
      loread |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |  -1.834929   .1337386   -13.72   0.000    -2.097052   -1.572806
------------------------------------------------------------------------------
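The reason these intercepts match the thresholds: the intercept of an empty (intercept-only) logistic regression is just the log odds of the outcome, and with cases weighted by cprob1 that outcome probability is the .907 shown in section 7. A one-line Python check (using the rounded probability, so the match is only approximate):

```python
import math

p_bad_class1 = 0.907  # class 1 probability of being a bad reader (section 7)
intercept = math.log(p_bad_class1 / (1 - p_bad_class1))
print(intercept)  # about 2.28, close to the 2.276 threshold in section 5
```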
Example 4: A latent class analysis with 2 classes, continuous indicators, and one three-level categorical indicator.
Here is the input file
Data:
  File is g:\mplus\hsb6.dat ;
Variable:
  Names are id gender race ses sch prog locus concept mot career
    read write math sci ss hilocus hiconcep himot hiread hiwrite
    himath hisci hiss academic;
  Usevariables are read write math sci ss ses;
  categorical = ses;
  classes = c(2);
Analysis:
  Type = mixture;
Model:
  %C#1%
  [read math sci ss write*30 ses$1*-1 ses$2*1];
  %C#2%
  [read math sci ss write*45 ses$1*-1 ses$2*1];
Output:
  TECH8;
Savedata:
  file is lca_ex4.txt ;
  save is cprob;
  format is free;
Here is the output
------------------------------------------------------------------------------
FINAL CLASS COUNTS AND PROPORTIONS OF TOTAL SAMPLE SIZE
BASED ON ESTIMATED POSTERIOR PROBABILITIES

  Class 1        274.88163          0.45814
  Class 2        325.11837          0.54186
#1. These are much the same as with Example #1, see that example for more details.
------------------------------------------------------------------------------
CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY CLASS MEMBERSHIP

Class Counts and Proportions

  Class 1              271          0.45167
  Class 2              329          0.54833
#2. These are much the same as with Example #1, see that example for more details.
------------------------------------------------------------------------------
Average Class Probabilities by Class

                 1        2
  Class 1      0.958    0.042
  Class 2      0.046    0.954
#3. These are much the same as with Example #1, see that example for more details.
------------------------------------------------------------------------------
MODEL RESULTS

                 Estimates     S.E.  Est./S.E.
CLASS 1
 Means
    READ          43.837      0.697     62.911
    WRITE         45.065      0.766     58.813
    MATH          44.800      0.494     90.743
    SCI           44.477      0.791     56.221
    SS            45.669      0.721     63.303
 Variances
    READ          46.915      2.816     16.659
    WRITE         49.141      3.039     16.171
    MATH          46.484      3.244     14.329
    SCI           49.167      3.431     14.329
    SS            63.054      4.209     14.981

CLASS 2
 Means
    READ          58.720      0.643     91.283
    WRITE         58.574      0.535    109.565
    MATH          57.808      0.737     78.444
    SCI           57.924      0.525    110.367
    SS            57.437      0.594     96.667
 Variances
    READ          46.915      2.816     16.659
    WRITE         49.141      3.039     16.171
    MATH          46.484      3.244     14.329
    SCI           49.167      3.431     14.329
    SS            63.054      4.209     14.981
#4. These are much the same as with Example #1, see that example for more details.
------------------------------------------------------------------------------
LATENT CLASS INDICATOR MODEL PART

CLASS 1
 Thresholds
    SES$1         -0.553      0.134     -4.131
    SES$2          1.684      0.185      9.123
CLASS 2
 Thresholds
    SES$1         -2.005      0.202     -9.914
    SES$2          0.550      0.124      4.428
#5. For categorical variables, we do not estimate means; instead we estimate thresholds. In the prior example we imagined this as a binary logistic regression, but ses is a three-level ordinal variable, so here the thresholds work like the two cut points of an ordinal (cumulative) logit model: the first threshold separates category 1 from categories 2 and 3, and the second separates categories 1 and 2 from category 3.
------------------------------------------------------------------------------
LATENT CLASS REGRESSION MODEL PART

 Means
    C#1           -0.168      0.145     -1.160
#6. These are much the same as with Example #1, see that example for more details.
------------------------------------------------------------------------------
LATENT CLASS INDICATOR MODEL PART IN PROBABILITY SCALE

CLASS 1
 SES
    Category 1     0.365      0.031     11.777
    Category 2     0.478      0.031     15.263
    Category 3     0.157      0.024      6.424
CLASS 2
 SES
    Category 1     0.119      0.021      5.612
    Category 2     0.515      0.029     17.997
    Category 3     0.366      0.029     12.699
#7. This takes the thresholds from section 5 of the output and converts them into probabilities. So, if you are in class 1, your probability of being low SES (category 1) is .365, but if you are in class 2, your probability of being low SES (category 1) is .119.
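A small Python sketch of this conversion (the threshold values are copied from section 5; the cumulative-logit arithmetic is our reading of how Mplus forms these probabilities):

```python
import math

def invlogit(x):
    return math.exp(x) / (1 + math.exp(x))

def category_probs(tau1, tau2):
    # Two ordered thresholds imply three category probabilities:
    # P(cat 1) = invlogit(tau1), P(cat 1 or 2) = invlogit(tau2).
    p_le1 = invlogit(tau1)
    p_le2 = invlogit(tau2)
    return (p_le1, p_le2 - p_le1, 1 - p_le2)

class1 = category_probs(-0.553, 1.684)  # about (.365, .478, .157)
class2 = category_probs(-2.005, 0.550)  # about (.119, .515, .366)
print(class1, class2)
```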
------------------------------------------------------------------------------
We now read the saved data file into Stata for comparison to the Mplus output.
infile ses read write math sci ss cprob1 cprob2 class using lca_ex4.txt
Below we show observations from the middle of this file. Note that cprob1 is the probability of being in class 1, cprob2 is the probability of being in class 2, and class is the class membership based on the class with the highest probability.
. list in 200/210

     +-------------------------------------------------------------------+
     | ses   read   write   math    sci     ss   cprob1   cprob2   class |
     |-------------------------------------------------------------------|
200. |   2   46.9    52.1   42.5   47.7   60.5     .886     .114       1 |
201. |   0   46.9    51.5     57   49.8   40.6     .963     .037       1 |
202. |   2   46.9    52.8   49.3   53.1   35.6     .958     .042       1 |
203. |   1   46.9    43.7   41.9   41.7   35.6        1        0       1 |
204. |   0   46.9    61.9     53   52.6   60.5      .05      .95       2 |
     |-------------------------------------------------------------------|
205. |   1   46.9    41.1   45.3   47.1   55.6     .998     .002       1 |
206. |   0   46.9    38.5   47.1   41.7   25.7        1        0       1 |
207. |   0   46.9    54.1   46.4   49.8   55.6     .938     .062       1 |
208. |   1   46.9    51.5   48.5   49.8   50.6      .93      .07       1 |
209. |   2   46.9    41.1   53.6   41.7   55.6     .989     .011       1 |
     |-------------------------------------------------------------------|
210. |   0   46.9    61.9   46.2   60.7   45.6     .381     .619       2 |
     +-------------------------------------------------------------------+
Say that we get the mean of the reading, writing, math, science and social science scores, weighting them by the probability of being in class 1, and then again weighting by the probability of being in class 2. Note the correspondence between these means and the means from section 4 of the output.
. tabstat read write math sci ss [aw=cprob1], stat(mean)

   stats |      read     write      math       sci        ss
---------+--------------------------------------------------
    mean |   43.8374  45.06437  44.80045  44.47711  45.66875
------------------------------------------------------------

. tabstat read write math sci ss [aw=cprob2], stat(mean)

   stats |      read     write      math       sci        ss
---------+--------------------------------------------------
    mean |   58.7205  58.57446  57.80871  57.92401   57.4375
------------------------------------------------------------
Likewise, consider these ologit commands predicting ses using an empty model, but weighting the cases according to their class membership (class 1 then class 2). Note the correspondence between the cut points below and the thresholds in section 5 of the output.
. ologit ses [aw=cprob1]

------------------------------------------------------------------------------
         ses |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cut1 |  -.5526053   .1013389          (Ancillary parameters)
       _cut2 |   1.683901   .1342721
------------------------------------------------------------------------------

. ologit ses [aw=cprob2]

------------------------------------------------------------------------------
         ses |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cut1 |  -2.004701   .1460688          (Ancillary parameters)
       _cut2 |   .5498481   .0980846
------------------------------------------------------------------------------