These code fragments are examples that we are using to try to understand these techniques using Mplus. We ask that you treat them as works in progress that explore these techniques, rather than as definitive answers as to how to analyze any particular kind of data.
This picks up after Examples 1 and 2, and considers adding categorical variables to the model.
We now have the input file //mplus/code/hsb6.inp and the data file it reads, hsb6.dat , which you can download from https://stats.idre.ucla.edu/wp-content/uploads/2016/02/hsb6.dat .
Example 3: A latent class analysis with 2 classes, continuous indicators, and a two-level categorical variable.
Here is the input file
Data:
File is I:\mplus\hsb6.dat ;
Variable:
Names are
id gender race ses sch prog locus concept mot career read write math
sci ss hilocus hiconcep himot hiread hiwrite himath hisci hiss academic;
Usevariables are
hiread write math sci ss ;
categorical = hiread;
classes = c(2);
Analysis:
Type=mixture;
MODEL:
%C#1%
[hiread$1 *-1 math sci ss write *30 ];
%C#2%
[hiread$1 *+1 math sci ss write *45];
OUTPUT:
TECH8;
SAVEDATA:
file is lca_ex3.txt ;
save is cprob;
format is free;
Here is the output
------------------------------------------------------------------------------
FINAL CLASS COUNTS AND PROPORTIONS OF TOTAL SAMPLE SIZE BASED ON ESTIMATED
POSTERIOR PROBABILITIES

Class 1        263.13557          0.43856
Class 2        336.86443          0.56144
#1. These are much the same as in Example #1; see that example for more details.
------------------------------------------------------------------------------
CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY CLASS MEMBERSHIP

Class Counts and Proportions

Class 1        258          0.43000
Class 2        342          0.57000
#2. These are much the same as in Example #1; see that example for more details.
------------------------------------------------------------------------------
Average Class Probabilities by Class
1 2
Class 1 0.962 0.038
Class 2 0.044 0.956
#3. These are much the same as in Example #1; see that example for more details.
------------------------------------------------------------------------------
MODEL RESULTS
Estimates S.E. Est./S.E.
CLASS 1
Means
WRITE 44.944 0.695 64.645
MATH 44.580 0.461 96.655
SCI 44.084 0.729 60.473
SS 45.392 0.629 72.158
Variances
WRITE 51.204 3.158 16.216
MATH 47.212 3.055 15.453
SCI 47.984 3.468 13.835
SS 62.855 4.235 14.843
CLASS 2
Means
WRITE 58.197 0.495 117.599
MATH 57.527 0.624 92.239
SCI 57.762 0.464 124.497
SS 57.243 0.582 98.361
Variances
WRITE 51.204 3.158 16.216
MATH 47.212 3.055 15.453
SCI 47.984 3.468 13.835
SS 62.855 4.235 14.843
#4. This shows the average scores for the two classes on the continuous variables. Class 1 is a low performing group, and class 2 is a high performing group.
------------------------------------------------------------------------------
LATENT CLASS INDICATOR MODEL PART
Class 1
Thresholds
HIREAD$1 2.276 0.381 5.970
Class 2
Thresholds
HIREAD$1 -1.835 0.215 -8.523
#5. For categorical variables, we do not estimate means; instead we estimate thresholds. You can think of this like a logistic regression predicting being a "bad reader" (0 on hiread) for each class. So, for class 1, we have an empty model (no predictors), and the threshold (cut point) is 2.276. We can exponentiate this to convert it into an odds, exp(2.276) = 9.7376518, so if you are in class 1, the odds (not odds ratio) are almost 10 to 1 that you will be a "bad reader". We can convert this into a probability like this: exp(2.276) / (1 + exp(2.276)) = .90686977, so if you are in class 1, there is a .9 probability that you will be a bad reader.
For class 2, we do the same tricks. The odds of being a bad reader in class 2 are exp(-1.835) = .1596135, and the probability of being a bad reader in class 2 is exp(-1.835) / (1 + exp(-1.835)) = .13764371. Note that Mplus shows these probabilities in section #7 of the output below.
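The threshold-to-odds-to-probability arithmetic above can be checked with a short script (a sketch; the two numbers are the hiread$1 thresholds from the output):

```python
import math

def threshold_to_prob(tau):
    """Convert an Mplus threshold to the probability of the lower
    category (here, being a "bad reader", hiread = 0)."""
    odds = math.exp(tau)       # odds of the lower category
    return odds / (1 + odds)   # equivalently 1 / (1 + exp(-tau))

# hiread$1 thresholds from section #5 of the output
print(threshold_to_prob(2.276))   # class 1: about .907
print(threshold_to_prob(-1.835))  # class 2: about .138
```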
------------------------------------------------------------------------------
LATENT CLASS REGRESSION MODEL PART
Means
C#1 -0.247 0.126 -1.961
#6. This is the mean of the latent class logit that governs the relative sizes of the two classes (the last class, class 2, is the reference class). We see the mean for C#1 is -0.247. We can convert this value into the class probabilities like this.
Prob(class 1) = 1/(1 + exp(-mean)) = 1 / ( 1 + exp( 0.247)) = .43856204 (compare to section #1 above).
Prob(class 2) = 1 - Prob(class 1) = 1 - 1 / ( 1 + exp( 0.247)) = .56143796 (compare to section #1 above).
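As a check on this arithmetic, here is the same conversion in a short script (a sketch; -0.247 is the estimated mean for C#1 from the output):

```python
import math

# Mean of the latent class logit for C#1 (class 2 is the reference)
m = -0.247

p_class1 = math.exp(m) / (1 + math.exp(m))  # = 1 / (1 + exp(0.247))
p_class2 = 1 - p_class1

print(p_class1)  # about .4386, matching section #1
print(p_class2)  # about .5614
```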
------------------------------------------------------------------------------
LATENT CLASS INDICATOR MODEL PART IN PROBABILITY SCALE
Class 1
HIREAD
Category 1 0.907 0.032 28.161
Category 2 0.093 0.032 2.893
Class 2
HIREAD
Category 1 0.138 0.026 5.387
Category 2 0.862 0.026 33.744
#7. These take the thresholds from section #5 of the output and convert them into probabilities for your convenience. Section #5 shows how you could manually convert the thresholds from that section into the probabilities shown here.
------------------------------------------------------------------------------
We now read the saved data file into Stata for comparison to the Mplus output.
. infile hiread write math sci ss cprob1 cprob2 class using lca_ex3.txt
Below we show some observations from the middle of this file. Note that cprob1 is the probability of being in class 1 and cprob2 is the probability of being in class 2, and class is the class membership based on the class with the highest probability.
. list in 200/210
+---------------------------------------------------------------+
| hiread write math sci ss cprob1 cprob2 class |
|---------------------------------------------------------------|
200. | 0 52.1 42.5 47.7 60.5 .954 .046 1 |
201. | 0 51.5 57 49.8 40.6 .914 .086 1 |
202. | 0 52.8 49.3 53.1 35.6 .984 .016 1 |
203. | 0 43.7 41.9 41.7 35.6 1 0 1 |
204. | 0 61.9 53 52.6 60.5 .022 .978 2 |
|---------------------------------------------------------------|
205. | 0 41.1 45.3 47.1 55.6 .998 .002 1 |
206. | 0 38.5 47.1 41.7 25.7 1 0 1 |
207. | 0 54.1 46.4 49.8 55.6 .855 .145 1 |
208. | 0 51.5 48.5 49.8 50.6 .943 .057 1 |
209. | 0 41.1 53.6 41.7 55.6 .996 .004 1 |
|---------------------------------------------------------------|
210. | 0 61.9 46.2 60.7 45.6 .196 .804 2 |
+---------------------------------------------------------------+
Say that we get the mean of the writing, math, science and social science scores and weight them by the probability of being in class 1 and then again weighting by the probability of being in class 2. Note the correspondence between these means and the means from section 4 of the output.
. tabstat write math sci ss [aw=cprob1], stat(mean)
stats | write math sci ss
---------+----------------------------------------
mean | 44.94414 44.57938 44.08309 45.39155
--------------------------------------------------
. tabstat write math sci ss [aw=cprob2], stat(mean)
stats | write math sci ss
---------+----------------------------------------
mean | 58.19674 57.52728 57.76235 57.24318
--------------------------------------------------
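The probability-weighted means being computed here can be sketched in a few lines (the scores and weights below are toy numbers, not the hsb6 data; the point is just the formula sum(w*x) / sum(w)):

```python
def weighted_mean(values, weights):
    """Posterior-probability-weighted mean: sum(w*x) / sum(w)."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Hypothetical write scores and posterior probabilities of class 1
write  = [52.1, 43.7, 61.9, 41.1]
cprob1 = [0.95, 1.00, 0.02, 0.99]
cprob2 = [1 - p for p in cprob1]

print(weighted_mean(write, cprob1))  # dominated by likely class 1 cases
print(weighted_mean(write, cprob2))  # dominated by likely class 2 cases
```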
Another way to view this is to run a regression predicting, say, write, estimating just the intercept and weighting the cases as we have done above, for example.
. regress write [aw=cprob1]
------------------------------------------------------------------------------
write | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | 44.94414 .3987896 112.70 0.000 44.16027 45.72802
------------------------------------------------------------------------------
. regress write [aw=cprob2]
------------------------------------------------------------------------------
write | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | 58.19674 .2886021 201.65 0.000 57.62963 58.76385
------------------------------------------------------------------------------
We can do the same kind of analysis predicting loread (hiread reversed, so loread is 1 for "bad readers"), weighting the cases by the probability of being in class 1 and then by the probability of being in class 2, as shown below. You can relate the coefficients here to the thresholds in section #5 of the Mplus output.
. logit loread [aw=cprob1]
------------------------------------------------------------------------------
loread | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | 2.276237 .1679188 13.56 0.000 1.947122 2.605352
------------------------------------------------------------------------------
. logit loread [aw=cprob2]
------------------------------------------------------------------------------
loread | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | -1.834929 .1337386 -13.72 0.000 -2.097052 -1.572806
------------------------------------------------------------------------------
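For an intercept-only logit model like the ones above, the weighted estimate of the intercept has a simple closed form: it is the logit of the weighted proportion of ones. A small sketch with hypothetical outcomes and weights (not the hsb6 data) shows why the weighted logit intercepts line up with the thresholds:

```python
import math

def weighted_logit_intercept(y, w):
    """Intercept of an empty (intercept-only) logit model fit with
    weights w: the logit of the weighted mean of y."""
    p = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return math.log(p / (1 - p))

# Hypothetical 0/1 outcomes, weighted by class 1 posterior probabilities
y = [1, 1, 0, 1]
w = [0.9, 1.0, 0.1, 0.8]
print(weighted_logit_intercept(y, w))  # log(2.7 / 0.1) = log(27), about 3.296
```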
Example 4: A latent class analysis with 2 classes, continuous indicators, and one 3-level categorical indicator.
Here is the input file
Data:
File is g:\mplus\hsb6.dat ;
Variable:
Names are
id gender race ses sch prog locus concept mot career read write math
sci ss hilocus hiconcep himot hiread hiwrite himath hisci hiss academic;
Usevariables are
read write math sci ss ses;
categorical = ses;
classes = c(2);
Analysis:
Type=mixture;
MODEL:
%C#1%
[read math sci ss write *30 ses$1 *-1 ses$2 *1];
%C#2%
[read math sci ss write *45 ses$1 *-1 ses$2 *1];
OUTPUT:
TECH8;
SAVEDATA:
file is lca_ex4.txt ;
save is cprob;
format is free;
Here is the output
------------------------------------------------------------------------------
FINAL CLASS COUNTS AND PROPORTIONS OF TOTAL SAMPLE SIZE BASED ON ESTIMATED
POSTERIOR PROBABILITIES

Class 1        274.88163          0.45814
Class 2        325.11837          0.54186
#1. These are much the same as in Example #1; see that example for more details.
------------------------------------------------------------------------------
CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY CLASS MEMBERSHIP

Class Counts and Proportions

Class 1        271          0.45167
Class 2        329          0.54833
#2. These are much the same as in Example #1; see that example for more details.
------------------------------------------------------------------------------
Average Class Probabilities by Class
1 2
Class 1 0.958 0.042
Class 2 0.046 0.954
#3. These are much the same as in Example #1; see that example for more details.
------------------------------------------------------------------------------
MODEL RESULTS
Estimates S.E. Est./S.E.
CLASS 1
Means
READ 43.837 0.697 62.911
WRITE 45.065 0.766 58.813
MATH 44.800 0.494 90.743
SCI 44.477 0.791 56.221
SS 45.669 0.721 63.303
Variances
READ 46.915 2.816 16.659
WRITE 49.141 3.039 16.171
MATH 46.484 3.244 14.329
SCI 49.167 3.431 14.329
SS 63.054 4.209 14.981
CLASS 2
Means
READ 58.720 0.643 91.283
WRITE 58.574 0.535 109.565
MATH 57.808 0.737 78.444
SCI 57.924 0.525 110.367
SS 57.437 0.594 96.667
Variances
READ 46.915 2.816 16.659
WRITE 49.141 3.039 16.171
MATH 46.484 3.244 14.329
SCI 49.167 3.431 14.329
SS 63.054 4.209 14.981
#4. These are much the same as in Example #1; see that example for more details.
------------------------------------------------------------------------------
LATENT CLASS INDICATOR MODEL PART
Class 1
Thresholds
SES$1 -0.553 0.134 -4.131
SES$2 1.684 0.185 9.123
Class 2
Thresholds
SES$1 -2.005 0.202 -9.914
SES$2 0.550 0.124 4.428
#5. For categorical variables, we do not estimate means; instead we estimate thresholds. In the prior example we imagined this to be like a binary logistic regression, but here the indicator is a 3-level ordinal variable, so we think of the thresholds (cut points) as in an ordered logistic regression: two cut points dividing the three categories.
------------------------------------------------------------------------------
LATENT CLASS REGRESSION MODEL PART
Means
C#1 -0.168 0.145 -1.160
#6. These are much the same as in Example #1; see that example for more details.
------------------------------------------------------------------------------
LATENT CLASS INDICATOR MODEL PART IN PROBABILITY SCALE
Class 1
SES
Category 1 0.365 0.031 11.777
Category 2 0.478 0.031 15.263
Category 3 0.157 0.024 6.424
Class 2
SES
Category 1 0.119 0.021 5.612
Category 2 0.515 0.029 17.997
Category 3 0.366 0.029 12.699
#7. This takes the thresholds from section #5 of the output and converts them into probabilities. So, if you are in class 1, your probability of being low SES (category 1) is .365, but if you are in class 2, your probability of being low SES (category 1) is .119.
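With an ordinal indicator, each threshold gives a cumulative probability (via the logistic function) and the category probabilities are the differences between adjacent cumulative probabilities. A sketch of that conversion, using the ses thresholds from the output:

```python
import math

def logistic(x):
    return 1 / (1 + math.exp(-x))

def category_probs(thresholds):
    """Convert ordered thresholds into category probabilities:
    P(Y <= k) = logistic(tau_k); categories are successive differences."""
    cum = [logistic(t) for t in thresholds] + [1.0]
    probs, prev = [], 0.0
    for c in cum:
        probs.append(c - prev)
        prev = c
    return probs

# ses thresholds from section #5 of the output
print(category_probs([-0.553, 1.684]))  # class 1: about [.365, .478, .157]
print(category_probs([-2.005, 0.550]))  # class 2: about [.119, .515, .366]
```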
------------------------------------------------------------------------------
We now read the saved data file into Stata for comparison to the Mplus output.
. infile ses read write math sci ss cprob1 cprob2 class using lca_ex4.txt
Below we show observations from the middle of this file. Note that cprob1 is the probability of being in class 1, cprob2 is the probability of being in class 2, and class is the class membership based on the class with the highest probability.
. list in 200/210
+-------------------------------------------------------------------+
| ses read write math sci ss cprob1 cprob2 class |
|-------------------------------------------------------------------|
200. | 2 46.9 52.1 42.5 47.7 60.5 .886 .114 1 |
201. | 0 46.9 51.5 57 49.8 40.6 .963 .037 1 |
202. | 2 46.9 52.8 49.3 53.1 35.6 .958 .042 1 |
203. | 1 46.9 43.7 41.9 41.7 35.6 1 0 1 |
204. | 0 46.9 61.9 53 52.6 60.5 .05 .95 2 |
|-------------------------------------------------------------------|
205. | 1 46.9 41.1 45.3 47.1 55.6 .998 .002 1 |
206. | 0 46.9 38.5 47.1 41.7 25.7 1 0 1 |
207. | 0 46.9 54.1 46.4 49.8 55.6 .938 .062 1 |
208. | 1 46.9 51.5 48.5 49.8 50.6 .93 .07 1 |
209. | 2 46.9 41.1 53.6 41.7 55.6 .989 .011 1 |
|-------------------------------------------------------------------|
210. | 0 46.9 61.9 46.2 60.7 45.6 .381 .619 2 |
+-------------------------------------------------------------------+
Say that we get the mean of the reading, writing, math, science and social science scores and weight them by the probability of being in class 1 and then again weighting by the probability of being in class 2. Note the correspondence between these means and the means from section #4 of the output.
. tabstat read write math sci ss [aw=cprob1], stat(mean)
stats | read write math sci ss
---------+--------------------------------------------------
mean | 43.8374 45.06437 44.80045 44.47711 45.66875
------------------------------------------------------------
. tabstat read write math sci ss [aw=cprob2], stat(mean)
stats | read write math sci ss
---------+--------------------------------------------------
mean | 58.7205 58.57446 57.80871 57.92401 57.4375
------------------------------------------------------------
Likewise, consider these ologit commands predicting ses using an empty model, but weighting the cases according to their class membership (class 1 then class 2). Note the correspondence between the cut points below and the thresholds in section 5 of the output.
. ologit ses [aw=cprob1]
------------------------------------------------------------------------------
ses | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
-------------+----------------------------------------------------------------
_cut1 | -.5526053 .1013389 (Ancillary parameters)
_cut2 | 1.683901 .1342721
------------------------------------------------------------------------------
. ologit ses [aw=cprob2]
------------------------------------------------------------------------------
ses | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
-------------+----------------------------------------------------------------
_cut1 | -2.004701 .1460688 (Ancillary parameters)
_cut2 | .5498481 .0980846
------------------------------------------------------------------------------
