The following example shows the output in Mplus, as well as how to reproduce it using Stata. For this example we will use the same dataset we used for our logit regression data analysis example. You can download the dataset for Mplus here: logit.dat. The model we specify for this example includes four variables: three predictors and one outcome. We use Graduate Record Exam scores (gre), undergraduate grade point average (gpa), and prestige of the undergraduate program (topnotch) to predict whether an applicant is admitted to graduate school. The Mplus input for this model is:
data:
  file is logit.dat;
variable:
  names are admit gre topnotch gpa;
  categorical = admit;
analysis:
  type = general;
  estimator = ml;  ! need to use estimator = ml to make this a logistic model
model:
  admit on gre topnotch gpa;
output:
  stand;
Below are the results from the model described above. Note that Mplus produces two types of standardized coefficients: “Std,” shown in the fifth column of the output below, and “StdYX,” shown in the sixth column. The Std column contains coefficients standardized using the variance of continuous latent variables. Because all of the variables in this model are manifest (i.e., observed), the coefficients in this column are identical to those in the column of regular coefficients (i.e., the “Estimates” column). The StdYX column contains coefficients standardized using the variances of the background and/or outcome variables, in addition to the variance of continuous latent variables.
MODEL RESULTS

                   Estimates     S.E.  Est./S.E.      Std     StdYX
 ADMIT    ON
    GRE                0.002    0.001      2.314    0.002     0.152
    TOPNOTCH           0.437    0.292      1.498    0.437     0.086
    GPA                0.668    0.325      2.052    0.668     0.135

 Thresholds
    ADMIT$1            4.601    1.096      4.196    4.601     2.439
Now, from the latent variable point of view, there is a continuous latent variable \(y^{*}\) behind the observed dichotomous variable, and this latent variable is the true outcome variable. In other words, the logistic regression simply models the latent variable using the linear relationship:
$$ y^{*} = \beta_0 + \beta_1 \cdot GRE + \beta_2 \cdot TOPNOTCH + \beta_3 \cdot GPA $$
Notice that there is no random residual term here. Instead, we assume that
$$ y^{*} - (\beta_0 + \beta_1 \cdot GRE + \beta_2 \cdot TOPNOTCH + \beta_3 \cdot GPA) $$

follows the standard logistic distribution. Therefore, the variance of \(y^{*}\) is the sum of the variance of the linear prediction and the variance of the standard logistic distribution, which is \(\frac{\pi^2}{3}\); that is, \(Var(y^{*}) = Var(X\beta) + \frac{\pi^2}{3}\). This is the formula that Mplus uses to calculate the variance of the outcome variable.
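As a quick check on this constant, we can display \(\frac{\pi^2}{3}\) directly in Stata; note that it matches the “Variance of error” value that fitstat reports further below.

display (_pi^2)/3
3.2898681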
Now we are ready to replicate the results from Mplus in Stata. The first line below opens the dataset, and the second runs the logistic regression model in Stata. Note that the raw coefficients from Stata and Mplus are within rounding error of each other; this should be the case, since we are running the same model. We have also run the user-written command fitstat to display a number of fit indices, including the variance of \(y^{*}\).
use https://stats.idre.ucla.edu/stat/stata/dae/logit.dta, clear
logit admit gre topnotch gpa, nolog

Logistic regression                               Number of obs   =        400
                                                  LR chi2(3)      =      21.85
                                                  Prob > chi2     =     0.0001
Log likelihood = -239.06481                       Pseudo R2       =     0.0437

------------------------------------------------------------------------------
       admit |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gre |   .0024768   .0010702     2.31   0.021     .0003792    .0045744
    topnotch |   .4372236   .2918532     1.50   0.134    -.1347983    1.009245
         gpa |   .6675556   .3252593     2.05   0.040     .0300592    1.305052
       _cons |  -4.600814   1.096379    -4.20   0.000    -6.749678   -2.451949
------------------------------------------------------------------------------

fitstat

Measures of Fit for logit of admit

Log-Lik Intercept Only:       -249.988   Log-Lik Full Model:        -239.065
D(396):                        478.130   LR(3):                       21.847
                                         Prob > LR:                    0.000
McFadden's R2:                   0.044   McFadden's Adj R2:            0.028
ML (Cox-Snell) R2:               0.053   Cragg-Uhler(Nagelkerke) R2:   0.074
McKelvey & Zavoina's R2:         0.075   Efron's R2:                   0.052
Variance of y*:                  3.558   Variance of error:            3.290
Count R2:                        0.683   Adj Count R2:                 0.000
AIC:                             1.215   AIC*n:                      486.130
BIC:                         -1894.490   BIC':                        -3.873
BIC used by Stata:             502.095   AIC used by Stata:          486.130
How does fitstat compute the variance of \(y^{*}\)? We explained earlier that \(Var(y^{*}) = Var(X\beta) + \frac{\pi^2}{3}\); let's check that this is the case.
predict xb, xb
sum xb

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          xb |       400   -.8111861    .5180669  -2.166729   .4880949

return list

scalars:
                  r(N) =  400
              r(sum_w) =  400
               r(mean) =  -.8111860970774433
                r(Var) =  .2683933174379701
                 r(sd) =  .5180669044032538
                r(min) =  -2.166728973388672
                r(max) =  .4880948960781097
                r(sum) =  -324.4744388309773

display r(Var) + (_pi^2)/3
3.5582615
As you can see, they match very nicely. Now we are ready to calculate a standardized coefficient. This is also called “full standardization,” since it requires both the outcome variable and the predictor variable to be standardized. As always, we will need three pieces of information: the standard deviation of \(y^{*}\), the standard deviation of the predictor variable for which we want to create a standardized coefficient, and the raw coefficient for that predictor variable.
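Putting these pieces together, the fully standardized coefficient for a predictor \(x\) is

$$ \beta_{x}^{StdYX} = \beta_x \cdot \frac{SD(x)}{SD(y^{*})} $$

and this is exactly what the Stata commands below compute.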
To obtain the standard deviation of the linear predictor, we create a local macro based on what we calculated above; this is the first line of code below. Next we summarize the predictor variable for which we want to create a standardized coefficient, in this case gre, and save its standard deviation to a local macro called “xstd.” Since Stata automatically stores the coefficients from the last regression we ran, we can access the coefficient for gre by typing _b[gre]. Now we are ready to actually calculate the standardized coefficient. The second-to-last command below creates a new local macro called “gre_std” and sets it equal to the standardized coefficient for gre (i.e., _b[gre]*`xstd'/`ystd'). The last command tells Stata to display the contents of “gre_std,” which is the standardized coefficient for the relationship between gre and the log odds of admission. This value is approximately 0.1516; looking at the Mplus output above, we see that the standardized coefficient (StdYX) for gre is also estimated to be 0.152 by Mplus.
local ystd = sqrt(r(Var) + (_pi^2)/3)
sum gre

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         gre |       400       587.7    115.5165        220        800

local xstd = r(sd)
local gre_std = _b[gre]*`xstd'/`ystd'
display "`gre_std'"
.1516774659729085
The commands and output below show the same process for the other two predictor variables in the model.
sum topnotch

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    topnotch |       400       .1625    .3693709          0          1

local xstd = r(sd)
local topnotch_std = _b[topnotch]*`xstd'/`ystd'
display "`topnotch_std'"
.0856144885799177

sum gpa

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         gpa |       400      3.3899    .3805668       2.26          4

local xstd = r(sd)
local gpa_std = _b[gpa]*`xstd'/`ystd'
display "`gpa_std'"
.1346788501438455
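Since the three calculations differ only in the variable name, they can be collapsed into a short loop. The following is a minimal sketch, assuming the logit model and the variable xb created above are still in memory; it reproduces all three standardized coefficients in one pass.

* recompute the standard deviation of y* from the linear prediction
quietly sum xb
local ystd = sqrt(r(Var) + (_pi^2)/3)
* rescale each raw coefficient by SD(x)/SD(y*)
foreach v of varlist gre topnotch gpa {
    quietly sum `v'
    display "`v': " _b[`v']*r(sd)/`ystd'
}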
Cautions, Flies in the Ointment
- Because the variance of the linear prediction (xb) is used, this standardization is very much model-based. In other words, your standardized coefficients will be heavily influenced by your model, not just through the regression coefficients themselves (which are always based on the model) but through the standardization process as well. This makes these standardized coefficients harder to interpret than standardized coefficients from a linear regression, as the sketch below illustrates.
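To see this model dependence concretely, below is a minimal sketch (again assuming the data and model above are in memory) that refits the model without gpa. The standardized coefficient for gre then changes not only because its raw coefficient changes, but also because Var(xb), and with it \(SD(y^{*})\), changes.

* refit without gpa; the linear prediction, and with it Var(y*), changes
quietly logit admit gre topnotch
predict xb2, xb
quietly sum xb2
local ystd2 = sqrt(r(Var) + (_pi^2)/3)
quietly sum gre
display "gre standardized (model without gpa): " _b[gre]*r(sd)/`ystd2'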
See Also
- Mplus User’s Guide online (See page 503 of the Version 4.1 User’s Guide.)