It is common to fit a model where one or more variables have an effect on the expected mean. You may also want to fit a model where a variable has an effect on the variance, that is, a model with heteroskedastic errors. One example is a hierarchical model where the error variance is not constant across the levels of some categorical variable (e.g., gender). Another example is a longitudinal model where you might want to allow the error variance to change across time. This page describes how to fit such models and demonstrates with two examples.
Example 1: Heteroskedastic errors by group
It may be useful to note that a multi-level model (or a single level model) that allows different error terms between groups is similar to a multi-group structural equation model in which all parameters in the model are constrained to equality except for the error terms.
The dataset for this example includes data on 7185 students in 160 schools. This dataset and model come from the HLM manual. The variable mathach is a measure of each student's math achievement, the variable female is a binary variable equal to one if the student is female and zero otherwise, and the variable id contains the school identifier.
Below we run a model that uses female to predict mathach. This model assumes equal error variances for males and females.
use https://stats.idre.ucla.edu/stat/stata/faq/hsb, clear
mixed mathach female || id:, var mle nolog

Mixed-effects ML regression                     Number of obs      =      7185
Group variable: id                              Number of groups   =       160

                                                Obs per group: min =        14
                                                               avg =      44.9
                                                               max =        67

                                                Wald chi2(1)       =     62.89
Log likelihood = -23526.66                      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     mathach |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   -1.35939   .1714111    -7.93   0.000     -1.69535   -1.02343
       _cons |   13.34526    .253935    52.55   0.000     12.84756   13.84297
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                  var(_cons) |   8.108982   1.018274      6.339834   10.37181
-----------------------------+------------------------------------------------
               var(Residual) |   38.84481   .6555316      37.58101   40.15112
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) = 936.66  Prob >= chibar2 = 0.0000
In order to fit a model with heteroskedastic errors, we must modify the above model. The model shown above can be written as:
mathach_ij = b0 + b1*female_ij + u_i + e_ij (1)
This model assumes:
e_ij ~ N(0, s2)
u_i  ~ N(0, t2)
Where e_ij represents the level 1 errors (i.e., residuals), and u_i is the random intercept across schools (i.e., the source of the level 2 variance). We want to allow the variance of e_ij (i.e., s2) to be different for male and female students. One way to think about this is to replace the single e_ij term with separate terms for males and females. In equation form:
e_ij = e(m)_ij*male + e(f)_ij*female (2)
Where male is a dummy variable equal to one if the student is male and equal to zero otherwise, and e(m)_ij is the error term for males. The variable female and the term e(f)_ij are defined analogously for females. Another way to write this is:
e_ij ~ N(0, s2_1) for males
e_ij ~ N(0, s2_2) for females
Another way to think about this is to use one group as the reference group, and estimate the difference between the variance of the two groups. In equation form this would be:
e_ij = e(m)_ij + e(f)_ij*female (3)
Where e(m)_ij is the error term for males, and e(f)_ij is the difference between the errors for males and females. This can also be written:
e_ij ~ N(0,s2 + s2_2*female)
This second way of looking at the error structure is part of how we will restructure our model to allow for heteroskedastic error terms.
Equation 3 says that the residual variance is a function of gender, so we can write the residual as r_ijk, where i indexes the school, j the student, and k the gender.
mathach_ij = b0 + b1*female_ij + u_i + r_ijk (4)
To model this, we add a level to our model. Now the third level will be schools (previously level 2), the second level will be students (previously level 1), and level 1 will be a single case within each student. The only random effect at the new student level is gender; even the intercept at that level is fixed. The new model can be written as:
mathach_ijk = b0 + b1*female_ij + u_i + e_ij*female_ij + r_ij0

r_ijk = r_ij0               for males (female = 0)
r_ijk = r_ij0 + e_ij        for females (female = 1)
Where the level one error is represented by r_ij0. Because males are the omitted category in the random portion of the model, the variance of r_ij0 is the variance of the errors for males (i.e., female=0). This parallels the interpretation of the intercept in the fixed portion of the model, where b0 represents the intercept for males, since they are the omitted category. The difference between the error terms for males and females is e_ij, so the error term for females is r_ij0 + e_ij.
A final consideration before we can actually fit the model is that Stata restricts the estimates of the error variances to be greater than zero. As a result, we must use the group with the smallest variance as the omitted category, since this will result in the estimated error variances for the other group(s) being positive. To establish which group has the smallest variance, you can examine the residuals from a model without heteroskedastic errors. In this example, examination of the residuals shows that the residual variance is smaller for females than for males.
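One way to carry out this check (a sketch; r is simply an illustrative name for a new variable) is to obtain the residuals from the homoskedastic model fit above and compare their variance by group:

```stata
* Sketch: after fitting the homoskedastic model above, compare the
* residual variance for males and females (r is an arbitrary name)
predict r, residuals
tabstat r, by(female) statistics(variance)
drop r
```

Whichever group shows the smaller residual variance should be used as the omitted category in the heteroskedastic model.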
The code below creates two new variables, male and nid. The variable nid uniquely identifies each student.
recode female (0=1) (1=0), gen(male)
gen nid = _n
The resulting dataset should look something like the one shown below, where each school (identified by the variable id) has multiple rows in the dataset, but each case (i.e., each student, identified by the variable nid) has only one row.
    id   nid
  1224     1
  1224     2
  1224     3
   ...
  1288    48
  1288    49
  1288    50
   ...
  1296    73
  1296    74
  1296    75
Once the necessary variables are created, we can run the model as shown below, which allows for a difference in the variance of the errors for males and females. It is necessary to specify the nocons option, which suppresses the random intercept at the student level, so that the only random effect at that level is gender (i.e., male).
mixed mathach female || id: || nid: male, nocons var mle nolog

Mixed-effects ML regression                     Number of obs      =      7185

-----------------------------------------------------------
                |   No. of       Observations per Group
 Group Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
             id |      160         14       44.9         67
            nid |     7185          1        1.0          1
-----------------------------------------------------------

                                                Wald chi2(1)       =     63.03
Log likelihood = -23522.932                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     mathach |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -1.363969   .1718025    -7.94   0.000    -1.700695   -1.027242
       _cons |   13.34707   .2548619    52.37   0.000     12.84755    13.84659
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                  var(_cons) |   8.090527   1.016457      6.324639   10.34946
-----------------------------+------------------------------------------------
nid: Identity                |
                   var(male) |   3.622413   1.333432       1.76062   7.452988
-----------------------------+------------------------------------------------
               var(Residual) |   37.13827   .8657943      35.47953   38.87456
------------------------------------------------------------------------------
LR test vs. linear regression:     chi2(2) = 944.12    Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.
The second table in the output shows the estimated random effects. The residual variance for females is equal to var(Residual) = 37.138, while the variance for males is var(Residual) + var(male) = 37.138 + 3.622 = 40.761. Since the 95% confidence interval for var(male) does not include zero, we can say that the difference between the variances is statistically significant at the p<0.05 level.
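If you also want a standard error and confidence interval for the males' variance itself, rather than just the point estimate obtained by adding the two components, nlcom can combine the stored log-standard-deviation parameters (this is a sketch; var_male is just a label we chose, and the parameter names are those Stata stores for this model):

```stata
* Sketch: estimate the residual variance for males (reference variance
* plus the var(male) increment) with a delta-method standard error
nlcom (var_male: exp(2 * [lnsig_e]_cons) + exp(2 * [lns2_1_1]_cons))
```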
Traditionally, and in some other packages (e.g., HLM), the variances in models with heteroskedastic errors have been parameterized as:
s2_ij = exp(a_0 + a_1*x_ij)
Where s2_ij is the variance of the residuals, x_ij is a dichotomous variable equal to one for one group and zero for the other, and a_0 and a_1 are estimated parameters for the variance. a_0 is the natural log of the variance in the group where x_ij = 0, and a_1 is the difference in the log variances between the two groups. If you want to compare the results of your analysis in Stata to those from another package, it is possible to translate between the two. As a first step, let's think a little more about what Stata is estimating. One way to write what Stata models is:
s2_ij = var(Residuals) + var(x_ij)*x_ij
Where var(Residuals) is the variance of the level 1 errors, and var(x_ij) is the variance of the random effect of the dummy variable x_ij. Because Stata actually models the natural log of the standard deviation of each error term, the expression above is visually clear but not quite correct. The actual model is:
s2_ij = exp(2*lnsd_0) + exp(2*lnsd_1)*x_ij
Where lnsd_0 is the natural log of the standard deviation of the level 1 errors, and lnsd_1 is the natural log of the standard deviation of the level 2 random effect. We can convert the estimates Stata gives us to a_0 and a_1 using the formulas below (each formula is written twice, the first for visual clarity, the second to reflect what is actually modeled by Stata):
a_0 = ln(var(Residuals))
a_0 = ln(exp(2*lnsd_0))

a_1 = ln(var(Residuals) + var(x_ij)) - ln(var(Residuals))
a_1 = ln[exp(2*lnsd_0) + exp(2*lnsd_1)] - ln[exp(2*lnsd_0)]
We can use the display command to calculate these values. Stata stores the ln(sd) for the level 1 residuals as [lnsig_e]_cons, and the ln(sd) of the random effect of male in [lns2_1_1]_cons. Below we use these values to calculate a_0 and a_1.
di "a_0=" ln(exp(2 * [lnsig_e]_cons))
a_0=3.6147746

di "a_1=" ln(exp(2 * [lnsig_e]_cons) + exp(2 * [lns2_1_1]_cons)) - ln(exp(2 * [lnsig_e]_cons))
a_1=.09308814
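One natural follow-up is to test whether allowing heteroskedastic errors improves model fit overall. A likelihood-ratio test comparing the two models fit above can do this (a sketch; the stored-estimate names homo and hetero are our own labels, and note that LR tests of variance parameters on the boundary of the parameter space are conservative):

```stata
* Sketch: compare the homoskedastic and heteroskedastic models
quietly mixed mathach female || id:, var mle
estimates store homo
quietly mixed mathach female || id: || nid: male, nocons var mle
estimates store hetero
lrtest homo hetero
```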
Example 2: A growth curve model
When you run a growth curve model in the structural equation modeling (SEM) framework, it is not uncommon to allow the error variance to differ across time points, or at least to test whether such differences exist. A longitudinal mixed model that allows different error variances across time points is similar to an SEM growth model in which all parameters except the time-specific error variances are constrained to equality. Below we show how to allow for different error variances across time points, and how to test whether the error variances are significantly different from each other.
The example dataset contains data on 239 subjects. The dataset includes observations at up to five time points per subject, for a total of 1079 observations. The variable id uniquely identifies subjects. The time at which an observation was taken is indicated by the variable time, which takes on the values 0 to 4.
Below we use mixed to fit a model where the variables x and time predict the variable attit. This model assumes the variance is the same across time points.
use https://stats.idre.ucla.edu/stat/stata/faq/nys2, clear
mixed attit x time || id:, var mle nolog

Mixed-effects ML regression                     Number of obs      =      1079
Group variable: id                              Number of groups   =       239

                                                Obs per group: min =         1
                                                               avg =       4.5
                                                               max =         5

                                                Wald chi2(2)       =    324.65
Log likelihood =  140.31043                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       attit |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .0241245   .0033107     7.29   0.000     .0176357    .0306134
        time |   .0600884   .0039557    15.19   0.000     .0523353    .0678415
       _cons |   .1191882   .0184893     6.45   0.000     .0829498    .1554267
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                  var(_cons) |   .0308536    .003533      .0246511    .0386167
-----------------------------+------------------------------------------------
               var(Residual) |   .0311131   .0015204      .0282714    .0342405
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) = 324.66  Prob >= chibar2 = 0.0000
The model above can be written as:
attit_it = b0 + b1*x_it + b2*time_it + u_i + e_it (4)
It is assumed that:
e_it ~ N(0, s2)
u_i  ~ N(0, t2)
Where u_i is the random intercept across individuals (i.e., the source of the level 2 variance), and e_it represents the level 1 errors (i.e., residuals). We want to allow the variance of e_it to be different at each time point. One way to think about this is to replace the single e_it term with separate terms for each time point. In equation form:
e_it = e_i0*t0 + e_i1*t1 + e_i2*t2 + e_i3*t3 + e_i4*t4
Where t0 is a dummy variable equal to one if time = 0 (i.e. the first time point) and equal to zero otherwise, and e_i0 is the error for the first measurement occasion (time=0). The variables t1 to t4, are dummy variables for time points 1 to 4. The terms e_i1 to e_i4 are the error terms for time points 1 to 4.
In order to model the heteroskedastic errors, we add a third level to our model. In this new model, the third level will be individuals (previously level 2), the second level will be time points (previously level 1), and level 1 will be a single case within each time point. Since the fixed effect of time is already in the model, the only random effects at the time-point level are those for the time dummies, and the intercept at that level is suppressed. The new model can be written as:
attit_itk = b0 + b1*x_it + b2*time_it + u_i + e_i1*t1 + e_i2*t2 + e_i3*t3 + e_i4*t4 + r_it0
Where the level one error is represented by r_it0. Because time=0 is the omitted category, the variance of r_it0 is the variance of the errors at time=0. The random effects e_i1 to e_i4 capture, for each later time point, the difference between the errors at that time point and the errors at time=0.
A final consideration before we can actually fit the model is that Stata restricts the estimates of the error variances to be greater than zero. As a result, we must use the time point (or other category/group) with the smallest variance as the omitted category, since this will result in the values of the error variances for the other time points being positive. To establish which time point or group has the lowest variance you can examine the residuals from a model without heteroskedastic errors. In this example, the residual variance is smallest when time=0.
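As in the first example, one way to carry out this check (a sketch; r is simply an illustrative name for a new variable) is to obtain the residuals from the homoskedastic model fit above and compare their variance across time points:

```stata
* Sketch: after fitting the homoskedastic model above, compare the
* residual variance at each time point (r is an arbitrary name)
predict r, residuals
tabstat r, by(time) statistics(variance)
drop r
```

The time point with the smallest residual variance should serve as the omitted category.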
In order to fit the model we will need dummy variables for each of the other time points.
gen t1 = (time==1)
gen t2 = (time==2)
gen t3 = (time==3)
gen t4 = (time==4)
Now we are ready to fit our model using mixed. The nocons option suppresses the random intercept at the time-point level.
mixed attit x time || id: || time: t1 t2 t3 t4, nocons var mle nolog

Mixed-effects ML regression                     Number of obs      =      1079

-----------------------------------------------------------
                |   No. of       Observations per Group
 Group Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
             id |      239          1        4.5          5
           time |     1079          1        1.0          1
-----------------------------------------------------------

                                                Wald chi2(2)       =    319.37
Log likelihood =  143.67787                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       attit |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .0238029   .0033059     7.20   0.000     .0173235    .0302823
        time |   .0597567   .0039638    15.08   0.000     .0519878    .0675256
       _cons |     .1208    .0178047     6.78   0.000     .0859035    .1556965
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                  var(_cons) |   .0280981   .0034551      .0220805    .0357557
-----------------------------+------------------------------------------------
time: Independent            |
                     var(t1) |   .0027798   .0046901      .0001018    .0758888
                     var(t2) |   .0058393   .0053178      .0009799    .0347972
                     var(t3) |   .0125841   .0060606      .0048964    .0323421
                     var(t4) |    .01354    .0059784      .0056988    .0321702
-----------------------------+------------------------------------------------
               var(Residual) |   .0247609   .0033388      .0190103     .032251
------------------------------------------------------------------------------
LR test vs. linear regression:     chi2(5) = 331.40    Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.
Once we have run the model, we may want to test whether the estimated variances are different either from zero or different from each other. Below nlcom is used to get the estimated variance at each time point.
nlcom (time0: exp(2 * [lnsig_e]_cons))                            ///
      (time1: exp(2 * [lnsig_e]_cons) + exp(2 * [lns2_1_1]_cons)) ///
      (time2: exp(2 * [lnsig_e]_cons) + exp(2 * [lns2_1_2]_cons)) ///
      (time3: exp(2 * [lnsig_e]_cons) + exp(2 * [lns2_1_3]_cons)) ///
      (time4: exp(2 * [lnsig_e]_cons) + exp(2 * [lns2_1_4]_cons))

       time0:  exp(2 * [lnsig_e]_cons)
       time1:  exp(2 * [lnsig_e]_cons) + exp(2 * [lns2_1_1]_cons)
       time2:  exp(2 * [lnsig_e]_cons) + exp(2 * [lns2_1_2]_cons)
       time3:  exp(2 * [lnsig_e]_cons) + exp(2 * [lns2_1_3]_cons)
       time4:  exp(2 * [lnsig_e]_cons) + exp(2 * [lns2_1_4]_cons)

------------------------------------------------------------------------------
       attit |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       time0 |   .0247609   .0033388     7.42   0.000      .018217    .0313048
       time1 |   .0275406   .0034705     7.94   0.000     .0207385    .0343427
       time2 |   .0306002   .0035976     8.51   0.000     .0235491    .0376513
       time3 |    .037345   .0044345     8.42   0.000     .0286536    .0460364
       time4 |   .0383009   .0044824     8.54   0.000     .0295155    .0470862
------------------------------------------------------------------------------
We can also use nlcom to estimate the difference in the variance between time points, in this case, the difference between the variance at time = 1 and time = 4.
nlcom (t4_t1: exp(2 * [lns2_1_4]_cons) - exp(2 * [lns2_1_1]_cons))

       t4_t1:  exp(2 * [lns2_1_4]_cons) - exp(2 * [lns2_1_1]_cons)

------------------------------------------------------------------------------
       attit |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       t4_t1 |   .0107602   .0060541     1.78   0.076    -.0011055    .0226259
------------------------------------------------------------------------------
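Pairwise comparisons like the one above can be supplemented with a joint Wald test of whether the error variances are equal across all time points. Because a common residual variance is added to every time point, the variances are equal if and only if the four random-effect variance increments are equal, which in turn holds if and only if their log standard deviations are equal (a sketch; the parameter names are those Stata stores for this model):

```stata
* Sketch: jointly test equality of the four log-SD increments, which is
* equivalent to testing equality of the error variances at times 1-4
test ([lns2_1_1]_cons = [lns2_1_2]_cons) ///
     ([lns2_1_2]_cons = [lns2_1_3]_cons) ///
     ([lns2_1_3]_cons = [lns2_1_4]_cons)
```

Note that this test compares times 1 through 4 to each other; comparisons involving time 0 use the residual variance, as in the nlcom commands above.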
References
Gutierrez, Roberto G. (2008) Tricks of the Trade: Getting the most out of xtmixed. 2008 Fall North American Stata Users Group Meeting, San Francisco, CA.
Raudenbush, Stephen, Anthony Bryk, Yuk Fai Cheong, and Richard Congdon (2001) HLM 5: Hierarchical Linear and Nonlinear Modeling. Scientific Software International. Lincolnwood, IL.