It is common to fit a model where a variable (or variables) has an effect on the expected mean. You may also want to fit a model where a variable has an effect on the variance, that is a model with heteroskedastic errors. One example is a hierarchical model where the error variance is not constant over some categorical variable (e.g. gender). Another example is a longitudinal model where you might want to allow the errors to vary across time. This page describes how to fit such a model and demonstrates with an example.

## Example 1: Heteroskedastic errors by group

It may be useful to note that a multi-level model (or a single level model) that allows different error terms between groups is similar to a multi-group structural equation model in which all parameters in the model are constrained to equality except for the error terms.

The dataset for this example, hsb, includes data on 7185 students in
160 classrooms. The variable **mathach** is a measure of each student’s math
achievement, the variable **female** is a binary variable equal to one if the
student is female and zero otherwise, and the variable **id** contains the
classroom identifier.

Below we run a model that uses **female** to predict **mathach**. This
model assumes equal error variances for males and females.

proc mixed data = hsb method = ml; class id; model mathach = female / solution; random intercept/ subject = id type = cs; run;Covariance Parameter Estimates Cov Parm Subject Estimate Variance id 7.7108 CS id 0.3982 Residual 38.8448 Fit Statistics -2 Log Likelihood 47053.3 AIC (smaller is better) 47063.3 AICC (smaller is better) 47063.3 BIC (smaller is better) 47078.7 Null Model Likelihood Ratio Test DF Chi-Square Pr > ChiSq 2 936.66 <.0001 Solution for Fixed Effects Standard Effect Estimate Error DF t Value Pr > |t| Intercept 13.3453 0.2539 159 52.55 <.0001 female -1.3594 0.1714 7024 -7.93 <.0001 Type 3 Tests of Fixed Effects Num Den Effect DF DF F Value Pr > F female 1 7024 62.89 <.0001

In order to fit a model with heteroskedastic errors, we must modify the above model. The model shown above can be written as:

mathach_ij = b0 + b1*female_ij + u_i + e_ij (1)

This model assumes:

e_ij ~ N(0,s2) u_i ~ N(0,t2)

Where **e_ij** is the level 1 errors (i.e. residuals)**, **and** u_i**
is the random intercept across classrooms (i.e. the level 2 variance). We want
to allow the variance of **e_ij** (i.e. **s2**) to be different for male
and female students. One way to think about this would be to replace the single
**e_ij** term, with separate terms for males and females, in equation form:

e_ij = e(m)_ij*male + e(f)_ij*female (2)

Where **male** is a dummy variable equal to one if the student is male and
equal to zero otherwise, and **e(m)_ij** is the error term for males. The
variable **female** and the term **e(f)_ij** are the same except that they
are for females. Another way to write this is:

e_ij ~ N(0,s2_1) females N(0,s2_2) males

Another way to think about this is to use one group as the reference group, and estimate the difference between the variance of the two groups. In equation form this would be:

e_ij = e(f)_ij + e(m)_ij*male (3)

Where **e(f)_ij** is the error term for males, and **e(m)_ij** is the
difference between the errors for males and females. This can also be written:

e_ij ~ N(0,s2 + s2_2*male)

This second way of looking at the error structure is part of how we will restructure our model to allow for heteroskedastic error terms.

Equation 3 says that the residual variance is a function of gender. So we can write it as r_ijk, where i is the group, j is the subject, and k is the gender.

mathach_ij = b0 + b1*female + u_i + r_ijk (4)

We can estimate the residual variance as a function of gender by using the **
repeated** statement instead of **random** and using the **local** option in the **
repeated** statement. This option adds a diagonal matrix to the
variance-covariance matrix estimated in the model. We wish to estimate a
variance parameter for gender. We will specify this effect as an exponential
local effects and then go through the necessary transformations later to extract
the estimated variance for each level of gender.

proc mixed data=hsb order = data method=ml; class female id; model mathach=female / solution ; repeated female /subject = id type = cs r local=exp(male); run;Covariance Parameter Estimates Cov Parm Subject Estimate Variance id 0.2929 CS id 7.8810 EXP male 0.09229 Residual 37.0531 Fit Statistics -2 Log Likelihood 47044.4 AIC (smaller is better) 47056.4 AICC (smaller is better) 47056.4 BIC (smaller is better) 47074.9 Null Model Likelihood Ratio Test DF Chi-Square Pr > ChiSq 3 945.56 <.0001 Solution for Fixed Effects Standard Effect female Estimate Error DF t Value Pr > |t| Intercept 13.3471 0.2568 159 51.96 <.0001 female 1 -1.3759 0.1850 122 -7.44 <.0001 female 0 0 . . . . Type 3 Tests of Fixed Effects Num Den Effect DF DF F Value Pr > F female 1 122 55.32 <.0001

The first table in the output shows the covariance parameter estimates. The
residual variance for females is equal to **Residual** = 37.0531, while the
variance for males is **Residual***exp^(**EXP male**) =
37.0531*exp(0.09229) = 40.635498.

## Example 2: A growth curve model

When you run a growth curve model in the structural equation modeling framework, it is not uncommon to allow the error variance to be different across time points, or at least to test whether such differences exist. A longitudinal model that allows different error variances across time points is similar to a growth model in the structural equation modeling setting, where all parameters except for the error terms across time points are constrained to equality. Below we show how to allow for different error variances across time points.

The example dataset contains data on 239 subjects. The dataset includes
observations at up to five time points per subject, for a total of 1079
observations. The variable **id** uniquely identifies subjects. The time at
which an observation was taken is indicated by the variable **time**, which
takes on the values 0 to 4.

The example dataset, nys1, contains 1079 observations taken at up to five time
points, nested within 239 subject. Although the example we use here is a
longitudinal model, the same technique could be used for other types of
clustered data, for example, students nested in classrooms. Below we use **proc
mixed** to fit a model where the variables **x**, and **time** predict
the variable **attit**. The variable **time** takes on five values, 0-4.
The **vi** option in the **random** statement requests that variances of
the random effects be reported in the output. The we specify **method = ml**
indicating
that the maximum likelihood estimator be used. We specify **type = cs**,
indicating that we are fitting a model with compound symmetry in which the same
variance-covariance is assumed for all times.
** **

proc mixed data = nys1 method=ml; class id; model attit = x time /solution outpred = nys2; random intercept / subject=id type = cs vi; run;The Mixed Procedure Model Information Data Set WORK.NYS1 Dependent Variable attit Covariance Structure Compound Symmetry Subject Effect id Estimation Method ML Residual Variance Method Profile Fixed Effects SE Method Model-Based Degrees of Freedom Method Containment < ...output omitted... > Covariance Parameter Estimates Cov Parm Subject Estimate Variance id 0.02896 CS id 0.001890 Residual 0.03111 Fit Statistics -2 Log Likelihood -280.6 AIC (smaller is better) -268.6 AICC (smaller is better) -268.5 BIC (smaller is better) -247.8 Null Model Likelihood Ratio Test DF Chi-Square Pr > ChiSq 2 324.66 <.0001 Solution for Fixed Effects Standard Effect Estimate Error DF t Value Pr > |t| Intercept 0.1192 0.01849 238 6.45 <.0001 x 0.02412 0.003311 838 7.29 <.0001 time 0.06009 0.003956 838 15.19 <.0001 Type 3 Tests of Fixed Effects Num Den Effect DF DF F Value Pr > F x 1 838 53.10 <.0001 time 1 838 230.74 <.0001

The above model assumes that the residual errors are the same across time
points, which may not be a reasonable assumption. In some cases, we may suspect
heteroskedastic errors. In other cases, we may find them during our analysis. We
can look at a boxplot of residuals at each time point to help
us evaluate whether this is a reasonable assumption. From the **proc mixed**
above, we have the residuals in the output dataset **nys2**. Below is the
code to generate the boxplot.

proc sort data = nys2; by time; run; proc boxplot data = nys2; plot Resid * time; run;

The graphs above show some differences in the residuals across time points, suggesting that we may want to model these differences. It is easier to understand what is going on if we look at the equations for our model. Our original model can be written as:

attit_ij = b0 + b1*x_ij + b2*time_ij + u_i + e_ij

Where u_i is the variance in the intercept across individuals (i.e. the level 2 variance), e_ij is the level 1 error variance. We want to allow e_ij to be different at each time point, in other words, we want to replace the single e_ij, with separate terms for each time point, in equation form:

e_ij = e(t0)_ij*time0 + e(t1)_ij*time1 + e(t2)_ij*time2 + e(t3)_ij*time3 + e(t4)_ij*time4

Where time0 is a dummy variable equal to one if **time** = 0 (i.e. the
first time point) and equal to zero otherwise, and e(t0)_ij is the error for
time 1. In other words, var(e_ij|time=0) = var(e(t0)_ij). The variables time1 to
time4, are dummy variables for time equal one through four. The terms e(t1)_ij
to e(t4)_ij are the error terms for time points 1-4.

Using some of the options in **proc mixed**, this model is doable in SAS, though
the logic and interpretation require some careful thought. In order to model the heteroskedastic
errors, we will be estimating the entries of a
diagonal matrix to be added to the compound symmetry variance-covariance matrix.
Each entry will correspond to a different time point. The "new" model can be written as:

attit_ijk = b0 + b1*x_ij + b2*time_ij + (b3 + e3_ij)*t1 + (b4 + e4_ij)*t2 + (b5 + e5_ij)*t3 + (b6 + e6_ij)*t4 + r_ijk

In the model above, t1 to t4 are dummy variables for the values of time, for
example, when t1 = 1, when time=1, and zero otherwise. You may find it odd that
the model includes both continuous time and a series of dummy variables
representing time. The effect of the variable **time** is fixed, and it's
coefficient, b2, is freely estimated. The model we will fit using **proc mixed**
will use **repeated** instead of **random**. We can replicate the
model we ran above by creating a categorical variable for time to be used in the
**repeated** statement.

data nys1; set nys1; timecat = time; run; proc mixed data = nys1 method=ml; class id timecat; model attit = x time /solution; repeated timecat / subject=id type=cs; run;The Mixed Procedure < ...output omitted... > Covariance Parameter Estimates Cov Parm Subject Estimate CS id 0.03085 Residual 0.03111 Fit Statistics -2 Log Likelihood -280.6 AIC (smaller is better) -270.6 AICC (smaller is better) -270.6 BIC (smaller is better) -253.2 Null Model Likelihood Ratio Test DF Chi-Square Pr > ChiSq 1 324.66 <.0001 Solution for Fixed Effects Standard Effect Estimate Error DF t Value Pr > |t| Intercept 0.1192 0.01849 238 6.45 <.0001 x 0.02412 0.003311 838 7.29 <.0001 time 0.06009 0.003956 838 15.19 <.0001

To this model, we will add exercise the **local** option in the **
repeated** statement. This option adds a diagonal matrix to the
variance-covariance matrix estimated in the model. We wish to estimate a
variance parameter for each time point. To do this, we will first create
indicators for four of the five levels of our time variable. We will then
enter these into the **local** statement. We will specify these as
exponential local effects and then go through the necessary transformations
later to extract the estimated variance for each level of time.

data nys1; set nys1; if time = 1 then ind2 = 1; else ind2 = 0; if time = 2 then ind3 = 1; else ind3 = 0; if time = 3 then ind4 = 1; else ind4 = 0; if time = 4 then ind5 = 1; else ind5 = 0; run; ods output CovParms =t; proc mixed data = nys1 method=ml; class id timecat; model attit = x time /solution; repeated timecat /subject=id type=cs r local=exp(ind2 ind3 ind4 ind5); run;The Mixed Procedure < ...output omitted... > Estimated R Matrix for id 1006 Row Col1 Col2 Col3 Col4 Col5 1 0.05286 0.02810 0.02810 0.02810 0.02810 2 0.02810 0.05564 0.02810 0.02810 0.02810 3 0.02810 0.02810 0.05870 0.02810 0.02810 4 0.02810 0.02810 0.02810 0.06544 0.02810 5 0.02810 0.02810 0.02810 0.02810 0.06640 Covariance Parameter Estimates Cov Parm Subject Estimate Variance id 0 CS id 0.02810 EXP ind2 0.1064 EXP ind3 0.2117 EXP ind4 0.4109 EXP ind5 0.4362 Residual 0.02476 Fit Statistics -2 Log Likelihood -287.4 AIC (smaller is better) -269.4 AICC (smaller is better) -269.2 BIC (smaller is better) -238.1 Null Model Likelihood Ratio Test DF Chi-Square Pr > ChiSq 5 331.40 <.0001 Solution for Fixed Effects Standard Effect Estimate Error DF t Value Pr > |t| Intercept 0.1208 0.01780 238 6.78 <.0001 x 0.02380 0.003306 838 7.20 <.0001 time 0.05976 0.003964 838 15.08 <.0001 Type 3 Tests of Fixed Effects Num Den Effect DF DF F Value Pr > F x 1 838 51.84 <.0001 time 1 838 227.27 <.0001

We have saved our covariance parameter estimates in a dataset **t**. We
can look at the values here. The **CS** (compound symmetry) value we
see is the off-diagonal entry of the R matrix in the model. The **
Residual** estimate corresponds to the baseline error variance at **time**
= 0. The other values presented are the exponential local effects. In the
SAS code below, we calculate the error variances for the five time points.

proc print data = t; run;Obs CovParm Subject Estimate 1 Variance id 0 2 CS id 0.02810 3 EXP ind2 0.1064 4 EXP ind3 0.2117 5 EXP ind4 0.4109 6 EXP ind5 0.4362 7 Residual 0.02476* sigma^2*exp(xb); proc transpose data = t (drop=subject) prefix = _ out=twide; id covparm; run; proc print data = twide; run;_EXP_ _EXP_ _EXP_ _EXP_ Obs _NAME_ _Variance _CS ind2 ind3 ind4 ind5 _Residual 1 Estimate 0 0.02810 0.1064 0.2117 0.4109 0.4362 0.02476data twidea; set twide; array sigma(5); array b(4) _exp_ind2 - _exp_ind5; do i = 2 to 5; sigma(i) = _residual*exp(b[i-1]); end; sigma1 = _residual; run; proc print data = twidea; var sigma:; run;Obs sigma1 sigma2 sigma3 sigma4 sigma5 1 0.024761 0.027541 0.030600 0.037345 0.038301