In order to help show the relationships among an OLS, random intercept, and random slope models this page shows a series of models each of which builds on the previous models. Model 1 is a standard ordinary least squares (OLS) regression model. Model 2 adds a random intercept to more appropriately model clustered (multi-level) data. Model 3 adds a random slope. Model 4 adds the covariance between the random intercept and slope. Finally, model 5 includes a cross level interaction.

We use proc nlmixed to fit all of these models because it will not only fit all of these
models, but the syntax structure and progression across the models allows us to clearly demonstrate
the differences, and similarities, between these models. SAS** **proc nlmixed
is a highly flexible procedure that can be used to run a large variety of
models. We do not, however, intend to suggest that you should run these models using nlmixed. In many
cases it would be easier to run the first model in proc reg, and the subsequent
models in proc mixed. An additional advantage to using mixed over nlmixed is
that mixed allows the use of both maximum likelihood estimation and restricted
maximum likelihood estimation, nlmixed only uses maximum likelihood estimation.

The dataset for this example includes data on 7185 students in 160
schools. It comes from the HLM manual. The outcome variable **mathach** is a measure of each student’s math
achievement, the predictor variable **female** is a binary variable equal to one if the
student is female and zero otherwise, and the predictor variable **pracad** is the proportion of students at that school who are on the academic
track. The variable **id** is the school identifier. Note that **female**
varies within school (often called a level 1 variable) while **pracad**
is constant within a school, but varies across schools (often called a level
2 variable). You can download the data used in this example by clicking
https://stats.idre.ucla.edu/wp-content/uploads/2016/02/hsb-1.sas7bdat.

## Model 1: An OLS regression

The first model we will run is an ordinary least squares (OLS) regression model where **female** and **pracad** predict
**mathach**. In equation form the model is:

mathach = b0 + b1*female + b2*pracad + e

And we assume:

e ~ N(0,s2)

Below is the **proc nlmixed** syntax corresponding to this specification. Here we define
**pred** as a linear function of **female** and **pracad**, **mathach** is distributed (~)
normally with mean equal to **pred** and variance **s2**, both of which we wish to estimate (**pred** is estimated through the
coefficients, **s2** is estimated based on the model residuals).

proc nlmixed data="d:datahsb"; pred = b0 + b1*female + b2*pracad; model mathach ~ normal(pred,s2); run;

Towards the end of the output, we see the parameter estimates shown below.

Parameter Estimates Standard Parameter Estimate Error DF t Value Pr > |t| Alpha Lower Upper Gradient b0 9.3439 0.2023 7185 46.20 <.0001 0.05 8.9474 9.7404 -0.1721 b1 -1.5037 0.1546 7185 -9.72 <.0001 0.05 -1.8068 -1.2006 -0.16423 b2 7.8527 0.3073 7185 25.55 <.0001 0.05 7.2503 8.4551 -0.08668 s2 42.7040 0.7124 7185 59.94 <.0001 0.05 41.3074 44.1006 -0.00432

Based on the above estimates, the model is:

mathach = 9.3439 - 1.5037*female + 7.8527*pracad + e

Where:

e ~ N(0,42.7)

This model is a standard OLS regression model, and the coefficients are
interpreted as usual for a regression model. Another way to describe this model
is to say that all of the coefficients (**b0**, **b1**, and **b2**) are fixed, only
the error term (**e**) has a variance (**s2**). This model has
totally ignored the nesting structure, assuming all observations are independent
of each other. Of course this will not be a valid model for this data, nevertheless, it gives
us a starting point.

## Model 2: A random intercept model

The problem with the OLS model is that it fails to account for the fact that the observations are not independent, that is, that students are nested within schools. There is often good reason to believe that observations within a cluster are more alike than observations across clusters. In our case, students at one school might, on average, have higher or lower math achievement scores, even after controlling for gender and the percent of students on an academic track. As a result, the residuals for students at that school would be systematically higher or lower than at other schools, violating one of the assumptions of OLS regression. To accommodate this, we can model the intercept as having a mean and a variance. This is the basic multi-level model. In equation form:

mathach = b0 + u + b1*female + b2*pracad + e

Where we assume:

e ~ N(0,s2) u ~ N(0,s2u)

With the addition of **u**, we can say that the intercept has two parts, a
fixed part, represented by **b0**, and a random component, represented by **u**. The
parameter **b0** is said to be fixed because it does not vary, while the **u**
is said to be random, because it is assumed to take on different values for different level 2
units (i.e. schools). Substantively, **b0** is the mean intercept across all
schools, while **u** allows for variation around that mean. Another way to think about this model is that there are two sources of error in our predictions, error that is a result of a school having a mean score that is higher or lower than other schools (**u**), and error that is a result of individual variation (**e**). In the OLS model above, we combine these into a single term,
**e**, while in a random intercept model we estimate both the variance of **e** and the variance of
**u**.
The model can also be written in the two level formulation (the models, and
the assumptions about **e** and **u,** are the same):

Level 1: mathach = b0 + b1*female + e Level 2: b0 = g00 + g01*pracad + u b1 = g10

The code below fits the random intercept model. Before we can run a
random effects model with **nlmixed**, we need to sort the data by the grouping
variable, in this case by **id**, which identifies which school a student was
in. The
**nlmixed** command below differs from the **nlmixed** command for the OLS model in
two ways. First, the definition of **pred**, the predicted value of **
mathach**, now
includes the parameter **u**, the random coefficient
for the intercept. This means that expected mean of **mathach** now depends
on which school the student attends. The second change is the addition of the **random**
statement, which indicates that we wish to add a random effect to
our model. The random statement defines **u** as distributed normally with a mean of zero and
constant variance we wish to estimate, denoted **s2u** using the code **u ~ normal(0, s2u)**. The groups across
which we wish to estimate the random effect are identified by **subject=id**
(note that **id**
is the variable that identifies schools in this dataset).

proc sort data="d:datahsb"; by id; run; proc nlmixed data="d:datahsb"; pred = b0 + u + b1*female + b2*pracad; model mathach ~ normal(pred,s2); random u ~ normal(0,s2u) subject=id; run;Parameter Estimates Standard Parameter Estimate Error DF t Value Pr > |t| Alpha Lower Upper Gradient b0 9.1800 0.4084 159 22.48 <.0001 0.05 8.3734 9.9867 0.000236 b1 -1.3450 0.1691 159 -7.96 <.0001 0.05 -1.6789 -1.0111 0.000183 b2 8.0494 0.6866 159 11.72 <.0001 0.05 6.6934 9.4054 0.000201 s2 38.8414 0.6554 159 59.26 <.0001 0.05 37.5470 40.1358 0.000831 s2u 3.9315 0.5444 159 7.22 <.0001 0.05 2.8563 5.0066 -0.00056

Based on the above estimates, the model can be written:

mathach = 9.18 + u - 1.35*female + 8.05*pracad + e

Where:

e ~ N(0,38.84) u ~ N(0,3.93)

You may have noticed that the term **u** remains in the equation for our predicted values of **mathach**,
rather than being replaced with an estimate. This is because we
have not modeled the intercepts for each school individually (as one would in a
so called fixed effects model)
but instead we have modeled their distribution, so that the actual intercept (**b0**+**u**) for each school can be
estimated based on the model, but is not actually part of the model.

In this model, **b0**, the intercept, is interpreted as the average
(mean) intercept across all schools, while **s2u** is the estimated variance of the individual schools around that
mean. Another way to say this is that **s2u** is the variance between schools.
Larger values of **s2u** indicate greater differences in intercepts
across schools (keeping in mind the size of the intercept itself). We also might want to look at the standard deviation of **u**, that is, the square root of **s2u**,
rather than the variance (i.e. **s2u** itself) if we want to get a better sense of how
much the intercepts vary across schools. The standard deviation of **u** is
1.98 (=sqrt(3.93)). The coefficients for the independent variables, that is, **b1** and **b2**,
are interpreted in the same manner as before.
The estimates for **b1** and **b2** are slightly different between the two models. This
is because this model accounts for the relationship between the **mathach**
scores of students at the same school.

You may have noticed that the variance of **e** is smaller in this model
than in model 1(38.83 vs. 42.7).
This is because, as discussed above, in model 1, the error due to individual
variation was combined with the error due to variation across schools. In this model
(model 2), we include both sources of error in the model, allowing us to distinguish
between error at the individual and group levels. Based on the variance of **e**
and **u** in model 2, we can calculate the proportion of variance that is due
to differences across schools, often called the intra-class correlation (icc).
The formula for this is s2u/(s2u+s2e), in other words, the variance accounted
for by schools, over the total variance. For this model the icc is
3.9315/(38.8414+3.9315) = .0919, so we can say that about 9% of total variance
is accounted for by differences in the intercepts across schools.

## Model 3: A mixed model with independent covariance structure for the random effects

In addition to allowing the intercept of **mathach** vary by school, it is possible
to allow the relationship
between **mathach** and a predictor variable to vary by school, that is the
effect of the predictor variable might be weaker or stronger at different
schools. To allow for this we
estimate a model with a random coefficient for the variable whose effect we believe may
vary by school. Below we estimate a random coefficient for the variable **
female**. Note that it does not make sense to include a random coefficient for the
variable **pracad** in this model, since **pracad** does not vary within
school. The equation for the model looks like this:

mathach = b0 + u0 + b1*female + u1*female + b2*pracad + e = b0 + u0 + (b1+u1)*female + b2*pracad + e

Where we assume:

e ~ N(0,s2) u0 ~ N(0,s2u0) u1 ~ N(0,s2u1) cov(u0,u1)=0

The random coefficients are **u0** (for the intercept) and **u1** (for
**female**). In addition to modeling the variances themselves, we can model the relationship between **u0** and
**u1**. We can specify three different relationships
between **u0** and **u1**,
we can assume that they are unrelated, and fix
their covariance to zero; we can assume their covariance is equal to some a non-zero value
and fix the covariance to that value (this is rarely done); or we can estimate the covariance
**u0** and **u1** as part of the model. Note that with more than two
random effects, additional types of relationships can be modeled. In this model we will assume that **u0** and
**u1** have a
covariance of zero, this is sometimes called an
independent covariance structure for the random effects.

As before we can also write the model in its two-level form (the models, and
the assumptions about **e** and **u,** are the same):

Level 1: mathach = b0 + b1*female + e Level 2: b0 = g00 + g01*pracad + u0 b1 = g10 + u1

The code for this model is different from the code for the random intercept model in
several ways. First, the equation for **pred** now includes the term **u1**.
Second, the random statement now includes two random effects (**u0** and **u1**). As in
the previous models, the random coefficients are assumed to be normally
distributed, their mean and variance are just specified a little differently in
the code. Instead of a single value, we now have the
vector of means [0,0] one for each of the random effects. Instead of a single parameter for variance we now have a variance-covariance matrix that is shown
as the vector [s2u0, 0, s2u1]. The vector [s2u0, 0, s2u1] is equivalent to the lower triangular
variance-covariance matrix:

s2u0 0 s2u1

proc nlmixed data="d:datahsb"; pred = b0 + u0 + (b1+u1)*female + b2*pracad; model mathach ~ normal(pred,s2); random u0 u1 ~ normal([0,0],[s2u0,0,s2u1]) subject=id; run;Parameter Estimates Standard Parameter Estimate Error DF t Value Pr > |t| Alpha Lower Upper Gradient b0 9.1840 0.4080 158 22.51 <.0001 0.05 8.3781 9.9899 0.00042 b1 -1.3494 0.1742 158 -7.75 <.0001 0.05 -1.6934 -1.0054 0.004663 b2 8.0484 0.6870 158 11.72 <.0001 0.05 6.6915 9.4053 -0.0018 s2 38.8006 0.6587 158 58.91 <.0001 0.05 37.4997 40.1016 0.002626 s2u0 3.8666 0.5564 158 6.95 <.0001 0.05 2.7676 4.9657 -0.00124 s2u1 0.2195 0.4019 158 0.55 0.5858 0.05 -0.5743 1.0133 -0.00635

These estimates can be used to write the equation:

mathach = 9.18 + u0 + (-1.35+u1)*female + 8.05*pracad + e

Where:

e ~ N(0,38.8) u0 ~ N(0,3.87) u1 ~ N(0,0.22) cov(u0,u1)=0

In this model, the interpretation of the coefficient for **pracad** (**b2**) remains unchanged.
The interpretation of the estimate of the intercept (**b0**), as well as its variance **s2u0**, remain
the same as in the interpretation in the random intercept model above. The coefficient for
**female** in the fixed
part of the model (**b1**) can now be interpreted as the average effect of
the variable **female** across schools. Since the effect of the variable **female** is the difference in averages between
males and females (controlling for other variables in the model), **b1** represents
the mean of these average differences between males and females at each school. The coefficient **s2u1**, the variance
of **u1** (the random effect for **female**), is the estimated variance of the effect
of **female** across schools. Larger values of **s2u1** indicate greater
differences in the effect of **female** across schools (keeping in mind the
size of the coefficient itself and that **s2u1** is a variance rather than a
standard deviation). Note that if the predictor variable were continuous, **b1** would represent the mean change
in the outcome for a one unit change in the predictor.

## Model 4: A mixed model with an unstructured level 2 covariance structure

In the model above, we specified that there was no relationship between the
random coefficient for the intercept and the random coefficient for the slope of
the variable **female**. Depending on the situation, this may or may not be a reasonable assumption.
For example, one common situation when studying individual change over time, is individuals who
start out with higher values of the outcome may improve more slowly, that is,
have a flatter slope, than individuals who started out with lower values on the
outcome. This implies a negative relationship between the random effects for the
slope and intercept. An unstructured level 2 covariance matrix allows us to
estimate this type of relationship.

This model is the same as model 3 above except that this model (model 4) includes an
estimate of the covariance between the two random coefficients (**u0** and **
u1**),
this parameter is denoted c01. Accordingly the vector for the variances of the random
effects is now [s2u0, c01, s2u1], representing the variance-covariance matrix:

s2u0 c01 s2u1

The equations for this models are the same as those for model 3, except that cov(u0,u1)=c01, so we will not rewrite them here.

There are two differences between the code for this model (shown below) and the previous
model. Unlike the code for previous models, the code below includes the **parms** statement. By default the
**nlmixed**
procedure uses
1 as the starting value for all parameters. For some models, especially more
complex models, use of the default starting values may lead to slow
convergence, non-convergence, or other estimation problems. The model below
encounters problems when the default starting values are used, so we use some of the
parameter estimates from a simpler model (in this case the random intercept
model) as starting values. (Note that SAS does not require the **parms**
statement for models with an unstructured level 2 covariance matrix, and
that the **parms** statement may be used with any **nlmixed** model.) The second difference between this model
and the previous model is in the random statement, the variances of the random effects include a
parameter (**c01**) for the covariance of **u0**
and **u1**, where the previous model included a 0 to fix this parameter to
zero.

proc nlmixed data="d:datahsb"; parms b0=9.18 b1=-1.35 b2=8.05 s2=38.84 s2u0=3.93; pred = b0 + u0 + (b1+u1)*female + b2*pracad; model mathach ~ normal(pred,s2); random u0 u1 ~ normal([0,0],[s2u0,c01,s2u1]) subject=id; run;Parameter Estimates Standard Parameter Estimate Error DF t Value Pr > |t| Alpha Lower Upper Gradient b0 9.1813 0.4133 158 22.21 <.0001 0.05 8.3650 9.9976 0.000242 b1 -1.3523 0.1831 158 -7.38 <.0001 0.05 -1.7140 -0.9906 -0.00028 b2 8.0489 0.6875 158 11.71 <.0001 0.05 6.6910 9.4068 0.000108 s2 38.7238 0.6584 158 58.81 <.0001 0.05 37.4233 40.0243 -0.00022 s2u0 4.4576 0.7591 158 5.87 <.0001 0.05 2.9584 5.9568 0.000044 c01 -0.7000 0.5154 158 -1.36 0.1764 0.05 -1.7179 0.3180 0.000316 s2u1 0.6381 0.5478 158 1.16 0.2458 0.05 -0.4438 1.7200 0.000099

The estimates of the covariances of the random effects are sometimes substantively interesting, such as the discussion of individual change over time discussed above. In other models such covariance may exist, and hence need to be included in the model, but may not of any particular substantive interest. A negative estimate for c01 implies that at schools with higher intercepts (i.e. b0+u0) the difference between males and females (i.e. b1 + u1), that is the effect of gender, will be smaller. Substantively this means that at schools with higher overall math achievement scores (controlling for the effect of gender and proportion of students on an academic track), the difference between males and females tends to be smaller than at schools with lower overall math achievement scores.

## Model 5: A mixed model with a cross level interaction

We may also wish to specify a model in which the effect of a level 1 variable depends
on the value of a level 2 predictor, often called a cross-level interaction. In our example, this would mean
that the effect of being female (**female**) depends on the proportion of students on an
academic track (**pracad**). We might, for example, expect to see that although on average
female students have lower math achievement scores than male students (as
evidenced by the negative coefficients for female in previous models), the
difference between male and female students becomes smaller as the proportion of
students on an academic track increases.

In models 1 and 2, the effect of the variable **female** was described by
a single, fixed coefficient **b1**. In models 3 and 4 we added a random
effect for female, so that the effect of female is described by both a fixed
component (**b1**) and a random component (**u1**). In this model we add a
third term to the effect of female **b3*pracad**, where **b3** is a fixed
coefficient estimated by the model and **pracad** is a level 2 variable. The
model can be written:

mathach = b0 + u0 + b1*female + b3*pracad*female + u1*female + b2*pracad + e = b0 + u0 + (b1 + b3*pracad + u1)*female + b2*pracad + e

Where we assume:

e ~ N(0,s2) u0 ~ N(0,s2u0) u1 ~ N(0,s2u1) cov(u0,u1)=0

Alternatively, we can write this model in the two-level form:

Level 1: mathach = b0 + b1*female + e Level 2: b0 = g00 + g01*pracad + u0 b1 = g10 + g11*pracad + u1

The code for this model is the same as for model 3 (i.e. we specify an independent
covariance structure), except that the we have added a new coefficient **b3** to the
equation for **pred**, b3 is the coefficient for the cross level interaction, so that the
total effect of female is equal to **b1+b3*pracad+u1**.

proc nlmixed data="d:datahsb"; pred = b0 + u0 + (b1+b3*pracad+u1)*female + b2*pracad; model mathach ~ normal(pred,s2); random u0 u1 ~ normal([0,0],[s2u0,0,s2u1]) subject=id; run;Parameter Estimates Standard Parameter Estimate Error DF t Value Pr > |t| Alpha Lower Upper Gradient b0 9.2802 0.4443 158 20.89 <.0001 0.05 8.4027 10.1577 0.002315 b1 -1.5489 0.3969 158 -3.90 0.0001 0.05 -2.3328 -0.7650 0.003042 b3 0.4147 0.7416 158 0.56 0.5769 0.05 -1.0502 1.8795 0.001542 b2 7.8543 0.7728 158 10.16 <.0001 0.05 6.3279 9.3807 0.001594 s2 38.7984 0.6586 158 58.91 <.0001 0.05 37.4975 40.0992 0.000105 s2u0 3.8957 0.5616 158 6.94 <.0001 0.05 2.7866 5.0049 -0.00049 s2u1 0.1944 0.4021 158 0.48 0.6295 0.05 -0.5998 0.9885 0.000371

Based on the above estimates, we can write the regression equation as:

mathach = 9.28 + u0 + (-1.55 + 0.41*pracad + u1)*female + 7.85*pracad + e

Where:

e ~ N(0,38.8) u0 ~ N(0,3.9) u1 ~ N(0,0.19) cov(u0,u1)=0

The coefficient **b1** can now be interpreted as the average effect of **female** when
**pracad** is equal to zero. The
coefficient for the cross-level interaction between **female** and **pracad,**
that is **b3**, can be interpreted
as the change in the effect of **female** for a one unit change in **pracad**. For a school where half
of the students are on an academic track (**pracad**=.5), the effect of female is:

-1.55 + 0.41*.5 + u1 = -1.345 + u1

Another way to say this is that the average effect of **female** for a
school where half of the students are on an academic track is -1.345.
Substantively, this means that the higher the proportion of students on an
academic track, the less difference in mean math achievement scores between
males and females.

## See Also

- Raudenbush, Stephen W. & Anthony S. Bryk (2001).
*Hierarchical Linear Models: Applications and Data Analysis Methods. 2nd ed.*Thousand Oaks, CA: Sage Publications. - Raudenbush, Stephen, Anthony Bryk, Yuk Fai Cheong, & Richard Congdon (2001)
*HLM 5: Hierarchical Linear and Nonlinear Modeling.*Scientific Software International: Lincolnwood, IL. - Skrondal, Anders, & Sophia Rabe-Hesketh(2004).
*Generalized Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models. New York, NY: Chapman & Hall/CRC.*