From an OLS model to full mixed models using proc nlmixed

In order to help show the relationships among an OLS, random intercept, and random slope models this page shows a series of models each of which builds on the previous models. Model 1 is a standard ordinary least squares (OLS) regression model. Model 2 adds a random intercept to more appropriately model clustered (multi-level) data. Model 3 adds a random slope. Model 4 adds the covariance between the random intercept and slope. Finally, model 5 includes a cross level interaction.

We use proc nlmixed to fit all of these models because it will not only fit all of these models, but the syntax structure and progression across the models allows us to clearly demonstrate the differences, and similarities, between these models. SAS proc nlmixed is a highly flexible procedure that can be used to run a large variety of models. We do not, however, intend to suggest that you should run these models using nlmixed. In many cases it would be easier to run the first model in proc reg, and the subsequent models in proc mixed. An additional advantage to using mixed over nlmixed is that mixed allows the use of both maximum likelihood estimation and restricted maximum likelihood estimation, nlmixed only uses maximum likelihood estimation.

The dataset for this example includes data on 7185 students in 160 schools. It comes from the HLM manual. The outcome variable mathach is a measure of each student’s math achievement, the predictor variable female is a binary variable equal to one if the student is female and zero otherwise, and the predictor variable pracad is the proportion of students at that school who are on the academic track. The variable id is the school identifier. Note that female varies within school (often called a level 1 variable) while pracad is constant within a school, but varies across schools (often called a level 2 variable). You can download the data used in this example by clicking https://stats.idre.ucla.edu/wp-content/uploads/2016/02/hsb-1.sas7bdat.

Model 1: An OLS regression

The first model we will run is an ordinary least squares (OLS) regression model where female and pracad predict mathach. In equation form the model is:

mathach = b0 + b1*female + b2*pracad + e

And we assume:

e ~ N(0,s2)

Below is the proc nlmixed syntax corresponding to this specification. Here we define pred as a linear function of female and pracad, mathach is distributed (~) normally with mean equal to pred and variance s2, both of which we wish to estimate (pred is estimated through the coefficients, s2 is estimated based on the model residuals).

proc nlmixed data="d:datahsb";
   pred = b0 + b1*female + b2*pracad;
   model mathach ~ normal(pred,s2);
run;

Towards the end of the output, we see the parameter estimates shown below.

                                       Parameter Estimates

                        Standard
   Parameter  Estimate     Error    DF  t Value  Pr > |t|   Alpha     Lower     Upper  Gradient

   b0           9.3439    0.2023  7185    46.20    <.0001    0.05    8.9474    9.7404   -0.1721
   b1          -1.5037    0.1546  7185    -9.72    <.0001    0.05   -1.8068   -1.2006  -0.16423
   b2           7.8527    0.3073  7185    25.55    <.0001    0.05    7.2503    8.4551  -0.08668
   s2          42.7040    0.7124  7185    59.94    <.0001    0.05   41.3074   44.1006  -0.00432

Based on the above estimates, the model is:

mathach = 9.3439 - 1.5037*female + 7.8527*pracad + e

Where:

e ~ N(0,42.7)

This model is a standard OLS regression model, and the coefficients are interpreted as usual for a regression model. Another way to describe this model is to say that all of the coefficients (b0, b1, and b2) are fixed, only the error term (e) has a variance (s2). This model has totally ignored the nesting structure, assuming all observations are independent of each other. Of course this will not be a valid model for this data, nevertheless, it gives us a starting point.

Model 2: A random intercept model

The problem with the OLS model is that it fails to account for the fact that the observations are not independent, that is, that students are nested within schools. There is often good reason to believe that observations within a cluster are more alike than observations across clusters. In our case, students at one school might, on average, have higher or lower math achievement scores, even after controlling for gender and the percent of students on an academic track. As a result, the residuals for students at that school would be systematically higher or lower than at other schools, violating one of the assumptions of OLS regression. To accommodate this, we can model the intercept as having a mean and a variance. This is the basic multi-level model. In equation form:

mathach = b0 + u + b1*female + b2*pracad + e

Where we assume:

e ~ N(0,s2)
u ~ N(0,s2u)

With the addition of u, we can say that the intercept has two parts, a fixed part, represented by b0, and a random component, represented by u. The parameter b0 is said to be fixed because it does not vary, while the u is said to be random, because it is assumed to take on different values for different level 2 units (i.e. schools). Substantively, b0 is the mean intercept across all schools, while u allows for variation around that mean. Another way to think about this model is that there are two sources of error in our predictions, error that is a result of a school having a mean score that is higher or lower than other schools (u), and error that is a result of individual variation (e). In the OLS model above, we combine these into a single term, e, while in a random intercept model we estimate both the variance of e and the variance of u. The model can also be written in the two level formulation (the models, and the assumptions about e and u, are the same):

Level 1:
mathach = b0 + b1*female + e

Level 2:
b0 = g00 + g01*pracad + u
b1 = g10

The code below fits the random intercept model. Before we can run a random effects model with nlmixed, we need to sort the data by the grouping variable, in this case by id, which identifies which school a student was in. The nlmixed command below differs from the nlmixed command for the OLS model in two ways. First, the definition of pred, the predicted value of mathach, now includes the parameter u, the random coefficient for the intercept. This means that expected mean of mathach now depends on which school the student attends. The second change is the addition of the random statement, which indicates that we wish to add a random effect to our model. The random statement defines u as distributed normally with a mean of zero and constant variance we wish to estimate, denoted s2u using the code u ~ normal(0, s2u). The groups across which we wish to estimate the random effect are identified by subject=id (note that id is the variable that identifies schools in this dataset).

proc sort data="d:datahsb";
   by id;
run;

proc nlmixed data="d:datahsb";
   pred = b0 + u + b1*female + b2*pracad;
   model mathach ~ normal(pred,s2);
   random u ~ normal(0,s2u) subject=id;
run;


                                  Parameter Estimates

                        Standard
   Parameter  Estimate     Error    DF  t Value  Pr > |t|   Alpha     Lower     Upper  Gradient

   b0           9.1800    0.4084   159    22.48    <.0001    0.05    8.3734    9.9867  0.000236
   b1          -1.3450    0.1691   159    -7.96    <.0001    0.05   -1.6789   -1.0111  0.000183
   b2           8.0494    0.6866   159    11.72    <.0001    0.05    6.6934    9.4054  0.000201
   s2          38.8414    0.6554   159    59.26    <.0001    0.05   37.5470   40.1358  0.000831
   s2u          3.9315    0.5444   159     7.22    <.0001    0.05    2.8563    5.0066  -0.00056

Based on the above estimates, the model can be written:

 mathach = 9.18 + u - 1.35*female + 8.05*pracad + e

Where:

e ~ N(0,38.84)
u ~ N(0,3.93)

You may have noticed that the term u remains in the equation for our predicted values of mathach, rather than being replaced with an estimate. This is because we have not modeled the intercepts for each school individually (as one would in a so called fixed effects model) but instead we have modeled their distribution, so that the actual intercept (b0+u) for each school can be estimated based on the model, but is not actually part of the model.

In this model, b0, the intercept, is interpreted as the average (mean) intercept across all schools, while s2u is the estimated variance of the individual schools around that mean. Another way to say this is that s2u is the variance between schools. Larger values of s2u indicate greater differences in intercepts across schools (keeping in mind the size of the intercept itself). We also might want to look at the standard deviation of u, that is, the square root of s2u, rather than the variance (i.e. s2u itself) if we want to get a better sense of how much the intercepts vary across schools. The standard deviation of u is 1.98 (=sqrt(3.93)). The coefficients for the independent variables, that is, b1 and b2, are interpreted in the same manner as before. The estimates for b1 and b2 are slightly different between the two models. This is because this model accounts for the relationship between the mathach scores of students at the same school.

You may have noticed that the variance of e is smaller in this model than in model 1(38.83 vs. 42.7). This is because, as discussed above, in model 1, the error due to individual variation was combined with the error due to variation across schools. In this model (model 2), we include both sources of error in the model, allowing us to distinguish between error at the individual and group levels. Based on the variance of e and u in model 2, we can calculate the proportion of variance that is due to differences across schools, often called the intra-class correlation (icc). The formula for this is s2u/(s2u+s2e), in other words, the variance accounted for by schools, over the total variance. For this model the icc is 3.9315/(38.8414+3.9315) = .0919, so we can say that about 9% of total variance is accounted for by differences in the intercepts across schools.

Model 3: A mixed model with independent covariance structure for the random effects

In addition to allowing the intercept of mathach vary by school, it is possible to allow the relationship between mathach and a predictor variable to vary by school, that is the effect of the predictor variable might be weaker or stronger at different schools. To allow for this we estimate a model with a random coefficient for the variable whose effect we believe may vary by school. Below we estimate a random coefficient for the variable female. Note that it does not make sense to include a random coefficient for the variable pracad in this model, since pracad does not vary within school. The equation for the model looks like this:

mathach = b0 + u0 + b1*female + u1*female + b2*pracad + e
        = b0 + u0 + (b1+u1)*female + b2*pracad + e

Where we assume:

e ~ N(0,s2)
u0 ~ N(0,s2u0)
u1 ~ N(0,s2u1)
cov(u0,u1)=0

The random coefficients are u0 (for the intercept) and u1 (for female). In addition to modeling the variances themselves, we can model the relationship between u0 and u1. We can specify three different relationships between u0 and u1, we can assume that they are unrelated, and fix their covariance to zero; we can assume their covariance is equal to some a non-zero value and fix the covariance to that value (this is rarely done); or we can estimate the covariance u0 and u1 as part of the model. Note that with more than two random effects, additional types of relationships can be modeled. In this model we will assume that u0 and u1 have a covariance of zero, this is sometimes called an independent covariance structure for the random effects.

As before we can also write the model in its two-level form (the models, and the assumptions about e and u, are the same):

Level 1:
mathach = b0 + b1*female + e

Level 2:
b0 = g00 + g01*pracad + u0
b1 = g10 + u1

The code for this model is different from the code for the random intercept model in several ways. First, the equation for pred now includes the term u1. Second, the random statement now includes two random effects (u0 and u1). As in the previous models, the random coefficients are assumed to be normally distributed, their mean and variance are just specified a little differently in the code. Instead of a single value, we now have the vector of means [0,0] one for each of the random effects. Instead of a single parameter for variance we now have a variance-covariance matrix that is shown as the vector [s2u0, 0, s2u1]. The vector [s2u0, 0, s2u1] is equivalent to the lower triangular variance-covariance matrix:

   s2u0
   0    s2u1

proc nlmixed data="d:datahsb";
   pred = b0 + u0 + (b1+u1)*female + b2*pracad;
   model mathach ~ normal(pred,s2);
   random u0 u1 ~ normal([0,0],[s2u0,0,s2u1]) subject=id;
run;



                                     Parameter Estimates

                        Standard
   Parameter  Estimate     Error    DF  t Value  Pr > |t|   Alpha     Lower     Upper  Gradient

   b0           9.1840    0.4080   158    22.51    <.0001    0.05    8.3781    9.9899   0.00042
   b1          -1.3494    0.1742   158    -7.75    <.0001    0.05   -1.6934   -1.0054  0.004663
   b2           8.0484    0.6870   158    11.72    <.0001    0.05    6.6915    9.4053   -0.0018
   s2          38.8006    0.6587   158    58.91    <.0001    0.05   37.4997   40.1016  0.002626
   s2u0         3.8666    0.5564   158     6.95    <.0001    0.05    2.7676    4.9657  -0.00124
   s2u1         0.2195    0.4019   158     0.55    0.5858    0.05   -0.5743    1.0133  -0.00635

These estimates can be used to write the equation:

mathach = 9.18 + u0 + (-1.35+u1)*female + 8.05*pracad + e

Where:

e ~ N(0,38.8)
u0 ~ N(0,3.87)
u1 ~ N(0,0.22)
cov(u0,u1)=0

In this model, the interpretation of the coefficient for pracad (b2) remains unchanged. The interpretation of the estimate of the intercept (b0), as well as its variance s2u0, remain the same as in the interpretation in the random intercept model above. The coefficient for female in the fixed part of the model (b1) can now be interpreted as the average effect of the variable female across schools. Since the effect of the variable female is the difference in averages between males and females (controlling for other variables in the model), b1 represents the mean of these average differences between males and females at each school. The coefficient s2u1, the variance of u1 (the random effect for female), is the estimated variance of the effect of female across schools. Larger values of s2u1 indicate greater differences in the effect of female across schools (keeping in mind the size of the coefficient itself and that s2u1 is a variance rather than a standard deviation). Note that if the predictor variable were continuous, b1 would represent the mean change in the outcome for a one unit change in the predictor.

Model 4: A mixed model with an unstructured level 2 covariance structure

In the model above, we specified that there was no relationship between the random coefficient for the intercept and the random coefficient for the slope of the variable female. Depending on the situation, this may or may not be a reasonable assumption. For example, one common situation when studying individual change over time, is individuals who start out with higher values of the outcome may improve more slowly, that is, have a flatter slope, than individuals who started out with lower values on the outcome. This implies a negative relationship between the random effects for the slope and intercept. An unstructured level 2 covariance matrix allows us to estimate this type of relationship.

This model is the same as model 3 above except that this model (model 4) includes an estimate of the covariance between the two random coefficients (u0 and u1), this parameter is denoted c01. Accordingly the vector for the variances of the random effects is now [s2u0, c01, s2u1], representing the variance-covariance matrix:

s2u0
c01  s2u1

The equations for this models are the same as those for model 3, except that cov(u0,u1)=c01, so we will not rewrite them here.

There are two differences between the code for this model (shown below) and the previous model. Unlike the code for previous models, the code below includes the parms statement. By default the nlmixed procedure uses 1 as the starting value for all parameters. For some models, especially more complex models, use of the default starting values may lead to slow convergence, non-convergence, or other estimation problems. The model below encounters problems when the default starting values are used, so we use some of the parameter estimates from a simpler model (in this case the random intercept model) as starting values. (Note that SAS does not require the parms statement for models with an unstructured level 2 covariance matrix, and that the parms statement may be used with any nlmixed model.) The second difference between this model and the previous model is in the random statement, the variances of the random effects include a parameter (c01) for the covariance of u0 and u1, where the previous model included a 0 to fix this parameter to zero.

proc nlmixed data="d:datahsb";
   parms b0=9.18 b1=-1.35 b2=8.05 s2=38.84 s2u0=3.93; 
   pred = b0 + u0 + (b1+u1)*female + b2*pracad;
   model mathach ~ normal(pred,s2);
   random u0 u1 ~ normal([0,0],[s2u0,c01,s2u1]) subject=id;
run;


                                  Parameter Estimates

                        Standard
   Parameter  Estimate     Error    DF  t Value  Pr > |t|   Alpha     Lower     Upper  Gradient

   b0           9.1813    0.4133   158    22.21    <.0001    0.05    8.3650    9.9976  0.000242
   b1          -1.3523    0.1831   158    -7.38    <.0001    0.05   -1.7140   -0.9906  -0.00028
   b2           8.0489    0.6875   158    11.71    <.0001    0.05    6.6910    9.4068  0.000108
   s2          38.7238    0.6584   158    58.81    <.0001    0.05   37.4233   40.0243  -0.00022
   s2u0         4.4576    0.7591   158     5.87    <.0001    0.05    2.9584    5.9568  0.000044
   c01         -0.7000    0.5154   158    -1.36    0.1764    0.05   -1.7179    0.3180  0.000316
   s2u1         0.6381    0.5478   158     1.16    0.2458    0.05   -0.4438    1.7200  0.000099

The estimates of the covariances of the random effects are sometimes substantively interesting, such as the discussion of individual change over time discussed above. In other models such covariance may exist, and hence need to be included in the model, but may not of any particular substantive interest. A negative estimate for c01 implies that at schools with higher intercepts (i.e. b0+u0) the difference between males and females (i.e. b1 + u1), that is the effect of gender, will be smaller. Substantively this means that at schools with higher overall math achievement scores (controlling for the effect of gender and proportion of students on an academic track), the difference between males and females tends to be smaller than at schools with lower overall math achievement scores.

Model 5: A mixed model with a cross level interaction

We may also wish to specify a model in which the effect of a level 1 variable depends on the value of a level 2 predictor, often called a cross-level interaction. In our example, this would mean that the effect of being female (female) depends on the proportion of students on an academic track (pracad). We might, for example, expect to see that although on average female students have lower math achievement scores than male students (as evidenced by the negative coefficients for female in previous models), the difference between male and female students becomes smaller as the proportion of students on an academic track increases.

In models 1 and 2, the effect of the variable female was described by a single, fixed coefficient b1. In models 3 and 4 we added a random effect for female, so that the effect of female is described by both a fixed component (b1) and a random component (u1). In this model we add a third term to the effect of female b3*pracad, where b3 is a fixed coefficient estimated by the model and pracad is a level 2 variable. The model can be written:

mathach = b0 + u0 + b1*female + b3*pracad*female + u1*female + b2*pracad + e
        = b0 + u0 + (b1 + b3*pracad + u1)*female + b2*pracad + e

Where we assume:

e ~ N(0,s2)
u0 ~ N(0,s2u0)
u1 ~ N(0,s2u1)
cov(u0,u1)=0

Alternatively, we can write this model in the two-level form:

Level 1:
mathach = b0 + b1*female + e

Level 2:
b0 = g00 + g01*pracad + u0
b1 = g10 + g11*pracad + u1

The code for this model is the same as for model 3 (i.e. we specify an independent covariance structure), except that the we have added a new coefficient b3 to the equation for pred, b3 is the coefficient for the cross level interaction, so that the total effect of female is equal to b1+b3*pracad+u1.

proc nlmixed data="d:datahsb";
   pred = b0 + u0 + (b1+b3*pracad+u1)*female + b2*pracad;
   model mathach ~ normal(pred,s2);
   random u0 u1 ~ normal([0,0],[s2u0,0,s2u1]) subject=id;
run;



                                       Parameter Estimates

                        Standard
   Parameter  Estimate     Error    DF  t Value  Pr > |t|   Alpha     Lower     Upper  Gradient

   b0           9.2802    0.4443   158    20.89    <.0001    0.05    8.4027   10.1577  0.002315
   b1          -1.5489    0.3969   158    -3.90    0.0001    0.05   -2.3328   -0.7650  0.003042
   b3           0.4147    0.7416   158     0.56    0.5769    0.05   -1.0502    1.8795  0.001542
   b2           7.8543    0.7728   158    10.16    <.0001    0.05    6.3279    9.3807  0.001594
   s2          38.7984    0.6586   158    58.91    <.0001    0.05   37.4975   40.0992  0.000105
   s2u0         3.8957    0.5616   158     6.94    <.0001    0.05    2.7866    5.0049  -0.00049
   s2u1         0.1944    0.4021   158     0.48    0.6295    0.05   -0.5998    0.9885  0.000371

Based on the above estimates, we can write the regression equation as:

mathach = 9.28 + u0 + (-1.55 + 0.41*pracad + u1)*female + 7.85*pracad + e

Where:

e ~ N(0,38.8)
u0 ~ N(0,3.9)
u1 ~ N(0,0.19)
cov(u0,u1)=0

The coefficient b1 can now be interpreted as the average effect of female when pracad is equal to zero. The coefficient for the cross-level interaction between female and pracad, that is b3, can be interpreted as the change in the effect of female for a one unit change in pracad. For a school where half of the students are on an academic track (pracad=.5), the effect of female is:

 -1.55 + 0.41*.5 + u1 = -1.345 + u1

Another way to say this is that the average effect of female for a school where half of the students are on an academic track is -1.345. Substantively, this means that the higher the proportion of students on an academic track, the less difference in mean math achievement scores between males and females.

From an OLS model to full mixed models using proc nlmixed | SAS FAQ

Model 1: An OLS regression

Model 2: A random intercept model

Model 3: A mixed model with independent covariance structure for the random effects

Model 4: A mixed model with an unstructured level 2 covariance structure

Model 5: A mixed model with a cross level interaction

See Also