In order to help show the relationships among an OLS, random intercept, and random slope models this page shows a series of models each of which builds on the previous models. Model 1 is a standard ordinary least squares (OLS) regression model. Model 2 adds a random intercept to more appropriately model clustered (multi-level) data. Model 3 adds a random slope. Model 4 adds the covariance between the random intercept and slope. Finally, model 5 includes a cross level interaction.
We use proc nlmixed to fit all of these models because it will not only fit all of these models, but the syntax structure and progression across the models allows us to clearly demonstrate the differences, and similarities, between these models. SAS proc nlmixed is a highly flexible procedure that can be used to run a large variety of models. We do not, however, intend to suggest that you should run these models using nlmixed. In many cases it would be easier to run the first model in proc reg, and the subsequent models in proc mixed. An additional advantage to using mixed over nlmixed is that mixed allows the use of both maximum likelihood estimation and restricted maximum likelihood estimation, nlmixed only uses maximum likelihood estimation.
The dataset for this example includes data on 7185 students in 160 schools. It comes from the HLM manual. The outcome variable mathach is a measure of each student’s math achievement, the predictor variable female is a binary variable equal to one if the student is female and zero otherwise, and the predictor variable pracad is the proportion of students at that school who are on the academic track. The variable id is the school identifier. Note that female varies within school (often called a level 1 variable) while pracad is constant within a school, but varies across schools (often called a level 2 variable). You can download the data used in this example by clicking https://stats.idre.ucla.edu/wp-content/uploads/2016/02/hsb-1.sas7bdat.
Model 1: An OLS regression
The first model we will run is an ordinary least squares (OLS) regression model where female and pracad predict mathach. In equation form the model is:
mathach = b0 + b1*female + b2*pracad + e
And we assume:
e ~ N(0,s2)
Below is the proc nlmixed syntax corresponding to this specification. Here we define pred as a linear function of female and pracad, mathach is distributed (~) normally with mean equal to pred and variance s2, both of which we wish to estimate (pred is estimated through the coefficients, s2 is estimated based on the model residuals).
proc nlmixed data="d:datahsb"; pred = b0 + b1*female + b2*pracad; model mathach ~ normal(pred,s2); run;
Towards the end of the output, we see the parameter estimates shown below.
Parameter Estimates Standard Parameter Estimate Error DF t Value Pr > |t| Alpha Lower Upper Gradient b0 9.3439 0.2023 7185 46.20 <.0001 0.05 8.9474 9.7404 -0.1721 b1 -1.5037 0.1546 7185 -9.72 <.0001 0.05 -1.8068 -1.2006 -0.16423 b2 7.8527 0.3073 7185 25.55 <.0001 0.05 7.2503 8.4551 -0.08668 s2 42.7040 0.7124 7185 59.94 <.0001 0.05 41.3074 44.1006 -0.00432
Based on the above estimates, the model is:
mathach = 9.3439 - 1.5037*female + 7.8527*pracad + e
Where:
e ~ N(0,42.7)
This model is a standard OLS regression model, and the coefficients are interpreted as usual for a regression model. Another way to describe this model is to say that all of the coefficients (b0, b1, and b2) are fixed, only the error term (e) has a variance (s2). This model has totally ignored the nesting structure, assuming all observations are independent of each other. Of course this will not be a valid model for this data, nevertheless, it gives us a starting point.
Model 2: A random intercept model
The problem with the OLS model is that it fails to account for the fact that the observations are not independent, that is, that students are nested within schools. There is often good reason to believe that observations within a cluster are more alike than observations across clusters. In our case, students at one school might, on average, have higher or lower math achievement scores, even after controlling for gender and the percent of students on an academic track. As a result, the residuals for students at that school would be systematically higher or lower than at other schools, violating one of the assumptions of OLS regression. To accommodate this, we can model the intercept as having a mean and a variance. This is the basic multi-level model. In equation form:
mathach = b0 + u + b1*female + b2*pracad + e
Where we assume:
e ~ N(0,s2) u ~ N(0,s2u)
With the addition of u, we can say that the intercept has two parts, a fixed part, represented by b0, and a random component, represented by u. The parameter b0 is said to be fixed because it does not vary, while the u is said to be random, because it is assumed to take on different values for different level 2 units (i.e. schools). Substantively, b0 is the mean intercept across all schools, while u allows for variation around that mean. Another way to think about this model is that there are two sources of error in our predictions, error that is a result of a school having a mean score that is higher or lower than other schools (u), and error that is a result of individual variation (e). In the OLS model above, we combine these into a single term, e, while in a random intercept model we estimate both the variance of e and the variance of u. The model can also be written in the two level formulation (the models, and the assumptions about e and u, are the same):
Level 1: mathach = b0 + b1*female + e Level 2: b0 = g00 + g01*pracad + u b1 = g10
The code below fits the random intercept model. Before we can run a random effects model with nlmixed, we need to sort the data by the grouping variable, in this case by id, which identifies which school a student was in. The nlmixed command below differs from the nlmixed command for the OLS model in two ways. First, the definition of pred, the predicted value of mathach, now includes the parameter u, the random coefficient for the intercept. This means that expected mean of mathach now depends on which school the student attends. The second change is the addition of the random statement, which indicates that we wish to add a random effect to our model. The random statement defines u as distributed normally with a mean of zero and constant variance we wish to estimate, denoted s2u using the code u ~ normal(0, s2u). The groups across which we wish to estimate the random effect are identified by subject=id (note that id is the variable that identifies schools in this dataset).
proc sort data="d:datahsb"; by id; run; proc nlmixed data="d:datahsb"; pred = b0 + u + b1*female + b2*pracad; model mathach ~ normal(pred,s2); random u ~ normal(0,s2u) subject=id; run; Parameter Estimates Standard Parameter Estimate Error DF t Value Pr > |t| Alpha Lower Upper Gradient b0 9.1800 0.4084 159 22.48 <.0001 0.05 8.3734 9.9867 0.000236 b1 -1.3450 0.1691 159 -7.96 <.0001 0.05 -1.6789 -1.0111 0.000183 b2 8.0494 0.6866 159 11.72 <.0001 0.05 6.6934 9.4054 0.000201 s2 38.8414 0.6554 159 59.26 <.0001 0.05 37.5470 40.1358 0.000831 s2u 3.9315 0.5444 159 7.22 <.0001 0.05 2.8563 5.0066 -0.00056
Based on the above estimates, the model can be written:
mathach = 9.18 + u - 1.35*female + 8.05*pracad + e
Where:
e ~ N(0,38.84) u ~ N(0,3.93)
You may have noticed that the term u remains in the equation for our predicted values of mathach, rather than being replaced with an estimate. This is because we have not modeled the intercepts for each school individually (as one would in a so called fixed effects model) but instead we have modeled their distribution, so that the actual intercept (b0+u) for each school can be estimated based on the model, but is not actually part of the model.
In this model, b0, the intercept, is interpreted as the average (mean) intercept across all schools, while s2u is the estimated variance of the individual schools around that mean. Another way to say this is that s2u is the variance between schools. Larger values of s2u indicate greater differences in intercepts across schools (keeping in mind the size of the intercept itself). We also might want to look at the standard deviation of u, that is, the square root of s2u, rather than the variance (i.e. s2u itself) if we want to get a better sense of how much the intercepts vary across schools. The standard deviation of u is 1.98 (=sqrt(3.93)). The coefficients for the independent variables, that is, b1 and b2, are interpreted in the same manner as before. The estimates for b1 and b2 are slightly different between the two models. This is because this model accounts for the relationship between the mathach scores of students at the same school.
You may have noticed that the variance of e is smaller in this model than in model 1(38.83 vs. 42.7). This is because, as discussed above, in model 1, the error due to individual variation was combined with the error due to variation across schools. In this model (model 2), we include both sources of error in the model, allowing us to distinguish between error at the individual and group levels. Based on the variance of e and u in model 2, we can calculate the proportion of variance that is due to differences across schools, often called the intra-class correlation (icc). The formula for this is s2u/(s2u+s2e), in other words, the variance accounted for by schools, over the total variance. For this model the icc is 3.9315/(38.8414+3.9315) = .0919, so we can say that about 9% of total variance is accounted for by differences in the intercepts across schools.
Model 3: A mixed model with independent covariance structure for the random effects
In addition to allowing the intercept of mathach vary by school, it is possible to allow the relationship between mathach and a predictor variable to vary by school, that is the effect of the predictor variable might be weaker or stronger at different schools. To allow for this we estimate a model with a random coefficient for the variable whose effect we believe may vary by school. Below we estimate a random coefficient for the variable female. Note that it does not make sense to include a random coefficient for the variable pracad in this model, since pracad does not vary within school. The equation for the model looks like this:
mathach = b0 + u0 + b1*female + u1*female + b2*pracad + e = b0 + u0 + (b1+u1)*female + b2*pracad + e
Where we assume:
e ~ N(0,s2) u0 ~ N(0,s2u0) u1 ~ N(0,s2u1) cov(u0,u1)=0
The random coefficients are u0 (for the intercept) and u1 (for female). In addition to modeling the variances themselves, we can model the relationship between u0 and u1. We can specify three different relationships between u0 and u1, we can assume that they are unrelated, and fix their covariance to zero; we can assume their covariance is equal to some a non-zero value and fix the covariance to that value (this is rarely done); or we can estimate the covariance u0 and u1 as part of the model. Note that with more than two random effects, additional types of relationships can be modeled. In this model we will assume that u0 and u1 have a covariance of zero, this is sometimes called an independent covariance structure for the random effects.
As before we can also write the model in its two-level form (the models, and the assumptions about e and u, are the same):
Level 1: mathach = b0 + b1*female + e Level 2: b0 = g00 + g01*pracad + u0 b1 = g10 + u1
The code for this model is different from the code for the random intercept model in several ways. First, the equation for pred now includes the term u1. Second, the random statement now includes two random effects (u0 and u1). As in the previous models, the random coefficients are assumed to be normally distributed, their mean and variance are just specified a little differently in the code. Instead of a single value, we now have the vector of means [0,0] one for each of the random effects. Instead of a single parameter for variance we now have a variance-covariance matrix that is shown as the vector [s2u0, 0, s2u1]. The vector [s2u0, 0, s2u1] is equivalent to the lower triangular variance-covariance matrix:
s2u0 0 s2u1
proc nlmixed data="d:datahsb"; pred = b0 + u0 + (b1+u1)*female + b2*pracad; model mathach ~ normal(pred,s2); random u0 u1 ~ normal([0,0],[s2u0,0,s2u1]) subject=id; run; Parameter Estimates Standard Parameter Estimate Error DF t Value Pr > |t| Alpha Lower Upper Gradient b0 9.1840 0.4080 158 22.51 <.0001 0.05 8.3781 9.9899 0.00042 b1 -1.3494 0.1742 158 -7.75 <.0001 0.05 -1.6934 -1.0054 0.004663 b2 8.0484 0.6870 158 11.72 <.0001 0.05 6.6915 9.4053 -0.0018 s2 38.8006 0.6587 158 58.91 <.0001 0.05 37.4997 40.1016 0.002626 s2u0 3.8666 0.5564 158 6.95 <.0001 0.05 2.7676 4.9657 -0.00124 s2u1 0.2195 0.4019 158 0.55 0.5858 0.05 -0.5743 1.0133 -0.00635
These estimates can be used to write the equation:
mathach = 9.18 + u0 + (-1.35+u1)*female + 8.05*pracad + e
Where:
e ~ N(0,38.8) u0 ~ N(0,3.87) u1 ~ N(0,0.22) cov(u0,u1)=0
In this model, the interpretation of the coefficient for pracad (b2) remains unchanged. The interpretation of the estimate of the intercept (b0), as well as its variance s2u0, remain the same as in the interpretation in the random intercept model above. The coefficient for female in the fixed part of the model (b1) can now be interpreted as the average effect of the variable female across schools. Since the effect of the variable female is the difference in averages between males and females (controlling for other variables in the model), b1 represents the mean of these average differences between males and females at each school. The coefficient s2u1, the variance of u1 (the random effect for female), is the estimated variance of the effect of female across schools. Larger values of s2u1 indicate greater differences in the effect of female across schools (keeping in mind the size of the coefficient itself and that s2u1 is a variance rather than a standard deviation). Note that if the predictor variable were continuous, b1 would represent the mean change in the outcome for a one unit change in the predictor.
Model 4: A mixed model with an unstructured level 2 covariance structure
In the model above, we specified that there was no relationship between the random coefficient for the intercept and the random coefficient for the slope of the variable female. Depending on the situation, this may or may not be a reasonable assumption. For example, one common situation when studying individual change over time, is individuals who start out with higher values of the outcome may improve more slowly, that is, have a flatter slope, than individuals who started out with lower values on the outcome. This implies a negative relationship between the random effects for the slope and intercept. An unstructured level 2 covariance matrix allows us to estimate this type of relationship.
This model is the same as model 3 above except that this model (model 4) includes an estimate of the covariance between the two random coefficients (u0 and u1), this parameter is denoted c01. Accordingly the vector for the variances of the random effects is now [s2u0, c01, s2u1], representing the variance-covariance matrix:
s2u0 c01 s2u1
The equations for this models are the same as those for model 3, except that cov(u0,u1)=c01, so we will not rewrite them here.
There are two differences between the code for this model (shown below) and the previous model. Unlike the code for previous models, the code below includes the parms statement. By default the nlmixed procedure uses 1 as the starting value for all parameters. For some models, especially more complex models, use of the default starting values may lead to slow convergence, non-convergence, or other estimation problems. The model below encounters problems when the default starting values are used, so we use some of the parameter estimates from a simpler model (in this case the random intercept model) as starting values. (Note that SAS does not require the parms statement for models with an unstructured level 2 covariance matrix, and that the parms statement may be used with any nlmixed model.) The second difference between this model and the previous model is in the random statement, the variances of the random effects include a parameter (c01) for the covariance of u0 and u1, where the previous model included a 0 to fix this parameter to zero.
proc nlmixed data="d:datahsb"; parms b0=9.18 b1=-1.35 b2=8.05 s2=38.84 s2u0=3.93; pred = b0 + u0 + (b1+u1)*female + b2*pracad; model mathach ~ normal(pred,s2); random u0 u1 ~ normal([0,0],[s2u0,c01,s2u1]) subject=id; run; Parameter Estimates Standard Parameter Estimate Error DF t Value Pr > |t| Alpha Lower Upper Gradient b0 9.1813 0.4133 158 22.21 <.0001 0.05 8.3650 9.9976 0.000242 b1 -1.3523 0.1831 158 -7.38 <.0001 0.05 -1.7140 -0.9906 -0.00028 b2 8.0489 0.6875 158 11.71 <.0001 0.05 6.6910 9.4068 0.000108 s2 38.7238 0.6584 158 58.81 <.0001 0.05 37.4233 40.0243 -0.00022 s2u0 4.4576 0.7591 158 5.87 <.0001 0.05 2.9584 5.9568 0.000044 c01 -0.7000 0.5154 158 -1.36 0.1764 0.05 -1.7179 0.3180 0.000316 s2u1 0.6381 0.5478 158 1.16 0.2458 0.05 -0.4438 1.7200 0.000099
The estimates of the covariances of the random effects are sometimes substantively interesting, such as the discussion of individual change over time discussed above. In other models such covariance may exist, and hence need to be included in the model, but may not of any particular substantive interest. A negative estimate for c01 implies that at schools with higher intercepts (i.e. b0+u0) the difference between males and females (i.e. b1 + u1), that is the effect of gender, will be smaller. Substantively this means that at schools with higher overall math achievement scores (controlling for the effect of gender and proportion of students on an academic track), the difference between males and females tends to be smaller than at schools with lower overall math achievement scores.
Model 5: A mixed model with a cross level interaction
We may also wish to specify a model in which the effect of a level 1 variable depends on the value of a level 2 predictor, often called a cross-level interaction. In our example, this would mean that the effect of being female (female) depends on the proportion of students on an academic track (pracad). We might, for example, expect to see that although on average female students have lower math achievement scores than male students (as evidenced by the negative coefficients for female in previous models), the difference between male and female students becomes smaller as the proportion of students on an academic track increases.
In models 1 and 2, the effect of the variable female was described by a single, fixed coefficient b1. In models 3 and 4 we added a random effect for female, so that the effect of female is described by both a fixed component (b1) and a random component (u1). In this model we add a third term to the effect of female b3*pracad, where b3 is a fixed coefficient estimated by the model and pracad is a level 2 variable. The model can be written:
mathach = b0 + u0 + b1*female + b3*pracad*female + u1*female + b2*pracad + e = b0 + u0 + (b1 + b3*pracad + u1)*female + b2*pracad + e
Where we assume:
e ~ N(0,s2) u0 ~ N(0,s2u0) u1 ~ N(0,s2u1) cov(u0,u1)=0
Alternatively, we can write this model in the two-level form:
Level 1: mathach = b0 + b1*female + e Level 2: b0 = g00 + g01*pracad + u0 b1 = g10 + g11*pracad + u1
The code for this model is the same as for model 3 (i.e. we specify an independent covariance structure), except that the we have added a new coefficient b3 to the equation for pred, b3 is the coefficient for the cross level interaction, so that the total effect of female is equal to b1+b3*pracad+u1.
proc nlmixed data="d:datahsb"; pred = b0 + u0 + (b1+b3*pracad+u1)*female + b2*pracad; model mathach ~ normal(pred,s2); random u0 u1 ~ normal([0,0],[s2u0,0,s2u1]) subject=id; run; Parameter Estimates Standard Parameter Estimate Error DF t Value Pr > |t| Alpha Lower Upper Gradient b0 9.2802 0.4443 158 20.89 <.0001 0.05 8.4027 10.1577 0.002315 b1 -1.5489 0.3969 158 -3.90 0.0001 0.05 -2.3328 -0.7650 0.003042 b3 0.4147 0.7416 158 0.56 0.5769 0.05 -1.0502 1.8795 0.001542 b2 7.8543 0.7728 158 10.16 <.0001 0.05 6.3279 9.3807 0.001594 s2 38.7984 0.6586 158 58.91 <.0001 0.05 37.4975 40.0992 0.000105 s2u0 3.8957 0.5616 158 6.94 <.0001 0.05 2.7866 5.0049 -0.00049 s2u1 0.1944 0.4021 158 0.48 0.6295 0.05 -0.5998 0.9885 0.000371
Based on the above estimates, we can write the regression equation as:
mathach = 9.28 + u0 + (-1.55 + 0.41*pracad + u1)*female + 7.85*pracad + e
Where:
e ~ N(0,38.8) u0 ~ N(0,3.9) u1 ~ N(0,0.19) cov(u0,u1)=0
The coefficient b1 can now be interpreted as the average effect of female when pracad is equal to zero. The coefficient for the cross-level interaction between female and pracad, that is b3, can be interpreted as the change in the effect of female for a one unit change in pracad. For a school where half of the students are on an academic track (pracad=.5), the effect of female is:
-1.55 + 0.41*.5 + u1 = -1.345 + u1
Another way to say this is that the average effect of female for a school where half of the students are on an academic track is -1.345. Substantively, this means that the higher the proportion of students on an academic track, the less difference in mean math achievement scores between males and females.
See Also
- Raudenbush, Stephen W. & Anthony S. Bryk (2001). Hierarchical Linear Models: Applications and Data Analysis Methods. 2nd ed.Thousand Oaks, CA: Sage Publications.
- Raudenbush, Stephen, Anthony Bryk, Yuk Fai Cheong, & Richard Congdon (2001) HLM 5: Hierarchical Linear and Nonlinear Modeling. Scientific Software International: Lincolnwood, IL.
- Skrondal, Anders, & Sophia Rabe-Hesketh(2004). Generalized Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models. New York, NY: Chapman & Hall/CRC.