Regression models attempt to explain the variation in an outcome (dependent variable) through its relationship with a set of one or more predictors (independent variables).
Although a regression coefficient quantifies the strength of the relationship between outcome and predictor (effect size), it is expressed in units specific to the predictor, so direct comparison to other predictors’ coefficients is not straightforward.
Standardized effect sizes, like Cohen’s \(d\) and \(R^2\), are unitless and thus permit direct comparison between predictors measured in different units, even acros different studies.
Multilevel regression models are commonly used to model data that are hierarchically structured, which are usually characterized by non-independence of observations collected within aggregate units.
However, currently no set of standardized effect size measures have been widely adopted by researchers using multilevel models.
This workshop discusses a new set of \(R^2\) measures introduced by Rights and Sterba in their 2019 article ” Quantifying Explained Variance in Multilevel Models: An Integrative Framework for Defining R-Squared Measures”.
The authors of the r2mlm
package have themselves said
that using their new \(R^2\) measures
requires a solid understanding of multilevel models (Shaw et al.,
2023).
To this end, we will discuss:
r2mlm
packageStatistical models like linear regression use a set of predictors to explain the variation in the outcome.
Later in this workshop we will learn how to quantify how much variation has been explained, but first we need to understand how variation is measured.
Imagine we have an outcome, \(y_i\), measured for units \(i=1,...,N\).
One common way to quantify variation is to measure deviations from the mean, \(y_i - \bar{y}\), where
\[\bar{y}=\frac{1}{n}\sum_{i=1}^{n}{y_i}\]
The sum of the squared deviations from the mean, or sums of squares, is a simple measure of variability.
For variable \(y_i\), the total sums of squares (TSS) is:
\[TSS=\sum_{i=1}^{n}{(y_i-\bar{y})^2}\]
Sums of squares always increase when we add data (because squared deviations are always non-negative), so we might want a measure of variability that we can use to compare variables with different sample sizes.
Dividing the TSS for \(y_i\) by \(n\) yields the variance of \(y_i\), \(\sigma_y^2\):
\[\begin{align}\sigma_y^2 &= \frac{TSS}{n} \\ &=\frac{1}{n}\sum_{i=1}^{n}{(y_i-\bar{y})^2}\end{align}\]
The variance is thus the average squared deviation from the mean and will thus not necessarily increase as data are added.
The formula above can be used for a population. If our \(y_i\) is instead a sample from a population and we wish to estimate \(\sigma_y^2\) in the population, we should divide TSS by \(n-1\) instead of \(n\).
\[\hat{\sigma}_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}{(y_i-\bar{y})^2}\] \(\hat{}\) signifies that we are estimating a population parameter from a sample (and above, \(\bar{y} = \hat\mu_y\)).
When estimating a population parameter from a sample of size \(n\), we should subtract \(1\) from \(n\) for each parameter estimated as part of the calculation. In estimating the variance of \(y_i\), we estimated the population mean of \(y_i\) with the sample mean \(\bar{y}\), so we subtract one yielding \(n-1\) degrees of freedom.
The standard deviation is the square root of the variance, or \(\sigma_y=\sqrt{\sigma_y^2}\).
# deviations from mean
small$dev <- small$y - mean(small$y)
# sum of squared deviations
TSS <- sum(small$dev^2)
TSS
## [1] 1198.444
# sample size
n <- nrow(small)
# variance of y
TSS/n
## [1] 66.58025
# estimate of variance in population
TSS/(n-1)
## [1] 70.49673
# var() function divides by n-1
var(small$y)
## [1] 70.49673
Now that know how to measure the variability of \(y_i\), we can use linear regression to explain the variability with relevant predictors.
Below we model \(y_i\) to be predicted by a single predictor, \(x_i\):
\[y_i = \beta_0 + \beta_1x_i + \epsilon_i\] where
In linear regression, the errors are assumed to be normally distributed with a zero mean and variance \(\sigma_{\epsilon}^2\):
\[\epsilon_i \sim N(0, \sigma_{\epsilon}^2)\].
As we add relevant predictors to the model, we expect the errors to decrease in magnitude, reflected by a smaller \(\sigma_{\epsilon}^2\).
# use lm() for linear regression
m <- lm(y ~ x, data=small)
summary(m)
##
## Call:
## lm(formula = y ~ x, data = small)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.113 -7.009 1.564 5.941 11.244
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.1008 11.6468 2.842 0.0118 *
## x 0.3574 0.2353 1.519 0.1482
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.091 on 16 degrees of freedom
## Multiple R-squared: 0.1261, Adjusted R-squared: 0.07144
## F-statistic: 2.308 on 1 and 16 DF, p-value: 0.1482
After fitting the model, we can calculate the predicted outcome for each observations, \(\hat{y}_i\), based on the estimated coefficients:
\[\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1x_i\]
For a single continuous predictor, the predictions \(\hat{y}_i\) will lie on a line:
Given no predictors, the intercept \(\beta_0\) is the only coefficient estimated in a linear regression:
\[y_i = \beta_0 + \epsilon_i\] Its estimate will equal the sample mean:
\[\hat{\beta}_0 = \bar{y}\]
Therefore, all predictions will be the sample mean:
\[\begin{align}\hat{y}_i &= \hat{\beta_0} \\ &= \bar{y} \end{align}\]
This no-intercept model serves as starting model where none of the variability in \(y_i\) is explained.
As we add predictors to the model, we expect the predictions \(\hat{y}_i\) on average to move away from the sample mean \(\bar{y}\) and closer to their observed values \(y_i\).
The explained sums of squares (ESS) quantifies how much the predicted values \(\hat{y}_i\) from a regression model have moved away from \(\bar{y}\):
\[ESS = \sum_{i=1}^n{(\hat{y}_i-\bar{y})^2}\]
As it contains no explanatory predictors, the ESS for the intercept-only model is 0:
\[\begin{align}ESS_{null} &= \sum_{i=1}^n{(\hat{y}_i-\bar{y})^2} \\ &= \sum_{i=1}^n{(\bar{y}-\bar{y})^2} \\ &= 0 \end{align}\]
On the other hand, if our model can perfectly predict the outcome such that \(\hat{y}_i = y_i\), then \(ESS=TSS\):
\[\begin{align}ESS_{perfect} &= \sum_{i=1}^n{(\hat{y}_i-\bar{y})^2} \\ &= \sum_{i=1}^n{(y_i-\bar{y})^2} \\ &= TSS \end{align}\]
Generally, though, \(0 < ESS < TSS\)
#TSS
TSS <- sum((small$y - mean(small$y))^2)
TSS
## [1] 1198.444
#ESS
#fitted() returns predicted values for observed data
small$yhat <- fitted(m)
ESS <- sum((small$yhat - mean(small$y))^2)
ESS
## [1] 151.0758
We cannot directly observe or measure the errors of a linear regression model, \(\epsilon_i\), but we can estimate them from the residuals, \(\hat{\epsilon}_i\), the difference between the observed and predicted values:
\[\hat{\epsilon}_i = y_i - \hat{y}_i\]
Residuals represent the variation in the outcome that cannot be explained by the model.
The residual sums of squares quantifies the amount of unexplained variation:
\[\begin{align} RSS &= \sum_{i=1}^n{\hat{\epsilon}_i} \\ &= \sum_{i=1}^n{(y_i - \hat{y}_i)^2} \end{align}\]
# RSS
# residuals() returns residuals of model
RSS <- sum(residuals(m)^2)
RSS
## [1] 1047.369
Dividing RSS by \(n-p\) give the residual variance, \(\hat{\sigma}_\epsilon^2\), where \(p\) is the number of coefficients estimated in the model:
\[\begin{align} \hat{\sigma}_\epsilon^2 &= \frac{RSS}{n-p} \\ &= \frac{\sum_{i=1}^n{\hat{\epsilon}_i}}{n-p} \end{align}\]
Because the \(p\) coefficients are used to calculate the residuals and their variance \(\hat{\sigma}_e^2\), the degrees of freedom are \(n-p\).
# Residual standard error is square root of residual variance
summary(m)
##
## Call:
## lm(formula = y ~ x, data = small)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.113 -7.009 1.564 5.941 11.244
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.1008 11.6468 2.842 0.0118 *
## x 0.3574 0.2353 1.519 0.1482
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.091 on 16 degrees of freedom
## Multiple R-squared: 0.1261, Adjusted R-squared: 0.07144
## F-statistic: 2.308 on 1 and 16 DF, p-value: 0.1482
In regression, we can partition total variation into explained and residual variation:
\[TSS = ESS + RSS\]
As we add relevant predictors to the model, ESS grows while RSS shrinks.
Make sure example data has no duplicate x-values
\(R^2\) expresses the proportion of \(TSS\) that is \(ESS\):
\[R^2 = \frac{ESS}{TSS}\]
Because \(TSS=ESS+RSS\)
\[R^2 = 1 - \frac{RSS}{TSS}\] \(R^2\) is thus interpreted as the proportion of outcome variation explained by the model predictors.
# R^2 reported in output
summary(m)
##
## Call:
## lm(formula = y ~ x, data = small)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.113 -7.009 1.564 5.941 11.244
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.1008 11.6468 2.842 0.0118 *
## x 0.3574 0.2353 1.519 0.1482
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.091 on 16 degrees of freedom
## Multiple R-squared: 0.1261, Adjusted R-squared: 0.07144
## F-statistic: 2.308 on 1 and 16 DF, p-value: 0.1482
# confirming calculation
ESS/TSS
## [1] 0.1260599
Previously, we showed that \(ESS\) for the intercept-only model is 0. What will \(R^2\) for this model be?
# R^2 for intercept-only model
summary(lm(y ~ 1, data=small))$r.squared
## [1] 0
\[R^2=corr(y_i,\hat{y}_i)\]
# equals R^2
cor(small$y, small$yhat)^2
## [1] 0.1260599
# plot of observed vs predicted outcome
ggplot(small, aes(x=y, y=yhat)) +
geom_point() +
labs(x=expression(y[i]), y=expression(hat(y)[i]),
title=expression(paste("Observed and predicted scores become more correlated as ", R^2, " increases")))
What determines how much variance is explained by predictor \(x_i\)?
Holding everything else constant, variance explained increases as:
# height data
height[4:10,]
## height male before_july disease
## 4 70.33877 0 1 1
## 5 62.55333 0 0 0
## 6 64.84237 0 1 0
## 7 61.73652 0 0 0
## 8 64.03135 0 1 0
## 9 64.42657 0 0 0
## 10 66.27407 0 1 0
# male and disease have similar coefficients
# before_july has nearly zero effect
summary(lm(height ~ male + before_july + disease, data=height))
##
## Call:
## lm(formula = height ~ male + before_july + disease, data = height)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.6396 -1.7393 0.0938 1.4779 6.9888
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 64.2630 0.2760 232.816 < 2e-16 ***
## male 4.8323 0.3165 15.266 < 2e-16 ***
## before_july 0.1727 0.3165 0.546 0.586
## disease 4.9490 0.8077 6.128 4.81e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.238 on 196 degrees of freedom
## Multiple R-squared: 0.5802, Adjusted R-squared: 0.5738
## F-statistic: 90.3 on 3 and 196 DF, p-value: < 2.2e-16
Below we graph the individual contributions to \(R^2\) by predictor:
\(R^2\) does not account for the complexity of the model (number of predictors)
Previously we saw that \(\hat{\sigma}_y^2\) and \(\hat{\sigma}_e^2\) are unbiased estimators of the population variance of \(y_i\) and of the population error variance, respectively.
Adjusted \(R^2\) uses \(\hat{\sigma}_e^2\) instead of \(RSS\) to represent unexplained variability:
\[\begin{align} R^2 &= 1 - \frac{\hat{\sigma}_\epsilon^2}{\hat{\sigma}_y^2} \\ &= 1 - \frac{\frac{RSS}{n-p}}{\frac{TSS}{n-1}} \end{align}\]
# model without before_july
summary(lm(height ~ male + disease, data=height))
##
## Call:
## lm(formula = height ~ male + disease, data = height)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.5532 -1.7575 0.0699 1.4502 6.9024
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 64.3493 0.2257 285.059 < 2e-16 ***
## male 4.8323 0.3160 15.294 < 2e-16 ***
## disease 4.9490 0.8062 6.139 4.5e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.234 on 197 degrees of freedom
## Multiple R-squared: 0.5796, Adjusted R-squared: 0.5753
## F-statistic: 135.8 on 2 and 197 DF, p-value: < 2.2e-16
# model with before_july has higher R^2 but lower Adjusted R^2
summary(lm(height ~ male + disease + before_july, data=height))
##
## Call:
## lm(formula = height ~ male + disease + before_july, data = height)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.6396 -1.7393 0.0938 1.4779 6.9888
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 64.2630 0.2760 232.816 < 2e-16 ***
## male 4.8323 0.3165 15.266 < 2e-16 ***
## disease 4.9490 0.8077 6.128 4.81e-09 ***
## before_july 0.1727 0.3165 0.546 0.586
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.238 on 196 degrees of freedom
## Multiple R-squared: 0.5802, Adjusted R-squared: 0.5738
## F-statistic: 90.3 on 3 and 196 DF, p-value: < 2.2e-16
Multilevel data are hierarchically structured where sampled observations (level-1) can be aggregated into higher level units (level-2).
Multilevel data arise when:
We generally assume that level-2 units are independent of each other but that level-1 observations within the same level-2 unit are not independent, violating the independence assumption of typical single level models.
Note: Henceforth, we will refer to multilevel data arising from clustered sampling but all concepts discussed apply equally to repeated measures data as well.
We will discuss multilevel model with a simulated dataset of worker stress, with 15 companies sampled and 10 employess within each company sampled.
Variables
company_id
employee_id
stress
: the outcome, ranges 1-100salary
: employee’s salary (level-1) in thousands of
dollarspct_open
: percent of company’s position that are
unfilled (level-2)# empstress data
head(empstress)
## company_id employee_id salary pct_open stress
## 1 1 1 76.594 11.082 48
## 2 1 2 80.259 11.082 41
## 3 1 3 67.790 11.082 43
## 4 1 4 82.514 11.082 47
## 5 1 5 76.245 11.082 54
## 6 1 6 78.169 11.082 41
The multilevel model explains variation in outcome \(y_{ij}\), measured for individual \(i=1,...,n_j\) within level-2 unit \(j=1,...,J\).
Each of the \(J\) level-2 unit has \(n_j\) individuals sampled from it, and the total sample size is \(N =\sum_{j=1}^J{n_j}\).
Predictors in multilevel models can be measured at level-1 or level-2.
For the empstress data
# empstress
head(empstress)
## company_id employee_id salary pct_open stress
## 1 1 1 76.594 11.082 48
## 2 1 2 80.259 11.082 41
## 3 1 3 67.790 11.082 43
## 4 1 4 82.514 11.082 47
## 5 1 5 76.245 11.082 54
## 6 1 6 78.169 11.082 41
For variables measured at level-1 like \(y_{ij}\), we can calculate 2 kinds of means:
\[\bar{y}_{..} = \frac{1}{N}\sum_{j=1}^J{\sum_{i=1}^{n_j}{y_{ij}}}\]
\[\bar{y}_{.j} = \frac{1}{n_j}\sum_{i=1}^{n_j}{y_{ij}}\]
Below we see a graph of stress scores (black points), its overall mean (line) and company means (larger red points).
# add cluster means of stress to data set
# mutate() with .by= makes adding cluster means to data easy
empstress <- empstress %>%
mutate(.by=company_id, stress_m=mean(stress))
# graph of stress, overall mean and cluster means
ggplot(empstress, aes(x=company_id, y=stress)) +
geom_point() +
geom_hline(yintercept = mean(empstress$stress), linetype=2) +
geom_point(aes(y=stress_m), color="red", size=3) +
scale_x_continuous(breaks=1:15, labels=1:15) +
labs(title="Stress scores (black) by company and company stress means (red)")
We can decompose a level-1 outcome \(y_{ij}\) into the sum of a cluster mean \(y_{.j}\) and the deviation from that cluster mean:
\[\begin{align} y_{ij} &= \bar{y}_{.j} + (y_{ij} - \bar{y}_{.j}) \\ y_{ij} &= \text{cluster mean + deviation from cluster mean} \end{align}\]
We can estimate the variation of each of the terms in the equation above:
The total variation of \(y_{ij}\) in a multilevel model can thus be decomposed as:
\[\text{Total Variation = Between-Cluster Variation + Within-Cluster Variation}\]
Importantly, both \(\text{Between-Cluster Variation}\) and \(\text{Within-Cluster Variation}\) can decomposed into explained and residual variation:
\[\text{Between-Cluster Variation = Explained Between-Cluster Variation + Residual Between-Cluster Variation}\]
\[\text{Within-Cluster Variation = Explained Within-Cluster Variation + Residual Within-Cluster Variation}\]
Describes variation of individuals within the same cluster (e.g. employees’ stress scores varying within a company).
Level-1 predictors explain within-cluster variation in the outcome (e.g. differences in employees’ salaries within a company explain differences in their stress scores).
Describes variation of cluster means about the overall mean (e.g. company stress score means varying around the overall mean).
ggplot(empstress, aes(x=company_id, y=stress_m)) +
geom_point(color="red", size=3) +
geom_hline(yintercept = mean(empstress$stress), linetype=2) +
scale_x_continuous(breaks=1:15, labels=1:15) +
labs(title="Between-company variation of stress means (red) about overall mean (dashed line)")
Level-2 predictors explain between-cluster variation in the outcome (e.g. differences in percent unfilled positions can explain differences in mean stress across companies)
ggplot(empstress, aes(x=pct_open, y=stress_m)) +
geom_point(color="red", size=3) +
labs(title="Level-2 predictor percent unfilled can explain between-company variation in mean stress", x="percent unfilled positions")
The random intercept model is the simplest multilevel model and decomposes the total variation into unexplained or residual within-cluster and between-cluster variation:
\[\begin{align}y_{ij} &= \beta_{0j} + \epsilon_{ij} \\ \beta_{0j} &= \gamma_{00} + u_{0j} \end{align}\]
# random intercept model
m_ri <- lmer(stress ~ 1 + (1|company_id), data=empstress)
summary(m_ri)
## Linear mixed model fit by REML ['lmerMod']
## Formula: stress ~ 1 + (1 | company_id)
## Data: empstress
##
## REML criterion at convergence: 1107.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.2379 -0.6096 -0.0342 0.5618 3.3270
##
## Random effects:
## Groups Name Variance Std.Dev.
## company_id (Intercept) 116.08 10.774
## Residual 73.41 8.568
## Number of obs: 150, groups: company_id, 15
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 45.000 2.868 15.69
In the output above, we see estimates of the residual variance and random intercept variance, \(\hat{\sigma}_{\epsilon}^2=27.3\) and \(\hat{\tau}_{00}=64.36\), respectively.
The random intercept model is often used to calculate the intraclass correlation coefficient (ICC), which quantifies the proportion of total variation in \(y_{ij}\) due to between-cluster variation.
The ICC is calculated by dividing \(\hat{\tau}_{00}^2\), the between-cluster variance, by \(\hat{\tau}_{00}^2 + \hat{\sigma}_{\epsilon}^2\), which estimates the total variance.
\[ICC = \frac{\hat{\tau}_{00}^2}{\hat{\tau}_{00}^2 + \hat{\sigma}_{\epsilon}^2}\] The ICC can also be interpreted as the correlation of the outcome from 2 randomly sampled individuals from the same level-2 unit.
\(ICC>.5\) suggests that cluster means vary more than individual scores within cluster, or \(\hat{\tau}_{00}^2 > \hat{\sigma}_{\epsilon}^2\). Here, level-2 predictors will be important to explain variation in \(y_ij\).
\(ICC<.5\) suggests that cluster means vary less than individual scores within cluster, or \(\hat{\tau}_{00}^2 < \hat{\sigma}_{\epsilon}^2\). Here, level-1 predictors will be important to explain variation in \(y_{ij}\).
Like the outcome \(y_{ij}\), a predictor measured at level-1 can be decomposed into a cluster mean and a deviation from the cluster mean:
\[x_{ij} = (x_{ij} - \bar{x}_{.j}) + \bar{x}_{.j}\]
For convenience, we superscript with \(^w\) to signify the deviation from the cluster mean, as in \(x_{ij}^w\), which is often called group-mean-centered or cluster-mean-centered:
\[x_{ij}^w = (x_{ij} - \bar{x}_{.j})\]
and thus
\[x_{ij} = x_{ij}^w + \bar{x}_{.j}\]
As a predictor, \(x_{ij}^w\) can only predict level-1 variation in an outcome like \(y_{ij}\).
The cluster means for group-mean centered variables like \(x_{ij}^w\) are all 0, so they do not vary at level-2 (i.e., have group differences removed).
# to create group-mean centered salary, salary_c:
# 1. add cluster means of predictor salary (salary_m) to data
# 2. subtract salary_m from original salary to create salary_c
empstress <- empstress %>%
mutate(.by=company_id, # process data by company_id
salary_m = mean(salary)) %>%
mutate(salary_c = salary-salary_m)
# graph of salary_c and company means of salary_c (all zero)
ggplot(empstress, aes(x=company_id, y=salary_c)) +
geom_point() +
stat_summary(geom="point", fun="mean", color="red", size=3) +
scale_x_continuous(breaks=1:20, labels=1:20) +
labs(title="group-mean centering removes between-company differences")
The cluster means \(x_{.j}\) can themselves be added as level-2 predictors to explain level-2 variation in the outcome.
Recall that
\[x_{ij} = x_{ij}^w + \bar{x}_{.j}\]
In the model in which \(x_{ij}^w\) and \(\bar{x}_{.j}\) are predictors:
\[y_{ij} = \beta_{0j} + \beta_1x_{ij}^w + \beta_2\bar{x}_{.j} + \epsilon_{ij}\]
Importantly, the above model is genearlly not equivalent to the model in which the original \(x_{ij}\) is the predictor:
\[y_{ij} = \beta_{0j}^* + \beta_1^*x_{ij} + \epsilon_{ij}\]
Implying that:
\[\begin{align}\beta_1^* &\ne \beta_1 \\ \beta_1^* &\ne \beta_2\end{align}\]
Instead, the coefficient for the original \(x_{ij}\), \(\beta_1^*\), will be some uninterpretable combination of the within-cluster and between-cluster effects, \(\beta_1\) and \(\beta_2\), respectively.
Compare the coefficients of the model where group_centered
salary_c
and company means salary_m
are
entered as predictor to the model where the original salary
is entered:
# salary_c and salary_m coefficients have clear within-company and between-company interpretations
m_salary_c_m <- lmer(stress ~ salary_c + salary_m +
(1 | company_id),
data=empstress)
summary(m_salary_c_m)
## Linear mixed model fit by REML ['lmerMod']
## Formula: stress ~ salary_c + salary_m + (1 | company_id)
## Data: empstress
##
## REML criterion at convergence: 1072
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.14261 -0.58580 0.02166 0.46986 3.14261
##
## Random effects:
## Groups Name Variance Std.Dev.
## company_id (Intercept) 22.71 4.766
## Residual 64.73 8.046
## Number of obs: 150, groups: company_id, 15
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 125.7257 11.9571 10.515
## salary_c 0.7414 0.1696 4.371
## salary_m -1.0053 0.1479 -6.798
##
## Correlation of Fixed Effects:
## (Intr) slry_c
## salary_c 0.000
## salary_m -0.993 0.000
# salary coefficient is uninterpretable average of coefficients above
m_salary <- lmer(stress ~ salary +
(1 | company_id),
data=empstress)
summary(m_salary)
## Linear mixed model fit by REML ['lmerMod']
## Formula: stress ~ salary + (1 | company_id)
## Data: empstress
##
## REML criterion at convergence: 1102.9
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.23736 -0.62578 0.00259 0.53223 3.10754
##
## Random effects:
## Groups Name Variance Std.Dev.
## company_id (Intercept) 239.80 15.485
## Residual 65.62 8.101
## Number of obs: 150, groups: company_id, 15
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 4.6013 13.3741 0.344
## salary 0.5031 0.1587 3.170
##
## Correlation of Fixed Effects:
## (Intr)
## salary -0.953
Multilevel models can include fixed and random effects.
The intercept and any regression coefficient for a level-1 predictor can be either
In either case, the fixed effect, e.g. \(\gamma_{10}\), is interpreted as the mean coefficient across clusters.
# fixed slope of salary_c to explain variation in stress
m_salary_c_fixed <- lmer(stress ~ salary_c +
(1 | company_id),
data=empstress)
summary(m_salary_c_fixed)
## Linear mixed model fit by REML ['lmerMod']
## Formula: stress ~ salary_c + (1 | company_id)
## Data: empstress
##
## REML criterion at convergence: 1091.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.22676 -0.61828 -0.01386 0.49847 3.00557
##
## Random effects:
## Groups Name Variance Std.Dev.
## company_id (Intercept) 116.95 10.814
## Residual 64.73 8.046
## Number of obs: 150, groups: company_id, 15
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 45.0000 2.8685 15.688
## salary_c 0.7414 0.1696 4.371
##
## Correlation of Fixed Effects:
## (Intr)
## salary_c 0.000
The coefficient \(\beta_1 = \gamma_{10}
\approx 0.741\) is interpreted as the fixed slope of group-mean
centered salary, salary_c
, which suggests that employees
with higher salaries than other employees within the same company have
more stress (perhaps because they have more responsibilities).
Random effects in multilevel models are typically used to model heterogeneity (variation) around the fixed effect
In the model below with level-1 predictors \(x_{ij}^w\) and \(z_{ij}^w\):
\[\begin{align}y_{ij} &= \beta_{0j} + \beta_{1j}x_{ij}^w + \beta_2z_{ij}^w + \epsilon_{ij} \\ \beta_{0j} &= \gamma_{00} + u_{0j} \\ \beta_{1j} &= \gamma_{10} + u_{1j} \\ \beta_2 &= \gamma_{20} \end{align}\]
Random effects like \(u_{0j}\) and \(u_{1j}\) are typically assumed to have a multivariate normal distribution, where the random effects are assumed to have zero means and some covariance matrix quantifying their variances and covariances:
\[\Big(\begin{matrix} u_{0j} \\ u_{1j} \end{matrix}\Big) \sim N\Bigg( \Big(\begin{matrix} 0 \\ 0 \end{matrix}\Big), \Big(\begin{matrix} \tau_{00}^2 & \tau_{01} \\ \tau_{01} & \tau_{11}^2 \end{matrix}\Big) \Bigg)\] * The elements of the covariance matrix \(\tau_{00}^2\), \(\tau_{11}^2\), \(\tau_{01}=\tau_{10}\) are parameters estimated in the multilevel model.
# fixed and random slope of salary_c to explain variation in stress
m_salary_c_random <- lmer(stress ~ salary_c +
(1 + salary_c | company_id),
data=empstress)
summary(m_salary_c_random)
## Linear mixed model fit by REML ['lmerMod']
## Formula: stress ~ salary_c + (1 + salary_c | company_id)
## Data: empstress
##
## REML criterion at convergence: 1087.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.31751 -0.58010 -0.05927 0.47469 2.48187
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## company_id (Intercept) 117.4425 10.837
## salary_c 0.3329 0.577 0.19
## Residual 59.3291 7.703
## Number of obs: 150, groups: company_id, 15
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 45.0000 2.8679 15.691
## salary_c 0.8178 0.2245 3.643
##
## Correlation of Fixed Effects:
## (Intr)
## salary_c 0.126
salary_c
across companies.# use coef() to see each cluster's intercepts and slopes
coef(m_salary_c_random)
## $company_id
## (Intercept) salary_c
## 1 46.17442 0.48449509
## 2 27.79865 0.79443093
## 3 41.50957 0.95495935
## 4 31.01874 0.73785965
## 5 53.72239 0.71527322
## 6 51.56692 0.87882989
## 7 44.06388 0.89459952
## 8 52.49329 1.26991889
## 9 62.81527 1.60314719
## 10 39.82638 0.58183403
## 11 39.10364 0.78680564
## 12 27.18478 0.55404316
## 13 55.85613 -0.08883465
## 14 50.70981 0.86780080
## 15 51.15612 1.23142129
##
## attr(,"class")
## [1] "coef.mer"
Random slopes may provide a better fit of level-1 predictors to the data than the fixed slope.
## $title
## [1] "Random slopes (red) vary around fixed mean slope (dashed black)\nDisplaying first 6 companies"
##
## attr(,"class")
## [1] "labels"
Random effects represent residual (unexplained) variation in model coefficients across clusters, and thus their variance estimates (e.g. \(\tau_{00}\)) quantify residual variation.
\[\begin{align}y_{ij} &= \beta_{0j} + \beta_{1j}x_{ij}^w + \beta_2z_{ij}^w + \epsilon_{ij} \\ \beta_{0j} &= \gamma_{00} + u_{0j} \\ \beta_{1j} &= \gamma_{10} + u_{1j} \end{align}\]
In the model above, the random effects \(u_{0j}\) and \(u_{1j}\) represent unexplained heterogeneity in the intercepts and slopes, respectively.
Adding level-2 predictors (\(A_j\) below) to the model can explain the heterogeneity in the intercepts, and adding interactions of level-2 predictors with level-1 predictors can be used to explain heterogeneity in the slope coefficients.
\[\begin{align}y_{ij} &= \beta_{0j} + \beta_{1j}x_{ij}^w + \beta_2z_{ij}^w + \epsilon_{ij} \\ \beta_{0j} &= \gamma_{00} + \gamma_{01}A_j + u_{0j} \\ \beta_{1j} &= \gamma_{10} + \gamma_{11}A_j + u_{1j} \end{align}\]
If level-2 predictor \(A_j\) is predicitve, the random effects will be smaller (representing less unexplained heterogeneity) resulting in smaller variance estimates, \(\hat{\tau}_{00}\) and \(\hat{\tau}_{11}\).
Run random intercept model
Generate cluster means and group-centered predictor
Run model with just within predictor, take note of variances and interpret coefficient
Add cluster means and do same
Rights and Sterba (2019) decompose the model-implied total variance of the outcome as follows:
\[\widehat{var}(y_{ij}) = f_1 + f_2 + v + m + \hat{\sigma}_\epsilon^2\] Explained variance components
Residual variance components
This complete variance decomposition allowed Rights and Sterba (2019) to develop a general framework for defining multilevel \(R^2\) measures that are clearly interpretable and that subsumes most multilevel \(R^2\) measures previously defined in the literature.
†In this decomposition, all level-1 predictors are assumed to be cluster-mean-centered
†† Random intercept variation can be considered residual, unexplained between-cluster variation but it can also be considered variation explained by cluster IDs.
Within this framework, we can also fully decompose the model-implied within-cluster variance decomposition:
\[\widehat{var}(y_{ij}-\bar{y}_{.j}) = f_1 + v + \hat{\sigma}_\epsilon^2\]
Explained variance components
Residual variance component
Below we see graphs that show how the fixed slope and random slope explain increasingly more within-company variation, reflected by the decreasing residuals.
Notice how the residual variances decrease of the 3 models depicted above decreases as we add salary_c fixed slope and random slope to the model (while random intercept variance does not decrease).
# VarCorr() returns random effect and residual standard deviations
# wrapping VarCorr() in print() with comp="Variance" returns variances
# random intercept model
print(VarCorr(m_ri), comp="Variance")
## Groups Name Variance
## company_id (Intercept) 116.083
## Residual 73.412
# fixed slope of salary_c
print(VarCorr(m_salary_c_fixed), comp="Variance")
## Groups Name Variance
## company_id (Intercept) 116.951
## Residual 64.732
# fixed and random slope of salary_c
print(VarCorr(m_salary_c_random), comp="Variance")
## Groups Name Variance Cov
## company_id (Intercept) 117.4425
## salary_c 0.3329 1.216
## Residual 59.3291
\[\widehat{var}(\bar{y}_{.j}-\bar{y}_{..}) = f_2 + m \] Explained variance component
Residual variance component
To demonstrate, we will regress stress
on level-2
pct_open
:
# level-2 pct_open explains between-company variation in mean stress
m_pct_open <- lmer(stress ~ pct_open +
(1 | company_id),
data=empstress)
summary(m_pct_open)
## Linear mixed model fit by REML ['lmerMod']
## Formula: stress ~ pct_open + (1 | company_id)
## Data: empstress
##
## REML criterion at convergence: 1103.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.2748 -0.6105 -0.0242 0.5497 3.2955
##
## Random effects:
## Groups Name Variance Std.Dev.
## company_id (Intercept) 105.77 10.284
## Residual 73.41 8.568
## Number of obs: 150, groups: company_id, 15
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 30.3667 10.0800 3.013
## pct_open 1.4508 0.9616 1.509
##
## Correlation of Fixed Effects:
## (Intr)
## pct_open -0.962
Compare the random intercept variances (and residual variances) for
the model with just random intercepts to the model where we add
pct_open
as a predictor.
# random intercept model
print(VarCorr(m_ri), comp="Variance")
## Groups Name Variance
## company_id (Intercept) 116.083
## Residual 73.412
# level-2 pct_open as predictor
print(VarCorr(m_pct_open), comp="Variance")
## Groups Name Variance
## company_id (Intercept) 105.770
## Residual 73.412
The residuals below represent unexplained between-company variation
in mean stress (i.e., random intercept variation), and we see how
pct_open
explains away some of this unexplained
variation.
Rights and Sterba (2019) developed a set of \(R^2\) measures, which can include different terms in the numerator and denominator, so notation is required to distinguish among them:
\(R^{2(f_1)}_w\) is thus the proportion of within-cluster variance explained by level-1 fixed effects.
Proportion of total variance explained by level-1 (within-cluster) fixed effects
\[R^{2(f_1)}_t = \frac{f_1}{f_1 + f_2 + v + m + \hat{\sigma}_\epsilon^2}\]
Proportion of total variance explained by level-2 (between-cluster) fixed effects
\[R^{2(f_2)}_t = \frac{f_2}{f_1 + f_2 + v + m + \hat{\sigma}_\epsilon^2}\]
Proportion of total variance explained by fixed effects
\[\begin{align}R^{2(f)}_t &= R^{2(f_1)}_t + R^{2(f_2)}_t \\ &= \frac{f_1+f_2}{f_1 + f_2 + v + m + \hat{\sigma}_\epsilon^2}\end{align}\]
Superscript \(f\) is used to represent level-1 and level-2 fixed effects together.
Proportion of total variance explained by random slope variation
\[R^{2(v)}_t = \frac{v}{f_1 + f_2 + v + m + \hat{\sigma}_\epsilon^2}\]
Proportion of total variance explained by random intercept variation
\[R^{2(m)}_t = \frac{m}{f_1 + f_2 + v + m + \hat{\sigma}_\epsilon^2}\]
Proportion of total variance explained by predictors (fixed effects and random slopes)
\[\begin{align}R^{2(fv)}_t &= R^{2(f_1)}_t + R^{2(f_2)}_t + R^{2(v)}_t \\ &= \frac{f_1+f_2+v}{f_1 + f_2 + v + m + \hat{\sigma}_\epsilon^2}\end{align}\]
Proportion of total variance explained by model (all fixed and random effects)
\[\begin{align}R^{2(fvm)}_t &= R^{2(f_1)}_t + R^{2(f_2)}_t + R^{2(v)}_t + R^{2(m)}_t \\ &= \frac{f_1+f_2+v+m}{f_1 + f_2 + v + m + \hat{\sigma}_\epsilon^2}\end{align}\]
Proportion of within-cluster variance explained by level-1 fixed effects
\[R^{2(f_1)}_w = \frac{f_1}{f_1 + v + \hat{\sigma}_\epsilon^2}\]
Proportion of within-cluster variance explained by random slopes
\[R^{2(v)}_w = \frac{v}{f_1 + v + \hat{\sigma}_\epsilon^2}\]
Proportion of within-cluster variance explained by level-1 predictors (fixed effects and random slopes)
\[\begin{align}R^{2(f_1v)}_w &= R^{2(f_1)}_w + R^{2(v)}_w \\ &= \frac{f_1 + v}{f_1 + v + \hat{\sigma}_\epsilon^2}\end{align}\]
Proportion of between-cluster variance explained by level-2 fixed effects
\[R^{2(f_2)}_b = \frac{f_2}{f_2 + m}\]
Proportion of between-cluster variance explained by random intercept (cluster mean) variation
\[R^{2(m)}_b = \frac{m}{f_2 + m}\]
We do not estimate \(R^{2(f_2m)}_b\) because it always equals 1.
r2mlm
packager2mlm
packageThe r2mlm
package was developed by Rights and Sterba to
calculate these new \(R^2\) measures.
In this section we learn to use the following functions in the
r2mlm
package:
r2mlm()
: calculates and visualizes all possible \(R^2\) measures for a modelr2mlm_ci()
: calculates bootstrapped confidence
intervals for \(R^2\) measuresr2mlm_comp()
compares 2 models and calculates
differences in all \(R^2\)
measuresThe r2mlm
package can calculate \(R^2\) measures for models run using
lmer()
from the lme4
package†, so
we load both packages now.
# load packages
library(lme4)
library(r2mlm)
†or using nlme()
from the accompanying
package nlme
Note: Much of the material in this section is
adapted from Shaw et al. (2023), an article discussing the features and
usage of the r2mlm
package
To demonstrate r2mlm
usage, we will use a dataset that
loads with the package called teachsat
, a
simulated dataset of variables related to teacher job
satisfaction.
Sample is 300 schools (level-2), each with 30 teachers (level-1), for a total \(N=9000\).
Variables
schoolID
teacherID
satisfaction
: teacher job satisfaction (outcome), 1-10
scalecontrol_c
: school-mean-centered teacher’s rating of
control over curriculumsalary_c
: school-mean-centered teacher’s salary (in
thousands of dollars)control_m
: school mean rating of teacher’s rating of
control over curriculumsalary_m
: school mean teacher’s salarys_t_ratio
: student-to-teacher ratio for the school# teachsat multilevel data
head(teachsat)
## schoolID teacherID satisfaction control_c salary_c control_m salary_m
## 1 1 1 8.582175 2.7239726 7.274769 5.474631 77.69538
## 2 1 2 7.428820 -0.6604034 13.321313 5.474631 77.69538
## 3 1 3 7.990401 -1.3002809 20.131688 5.474631 77.69538
## 4 1 4 5.100618 -0.6278236 -9.810052 5.474631 77.69538
## 5 1 5 7.256876 1.5190521 1.205659 5.474631 77.69538
## 6 1 6 5.154663 -1.9307584 1.778745 5.474631 77.69538
## s_t_ratio
## 1 30
## 2 30
## 3 30
## 4 30
## 5 30
## 6 30
We often start with the random intercept model to quantify the amount of variation within and between clusters.
\[\begin{align} y_{ij} &= \beta_{0j} + \epsilon_{ij} \\ \beta_{0j} &= \gamma_{00} + u_{0j} \end{align}\]
# random intercept model
m_m <- lmer(satisfaction ~ 1 + (1|schoolID),
data=teachsat)
summary(m_m)
## Linear mixed model fit by REML ['lmerMod']
## Formula: satisfaction ~ 1 + (1 | schoolID)
## Data: teachsat
##
## REML criterion at convergence: 29426.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.8004 -0.6500 -0.0083 0.6370 3.4825
##
## Random effects:
## Groups Name Variance Std.Dev.
## schoolID (Intercept) 0.8376 0.9152
## Residual 1.3955 1.1813
## Number of obs: 9000, groups: schoolID, 300
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 5.99852 0.05429 110.5
The ICC is calculated by dividing the random intercept variance by the sum of the random intercept variance and residual variance:
#ICC
.8376/(.8376+1.3955)
## [1] 0.375084
About 37.5% of the total variation in satisfaction
is
between-cluster (differences in mean satisfaction across schools), while
the remaining 62.5% is within-cluster (differences in satisfaction
between teachers in the same school).
To calculate all relevant \(R^2\)
using r2mlm()
, simply supply the lmer
object
as an argument:
# R^2 for random intercept model
r2mlm(m_m)
## $Decompositions
## total within between
## fixed, within 0.0000000 0 NA
## fixed, between 0.0000000 NA 0
## slope variation 0.0000000 0 NA
## mean variation 0.3750827 NA 1
## sigma2 0.6249173 1 NA
##
## $R2s
## total within between
## f1 0.0000000 0 NA
## f2 0.0000000 NA 0
## v 0.0000000 0 NA
## m 0.3750827 NA 1
## f 0.0000000 NA NA
## fv 0.0000000 0 NA
## fvm 0.3750827 NA NA
The output is divided into 3 sections:
Graphs
satisfaction
$Decompositions
Shows the proportion of variance explained by \(f_1\), \(f_2\), \(v\), \(m\), and \(\sigma_{\epsilon}^2\), respectively, for each of total, within-school, and between-school variance.
mean variation
(random intercept variation) accounts
for 37.5% of the total variancesigma2
(residual variation) accounts for the remaining
62.5% of total varianceNA
means this source (row) cannot explain the variation
captured by the column (e.g. \(f2\)
cannot explain within-school variance)$R2s
Displays all \(R^2\) measures calculable for this model:
Now we add fixed effect of level-1 predictor salary_c
and a level-2 predictor salary_m
to the random intercept
model:
\[\begin{align} y_{ij} &= \beta_{0j} + \beta_1salary\_c + \epsilon_{ij} \\ \beta_{0j} &= \gamma_{00} +\gamma_{01}salary\_m + u_{0j} \\ \beta_1 &= \gamma_{10} \end{align}\]
# fixed effects and random intercept model
m_f.m <- lmer(satisfaction ~ salary_c + salary_m + (1 | schoolID),
data=teachsat)
summary(m_f.m)
## Linear mixed model fit by REML ['lmerMod']
## Formula: satisfaction ~ salary_c + salary_m + (1 | schoolID)
## Data: teachsat
##
## REML criterion at convergence: 26428.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.4422 -0.6710 -0.0137 0.6522 3.7630
##
## Random effects:
## Groups Name Variance Std.Dev.
## schoolID (Intercept) 0.5984 0.7736
## Residual 0.9980 0.9990
## Number of obs: 9000, groups: schoolID, 300
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 3.104077 0.267652 11.60
## salary_c 0.071763 0.001219 58.88
## salary_m 0.039240 0.003575 10.98
##
## Correlation of Fixed Effects:
## (Intr) slry_c
## salary_c 0.000
## salary_m -0.985 0.000
The coefficient for salary_c
suggests that teachers that
have higher salaries than other teachers within the same school are more
satisfied.
The coefficient for salary_m
suggests that schools with
higher mean teacher salaries have higher mean satisfaction.
Both the random intercept variance and residual variance have been reduced from the random intercept model (.5984 and .9980, down from .8376 and 1.3955, respectively), suggesting a significant proportion of each of the variances have been explained by the fixed effects.
Let’s see the \(R^2\) measures for this model:
# R^2 for fixed effects and random intercept model
r2mlm(m_f.m)
## $Decompositions
## total within between
## fixed, within 0.1720308 0.2780736 NA
## fixed, between 0.1135477 NA 0.2977533
## slope variation 0.0000000 0.0000000 NA
## mean variation 0.2678004 NA 0.7022467
## sigma2 0.4466212 0.7219264 NA
##
## $R2s
## total within between
## f1 0.1720308 0.2780736 NA
## f2 0.1135477 NA 0.2977533
## v 0.0000000 0.0000000 NA
## m 0.2678004 NA 0.7022467
## f 0.2855784 NA NA
## fv 0.2855784 0.2780736 NA
## fvm 0.5533788 NA NA
Graphs
salary_c
(level-1) explains a
little less than 30% of the within-school variancesalary_m
(level-2) explains
approximately 30% of the between-school variance$Decompositions
mean variation
and sigma2
, level-2 and
level-1 residual variances, respectively, make up less of the total
variance compared to the random intercept model because the fixed
effects are explaining some of the unexplained variance away$R2s
\[R^{2(f)}_t = R^{2(f_1)}_t + R^{2(f_2)}_t\]
and
\[R^{2(fvm)}_t = R^{2(f_1)}_t + R^{2(f_2)}_t + R^{2(v)}_t + R^{2(m)}_t\]
The final source of explained variation to examine with
r2mlm()
is \(v\), random
slopes.
We allow the effect of salary_c
to vary by school via
random slopes to attempt to explain more within-school variation in
teacher satisfaction.
\[\begin{align} y_{ij} &= \beta_{0j} + \beta_{1j}salary\_c + \epsilon_{ij} \\ \beta_{0j} &= \gamma_{00} +\gamma_{01}salary\_m + u_{0j} \\ \beta_{1j} &= \gamma_{10} + u_{1j} \end{align}\]
# random slope, fixed effects and random intercept model
m_v.f.m <- lmer(satisfaction ~ salary_c + salary_m + (1 + salary_c | schoolID),
data=teachsat)
summary(m_v.f.m)
## Linear mixed model fit by REML ['lmerMod']
## Formula: satisfaction ~ salary_c + salary_m + (1 + salary_c | schoolID)
## Data: teachsat
##
## REML criterion at convergence: 25376.9
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.3085 -0.6585 -0.0081 0.6432 3.6531
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## schoolID (Intercept) 0.604140 0.77726
## salary_c 0.002291 0.04786 0.04
## Residual 0.826908 0.90934
## Number of obs: 9000, groups: schoolID, 300
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 3.086462 0.267511 11.54
## salary_c 0.071689 0.002987 24.00
## salary_m 0.039479 0.003573 11.05
##
## Correlation of Fixed Effects:
## (Intr) slry_c
## salary_c 0.006
## salary_m -0.985 0.000
Although the salary_c
slope variance appears small at
0.0023, the standard deviation is 0.0479 (\(\sqrt{0.0023} \approx 0.0479\)), more than
50% the size of the coefficient itself (0.0717).
The residual variance has again decreased compared to the model with just fixed effects, suggesting that the random slope variation is explaining some of the total and within-school variance.
Now let’s see all possible \(R^2\) measures calculated for this model:
# model has all possible sources of explained variance
r2mlm(m_v.f.m)
## $Decompositions
## total within between
## fixed, within 0.17107144 0.2776837 NA
## fixed, between 0.11452685 NA 0.2982981
## slope variation 0.07624695 0.1237643 NA
## mean variation 0.26940742 NA 0.7017019
## sigma2 0.36874735 0.5985520 NA
##
## $R2s
## total within between
## f1 0.17107144 0.2776837 NA
## f2 0.11452685 NA 0.2982981
## v 0.07624695 0.1237643 NA
## m 0.26940742 NA 0.7017019
## f 0.28559828 NA NA
## fv 0.36184523 0.4014480 NA
## fvm 0.63125265 NA NA
Graphs
salary_c
slope variation (red stripes) explains a
little less than 10% of the total variancesalary_c
slope variation explains a bit more than 10%
of the within-cluster variance, which is now about 40% explained by the
model$Decompositions
salary_c
slope variation is the smallest source of
explained variance$R2s
salary_c
explains about 40.1% of the within-school variance
(via its fixed and random slope)The function r2mlm_ci()
calculates bootstrapped
confidence intervals for each \(R^2\)†.
Using r2mlm_ci()
requires the bootmlm
package, which resamples multilevel data using the bootstrap.
The bootmlm
package is still in development, so is not
yet available for download from CRAN, but can be downloaded from github
using the following code:
if (!require("remotes")) {
install.packages("remotes")
}
remotes::install_github("marklhc/bootmlm")
The following arguments are required when calling
r2mlm_ci()
model
: an lmer()
or nlme()
objectnsim
: number of bootstrap samples (e.g. 500 or
1000)boottype
: "parametric"
, which assumes
normally distributed residuals, or "residual"
, which does
notconfinttype
: type of confidence interval, one of
"norm"
(normal), "basic"
, or
"perc"
(percentile)Below, we request 95% percentile confidence intervals using 1000 residual bootstrap samples:
# set random seed to reproduce results
set.seed(1932)
# this takes a long time to run!
r2mlm_ci(m_v.f.m, nsim=1000, boottype="residual", confinttype = "perc",
progress=F) # suppresses a progress bar from appearing in output
## 2.5 % 97.5 %
## f1_total 0.14921313 0.19447585
## f2_total 0.08144853 0.15445686
## v_total 0.06288597 0.09266709
## m_total 0.23435219 0.30509080
## f_total 0.25205535 0.32365693
## fv_total 0.32817014 0.40108991
## fvm_total 0.60408394 0.65694501
## f1_within 0.24738847 0.30914729
## v_within 0.10441981 0.14750445
## fv_within 0.37114323 0.43443146
## f2_between 0.22192875 0.38347671
## m_between 0.61652329 0.77807125
† As of writing this workshop, the r2mlm_ci()
function appears to be still somewhat in development.
Although r2mlm()
combines individual sources of
explained variance of the same type (e.g. all level-1 fixed effects), we
may be interested in the unique contribution of a single variable or
smaller subset of variables.
Or, we may wish to compare \(R^2\) measures between multilevel models with completely different sets of predictors.
For both purposes we wish to calculate the difference in all \(R^2\) measures betwen 2 models, which
r2mlm_comp()
can do with 2 lmer()
objects
specified as arguments.
Example: calculating unique contributions to \(R^2\)
Assume we are interested in calculating how much various \(R^2\) measures increase when we add the
following to our previous model with fixed and random slopes of
salary_c
, fixed slope of salary_m
, and random
intercepts:
control_c
fixed and random slopecontrol_m
fixed slopeFirst, we fit the model with all of these components:
# model with additional predictors
m_all <- lmer(satisfaction ~ salary_c + salary_m + control_c + control_m +
(1 + salary_c + control_c | schoolID),
data=teachsat)
summary(m_all)
## Linear mixed model fit by REML ['lmerMod']
## Formula: satisfaction ~ salary_c + salary_m + control_c + control_m +
## (1 + salary_c + control_c | schoolID)
## Data: teachsat
##
## REML criterion at convergence: 22463.9
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.3888 -0.6513 -0.0009 0.6350 3.9643
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## schoolID (Intercept) 0.609125 0.7805
## salary_c 0.002362 0.0486 0.07
## control_c 0.028816 0.1698 0.04 -0.13
## Residual 0.553253 0.7438
## Number of obs: 9000, groups: schoolID, 300
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 2.692568 0.345176 7.801
## salary_c 0.072820 0.002960 24.604
## salary_m 0.040016 0.003557 11.251
## control_c 0.278388 0.011053 25.186
## control_m 0.072244 0.042491 1.700
##
## Correlation of Fixed Effects:
## (Intr) slry_c slry_m cntrl_c
## salary_c 0.008
## salary_m -0.787 0.000
## control_c 0.004 -0.107 0.000
## control_m -0.637 0.000 0.044 0.000
## optimizer (nloptwrap) convergence code: 0 (OK)
## Model failed to converge with max|grad| = 0.00825855 (tol = 0.002, component 1)
The large t value
value for control_c
suggests that the data provide strong evidence that it explains a
significant portion of within-school variance, but the small
t value
for control_m
suggests that the data
do not provide strong evidence that it explains much variance (even
though its coefficient is larger than that for
salary_m
).
Nest we use r2mlm_comp
to calculate the difference in
\(R^2\) measures between the two
models:
r2mlm_comp(m_v.f.m, m_all)
## $`Model A R2s`
## total within between
## f1 0.17107144 0.2776837 NA
## f2 0.11452685 NA 0.2982981
## v 0.07624695 0.1237643 NA
## m 0.26940742 NA 0.7017019
## f 0.28559828 NA NA
## fv 0.36184523 0.4014480 NA
## fvm 0.63125265 NA NA
##
## $`Model B R2s`
## total within between
## f1 0.2567172 0.4194249 NA
## f2 0.1180306 NA 0.3042572
## v 0.1102090 0.1800596 NA
## m 0.2698998 NA 0.6957428
## f 0.3747479 NA NA
## fv 0.4849568 0.5994845 NA
## fvm 0.7548567 NA NA
##
## $`R2 differences, Model B - Model A`
## total within between
## f1 0.0856457841 0.14174120 NA
## f2 0.0035037875 NA 0.005959122
## v 0.0339620354 0.05629528 NA
## m 0.0004924254 NA -0.005959122
## f 0.0891495715 NA NA
## fv 0.1231116070 0.19803648 NA
## fvm 0.1236040324 NA NA
The output contains:
control_c
appears to have increased the contributions
of both fixed slopes and slope variation to within-school variancecontrol_m
appears to very slightly incresae the
contribution of fixed slopes to between-school variancesalary_c
explains about 8.6% more of
the total variance (than the reduced model), and about 14.2% more of the
within-school variancesalary_c
explains about 3.4% more
of the total variance and about 5.6% more of the within-school
variancesalary_m
explains about 0.35% more
of total variance, and about 0.6% of the between-school variancesalary_c
and salary_m
explain
about 12.4% more of the total varianceExample: comparing different sets of predictors
Suppose we are interested in whether salary or curriculum control explain more of the variation in teacher satisfaction.
To assess this, we will compare 2 models:
salary_c
with fixed and random slopes,
salary_m
with fixed slopecontrol_c
with fixed and random slopes,
control_m
with fixed slopeIf you compare 2 non-nested models with r2mlm_comp()
,
you must specify the dataset as well.
# salary model
m_salary <- lmer(satisfaction ~ salary_c + salary_m +
(1 + salary_c | schoolID),
data=teachsat)
# curriculum control model
m_control <- lmer(satisfaction ~ control_c + control_m +
(1 + control_c | schoolID),
data=teachsat)
# compare models
r2mlm_comp(m_salary, m_control, data=teachsat)
## $`Model A R2s`
## total within between
## f1 0.17107144 0.2776837 NA
## f2 0.11452685 NA 0.2982981
## v 0.07624695 0.1237643 NA
## m 0.26940742 NA 0.7017019
## f 0.28559828 NA NA
## fv 0.36184523 0.4014480 NA
## fvm 0.63125265 NA NA
##
## $`Model B R2s`
## total within between
## f1 0.07973695 0.12847381 NA
## f2 0.00128136 NA 0.003377757
## v 0.03112197 0.05014435 NA
## m 0.37807113 NA 0.996622243
## f 0.08101831 NA NA
## fv 0.11214028 0.17861816 NA
## fvm 0.49021141 NA NA
##
## $`R2 differences, Model B - Model A`
## total within between
## f1 -0.09133449 -0.14920993 NA
## f2 -0.11324548 NA -0.2949203
## v -0.04512498 -0.07361995 NA
## m 0.10866371 NA 0.2949203
## f -0.20457997 NA NA
## fv -0.24970495 -0.22282988 NA
## fvm -0.14104124 NA NA
salary_c
explains more within-school variation than
control_c
salary_m
explains more between-school variation than
salary_m
Despite general recommendations to group-mean center level-1
predictors, r2mlm()
can be used with uncentered level-1
predictors.
Uncentered level-1 predictors potentially explain variation at both levels, thus:
\[\widehat{var}(y_{ij})_{x_{uncentered}} = f + v + m + \hat{\sigma}_\epsilon^2\] * Only \(R^{2(f)}_t\), \(R^{2(v)}_t\), \(R^{2(m)}_t\), \(R^{2(fv)}_t\), and \(R^{2(fvm)}_t\) will be reported (see Table 5 of Rights and Sterba(2019) for definitions)
Below, we sum control_c
and control_m
to
create uncentered control
, which we then model as a
predictor:
# create uncentered level-1 control
teachsat$control <- teachsat$control_c + teachsat$control_m
# fixed and random slopes for control
m_control_un <- lmer(satisfaction ~ control +
(1 + control | schoolID),
data=teachsat)
summary(m_control_un)
## Linear mixed model fit by REML ['lmerMod']
## Formula: satisfaction ~ control + (1 + control | schoolID)
## Data: teachsat
##
## REML criterion at convergence: 28017.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.2656 -0.6153 -0.0006 0.6128 3.9539
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## schoolID (Intercept) 1.50889 1.2284
## control 0.02637 0.1624 -0.63
## Residual 1.14230 1.0688
## Number of obs: 9000, groups: schoolID, 300
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 4.69931 0.08008 58.68
## control 0.26333 0.01178 22.34
##
## Correlation of Fixed Effects:
## (Intr)
## control -0.705
We see that we only get decompositions and \(R^2\) measures related to total variance for a model with uncentered level-1 predictors:
# r2mlm with uncentered level-1 predictor
r2mlm(m_control_un)
## $Decompositions
## total
## fixed 0.10497843
## slope variation 0.03992823
## mean variation 0.38130827
## sigma2 0.47378506
##
## $R2s
## total
## f 0.10497843
## v 0.03992823
## m 0.38130827
## fv 0.14490667
## fvm 0.52621494
satisfaction
is
explained by the fixed slope of control
control
control
, the random intercept variation explains about
38.1% of the total variancesatisfaction
Rights and Sterba (2019) explicitly caution against reporting \(R^2\) measures that combine several sources (e.g. \(R^{2(fvm)}_t\)) without reporting the \(R^2\) measures that capture a single source.
They provide an example where \(R^{2(fvm)}_t=0.8\) is reported alone, and thus it is unclear how much each of, \(f_1\), \(f_2\), \(v\) and \(m\) contribute.
The following figure illustrates the issue, where a model includes the fixed and random slope of a level-1 predictor and random intercepts:
Although \(R^{2(f,v,m)}_t=0.8\) for all models, the interpretations are very different:
The interpretation of \(R^{2(f,v,m)}_t=0.8\) could be thus clarified by also reporting \(R^{2(f)}_t\), \(R^{2(v)}_t\), and \(R^{2(m)}_t\).
Use r2mlm()
to calculate \(R^2\) for model with models run in Exercise
1.
Add additional predictors and do same
Original theoretical article introducing new \(R^2\) measures
Rights, J. D., & Sterba, S. K. (2019). Quantifying explained variance in multilevel models: An integrative framework for defining R-squared measures. Psychological methods, 24(3), 309.
Article that reviews multilevel models and discusses usage of
r2mlm
in detail
Shaw, M., Rights, J. D., Sterba, S. S., & Flake, J. K. (2023). r2mlm: An R package calculating R-squared measures for multilevel models. Behavior Research Methods, 55(4), 1942-1964.
Extension: \(R^2\) for multilevel models with more than 2 levels
Rights, J. D., & Sterba, S. K. (2023). R-squared measures for multilevel models with three or more levels. Multivariate behavioral research, 58(2), 340-367.