Background
Generalized linear mixed models (or GLMMs) are an extension of linear mixed models to allow response variables from different distributions, such as binary responses. Alternatively, you could think of GLMMs as an extension of generalized linear models (e.g., logistic regression) to include both fixed and random effects (hence mixed models). The general form of the model (in matrix notation) is:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \boldsymbol{\varepsilon}$$

where \(\mathbf{y}\) is the \(N \times 1\) column vector of outcomes; \(\mathbf{X}\) is the \(N \times p\) matrix of the \(p\) predictor variables; \(\boldsymbol{\beta}\) is the \(p \times 1\) column vector of fixed-effects regression coefficients; \(\mathbf{Z}\) is the \(N \times q\) design matrix for the \(q\) random effects; \(\mathbf{u}\) is the \(q \times 1\) vector of random effects; and \(\boldsymbol{\varepsilon}\) is the \(N \times 1\) column vector of residuals, the part of \(\mathbf{y}\) not explained by \(\mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u}\).
To make this more concrete, let’s consider an example from a simulated dataset in which doctors are sampled and each doctor sees some number of patients, so patients are nested within doctors. In our example, there are 407 doctors and 8,525 patients in total, and the grouping (cluster) variable is the doctor. Because we will allow only the intercept to vary randomly by doctor, the random-effects design matrix \(\mathbf{Z}\) simply codes which doctor each patient belongs to: it is a matrix of indicator (dummy) variables with one column per doctor, containing a 1 in the column of the patient’s doctor and 0s elsewhere.
In order to see the structure in more detail, we could also zoom in on just the first 10 doctors. The filled space indicates rows of observations belonging to the doctor in that column, whereas the white space indicates rows not belonging to that doctor.
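As a concrete illustration, here is a minimal NumPy sketch of that indicator structure; the doctor assignments are made up for the example.

```python
import numpy as np

# Hypothetical doctor assignment for 8 patients (made-up IDs).
doctor = np.array([0, 0, 1, 1, 1, 2, 3, 3])
J = doctor.max() + 1                      # number of doctors

# Z has one row per patient and one indicator column per doctor:
# a 1 marks the doctor that patient belongs to, 0s elsewhere.
Z = np.zeros((len(doctor), J))
Z[np.arange(len(doctor)), doctor] = 1
print(Z)
```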
If we estimated it, \(\mathbf{u}\) would be a column vector, similar to \(\boldsymbol{\beta}\). However, in classical statistics we do not actually estimate \(\mathbf{u}\); instead, we nearly always assume that

$$\mathbf{u} \sim \mathcal{N}(\mathbf{0}, \mathbf{G})$$

Which is read: “\(\mathbf{u}\) is distributed as normal with mean zero and variance-covariance matrix \(\mathbf{G}\).” Because our model contains only a random intercept, \(\mathbf{G}\) reduces to a single parameter, the variance of the doctor intercepts. In other words, rather than estimating a separate effect for every doctor, we estimate the variance of the distribution from which the doctor effects are assumed to arise.
The final element in our model is the variance-covariance matrix of the residuals, \(\boldsymbol{\varepsilon}\). The most common residual covariance structure is

$$\mathbf{R} = \mathbf{I}\sigma^{2}_{\varepsilon}$$

where \(\mathbf{I}\) is the \(N \times N\) identity matrix and \(\sigma^{2}_{\varepsilon}\) is the residual variance, so the residuals are assumed independent with constant variance. So the final fixed (data) elements are \(\mathbf{y}\), \(\mathbf{X}\), and \(\mathbf{Z}\), and the estimated elements are \(\boldsymbol{\beta}\), \(\mathbf{G}\), and \(\sigma^{2}_{\varepsilon}\).
We could also frame our model in a two level-style equation. At level 1, for patient \(i\) seen by doctor \(j\):

$$y_{ij} = \beta_{0j} + \beta_{1}x_{ij} + e_{ij}$$

and at level 2, the doctor-specific intercept is:

$$\beta_{0j} = \gamma_{00} + u_{0j}$$

Substituting the level 2 equation into level 1 yields the mixed model specification:

$$y_{ij} = (\gamma_{00} + u_{0j}) + \beta_{1}x_{ij} + e_{ij}$$

Here we grouped the fixed and random intercept parameters together to show that, combined, they give the estimated intercept for a particular doctor.
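To tie the pieces together, here is a minimal simulation of the random-intercept linear mixed model above; all sizes and parameter values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
J, n_per = 50, 20                       # doctors, patients per doctor (made up)
N = J * n_per
doctor = np.repeat(np.arange(J), n_per)

X = np.column_stack([np.ones(N), rng.normal(size=N)])  # intercept + one predictor
beta = np.array([2.0, 0.5])             # fixed effects (illustrative values)
G = 1.5                                 # random-intercept variance
u = rng.normal(0.0, np.sqrt(G), J)      # u ~ N(0, G), one intercept per doctor
e = rng.normal(0.0, 1.0, N)             # residuals: R = I * sigma^2_eps

y = X @ beta + u[doctor] + e            # u[doctor] realizes the product Z @ u
```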
Generalized Linear Mixed Models
Up to this point everything we have said applies equally to linear mixed models as to generalized linear mixed models. Now let’s focus in on what makes GLMMs unique.
What is different between LMMs and GLMMs is that the response
variables can come from different distributions besides Gaussian. In
addition, rather than modeling the responses directly,
some link function is often applied, such as a log link. We
will talk more about this in a minute. Let the linear predictor, \(\boldsymbol{\eta}\), be the combination of the fixed and random effects, excluding the residuals:

$$\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u}$$

The generic link function is called \(g(\cdot)\); it relates the outcome to the linear predictor, and its inverse is written \(h(\cdot) = g^{-1}(\cdot)\). So our model for the conditional expectation of \(\mathbf{y}\) (conditional because it depends on the level of the predictors and the random effects) is:

$$g(E(\mathbf{y})) = \boldsymbol{\eta}$$

We could also model the expectation of \(\mathbf{y}\) on the original scale by applying the inverse link:

$$E(\mathbf{y}) = h(\boldsymbol{\eta}) = \boldsymbol{\mu}$$

with \(\mathbf{y}\) itself equal to \(h(\boldsymbol{\eta}) + \boldsymbol{\varepsilon}\).
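As a small numeric sketch (the values of \(\boldsymbol{\eta}\) are arbitrary), the same linear predictor maps to very different expectations depending on the inverse link:

```python
import numpy as np

eta = np.array([-2.0, 0.0, 1.5])       # arbitrary linear predictor values

mu_logit = 1 / (1 + np.exp(-eta))      # inverse logit: expectations in (0, 1)
mu_log = np.exp(eta)                   # inverse log: expectations in (0, inf)
mu_identity = eta                      # identity link: expectations unchanged

print(mu_logit)   # [0.119 0.5   0.818]
print(mu_log)     # [0.135 1.    4.482]
```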
Link Functions and Families
So what are the different link functions and families? There are many options, but we are going to focus on three: link functions and families for binary outcomes and for count outcomes, and then we tie it back to continuous (normally distributed) outcomes.
Binary Outcomes

For a binary outcome, we use a logit link function and the probability mass function, or PMF, of the Bernoulli distribution (a binomial with a single trial). These are:

$$g(p) = \log_e\left(\frac{p}{1-p}\right), \qquad h(\eta) = \frac{e^{\eta}}{1+e^{\eta}}$$

$$P(y = k) = p^{k}(1-p)^{1-k}, \quad k \in \{0, 1\}$$
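For instance, SciPy’s `logit` and `expit` implement \(g\) and \(h\), and a quick check confirms one inverts the other (the probability value here is arbitrary):

```python
from scipy.special import expit, logit

p = 0.73                 # arbitrary probability
eta = logit(p)           # g(p) = log(p / (1 - p))
print(eta)               # ~0.9946
print(expit(eta))        # h undoes g: returns 0.73
```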
Count Outcomes
For a count outcome, we use a log link function and the probability mass function, or PMF, for the Poisson. Note that we call this a probability mass function rather than a probability density function because the support is discrete (i.e., the non-negative integers). These are:

$$g(\lambda) = \log_e(\lambda), \qquad h(\eta) = e^{\eta}$$

$$P(y = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots$$
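A quick SciPy check (the linear predictor value is made up) shows the mean implied by the log link and the Poisson’s equal mean and variance:

```python
import numpy as np
from scipy.stats import poisson

eta = 0.9                    # made-up linear predictor value
lam = np.exp(eta)            # inverse log link gives the Poisson mean

print(poisson.pmf(np.arange(5), mu=lam))      # P(y = 0), ..., P(y = 4)
print(poisson.stats(mu=lam, moments="mv"))    # mean and variance both = lam
```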
Continuous Outcomes
For a continuous outcome where we assume a normal distribution, the most common link function is simply the identity. In this case, there are some special properties that simplify things:

$$g(E(\mathbf{y})) = E(\mathbf{y}), \qquad h(\boldsymbol{\eta}) = \boldsymbol{\eta}$$

so that

$$E(\mathbf{y}) = \boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u}, \qquad \mathbf{y} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{I}\sigma^{2}_{\varepsilon})$$
So you can see how when the link function is the identity, it essentially drops out and we are back to our usual specification of means and variances for the normal distribution, which is the model used for typical linear mixed models. Thus generalized linear mixed models can easily accommodate the specific case of linear mixed models, but generalize further.
Interpretation
The interpretation of GLMMs is similar to GLMs; however, there is an added complexity because of the random effects. On the linearized metric (after applying the link function), interpretation continues as usual. However, it is often easier to back transform the results to the original metric. For example, in a random effects logistic model, one might want to talk about the probability of an event given some specific values of the predictors. Likewise in a Poisson (count) model, one might want to talk about the expected count rather than the expected log count. These transformations complicate matters because they are nonlinear, so even random intercepts no longer play a strictly additive role and instead can have a multiplicative effect. This section discusses this concept in more detail and shows how one could interpret the model results.
Suppose we estimated a mixed effects logistic model, predicting remission (yes = 1, no = 0) from Age, Married (yes = 1, no = 0), and IL6 (continuous). We allow the intercept to vary randomly by each doctor. We might make a summary table like this for the results.
Parameter | Est. | SE | p-value | OR |
---|---|---|---|---|
Intercept | 1.467 | .274 | <.001 | 4.335 |
Age | -.056 | .005 | <.001 | .946 |
Married (yes v no) | .26 | .064 | <.001 | 1.297 |
IL6 | -.053 | .011 | <.001 | .948 |
Random intercept variance (doctor) | 3.979 | | | |

N patients = 8,525; N doctors = 407
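As a quick sanity check, the OR column is just the exponentiated estimate; reproducing it takes one line per coefficient:

```python
import numpy as np

# Estimates from the table above; exponentiating recovers the OR column
# (up to rounding).
for name, b in [("Intercept", 1.467), ("Age", -0.056),
                ("Married", 0.26), ("IL6", -0.053)]:
    print(f"{name}: exp({b}) = {np.exp(b):.3f}")
```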
The estimates can be interpreted essentially as always. For example, for IL6, a one unit increase in IL6 is associated with a .053 unit decrease in the expected log odds of remission. Similarly, people who are married or living as married are expected to have .26 higher log odds of being in remission than people who are single.
Many people prefer to interpret odds ratios. However, these take on a more nuanced meaning when there are mixed effects. In regular logistic regression, the odds ratio is the expected odds ratio holding all the other predictors fixed. This makes sense, as we are often interested in statistically adjusting for other effects, such as age, to get the “pure” effect of being married or whatever the primary predictor of interest is. The same is true with mixed effects logistic models, with the addition that holding everything else fixed includes holding the random effect fixed. That is, the odds ratio here is the conditional odds ratio for someone holding age and IL6 constant as well as for someone with either the same doctor, or doctors with identical random effects. Although this can make sense, when there is large variability between doctors, the relative impact of the fixed effects (such as marital status) may be small. In this case, it is useful to examine the effects at various levels of the random effects or to get the average fixed effects marginalizing the random effects.
Generally speaking, software packages do not include facilities for getting estimated values marginalizing the random effects so it requires some work by hand. Taking our same example, let’s look at the distribution of probabilities at different values of the random effects. To do this, we will calculate the predicted probability for every patient in our sample holding the random doctor effect at 0, and then at some other values to see how the distribution of probabilities of being in remission in our sample might vary if they all had the same doctor, but which doctor varied.
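A minimal sketch of that by-hand calculation, using the estimates from the table above; the patient covariate values are simulated for illustration, not the real data:

```python
import numpy as np
from scipy.stats import norm

beta = np.array([1.467, -0.056, 0.26, -0.053])   # Intercept, Age, Married, IL6
sigma = np.sqrt(3.979)                           # random-intercept SD from the table

rng = np.random.default_rng(0)
n = 1000                                         # illustrative sample size
X = np.column_stack([
    np.ones(n),
    rng.uniform(20, 80, n),                      # Age (made-up range)
    rng.integers(0, 2, n),                       # Married
    rng.normal(4, 1, n),                         # IL6 (made-up distribution)
])

for q in (0.2, 0.4, 0.6, 0.8):
    u = norm.ppf(q, scale=sigma)                 # doctor effect at this percentile
    p = 1 / (1 + np.exp(-(X @ beta + u)))        # inverse logit
    print(f"u at {q:.0%}: {u:+.2f}, mean P(remission) = {p.mean():.3f}")
```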
For all four graphs, we plot a histogram of the estimated probability of being in remission on the x axis and the number of cases in our sample in a given bin on the y axis. The random effects are held constant within each graph at the values shown, which are the 20th, 40th, 60th, and 80th percentiles. The x axis is fixed to go from 0 to 1 in all cases so that we can easily compare the panels.
What you can see is that although the spread of each distribution is similar (because we hold the random effects constant within a particular histogram), the position of the distribution varies tremendously across levels of the random effect. Thus simply ignoring the random effects and focusing on the fixed effects would paint a rather biased picture of the reality. Incorporating them, it seems that although there will definitely be within-doctor variability due to the fixed effects (patient characteristics), there is more variability due to the doctor. Not incorporating random effects, we might conclude that in order to maximize remission, we should focus on diagnosing and treating people earlier (younger age), good relationships (marital status), and low levels of circulating pro-inflammatory cytokines (IL6). Including the random effects, we might conclude that we should focus on training doctors.
Finally, let’s incorporate the fixed and random effects for each individual and look at the distribution of predicted probabilities of remission in our sample. That is, now both fixed and random effects can vary for every person.
Count Outcomes
We could fit a similar model for a count outcome, such as the number of tumors. Counts are often modeled as coming from a Poisson distribution, with the canonical link being the log. We will do that here and use the same predictors as in the mixed effects logistic model, predicting the count from Age, Married (yes = 1, no = 0), and IL6 (continuous). We allow the intercept to vary randomly by each doctor. We might make a summary table like this for the results.
Parameter | Est. | SE | p-value | Exp(Est.) |
---|---|---|---|---|
Intercept | -.233 | .057 | <.001 | .792 |
Age | .026 | .001 | <.001 | 1.026 |
Married (yes v no) | -.13 | .013 | <.001 | .878 |
IL6 | .005 | .002 | .025 | 1.005 |
Random intercept variance (doctor) | .169 | | | |

N patients = 8,525; N doctors = 407
The interpretations again follow those for a regular Poisson model: for a one unit increase in Age, the expected log count of tumors increases by .026. People who are married are expected to have .13 lower log counts of tumors than people who are single. Finally, for a one unit increase in IL6, the expected log count of tumors increases by .005.
It can be more useful to talk about expected counts rather than expected log counts. However, we run into the same interpretational complication as with the logistic model: the expected counts are conditional on every other value being held constant, again including the random doctor effects. So, for example, we could say that people who are married are expected to have .878 times as many tumors as people who are not married, for people with the same doctor (or doctors with the same random effect) and holding age and IL6 constant.
As we did with the mixed effects logistic model, we can plot histograms of the expected counts from our model for our entire sample, holding the random effects at specific values, here the 20th, 40th, 60th, and 80th percentiles. This gives us a sense of how much variability in tumor count can be attributed to the doctor (the position of the distribution) versus to the fixed effects (the spread of the distribution within each graph).
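The same kind of by-hand calculation works on the count side; here is a sketch for a single illustrative patient (the covariate values are made up), using the table’s estimates:

```python
import numpy as np
from scipy.stats import norm

beta = np.array([-0.233, 0.026, -0.13, 0.005])   # Intercept, Age, Married, IL6
x = np.array([1.0, 55.0, 1.0, 4.0])              # made-up patient covariates
sigma = np.sqrt(0.169)                           # random-intercept SD from the table

for q in (0.2, 0.4, 0.6, 0.8):
    u = norm.ppf(q, scale=sigma)                 # doctor effect at this percentile
    print(f"{q:.0%}: expected count = {np.exp(x @ beta + u):.2f}")
```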
This time there is less variability, so the results are less dramatic than they were in the logistic example.
Finally, let’s incorporate the fixed and random effects for each individual and look at the distribution of expected tumor counts in our sample. That is, now both fixed and random effects can vary for every person.
Distribution Summary
 | Normal (Gaussian) | Binomial | Poisson |
---|---|---|---|
Notation | \(\mathcal{N}(\mu, \sigma^{2})\) | \(B(n, p)\) | \(Pois(\lambda)\) |
Parameters | \(\mu\) (mean), \(\sigma^{2}\) (variance) | \(n\) (trials), \(p\) (success probability) | \(\lambda\) (rate) |
Support | \(y \in (-\infty, \infty)\) | \(y \in \{0, 1, \ldots, n\}\) | \(y \in \{0, 1, 2, \ldots\}\) |
Mean | \(\mu\) | \(np\) | \(\lambda\) |
Variance | \(\sigma^{2}\) | \(np(1-p)\) | \(\lambda\) |
PDF/PMF | \(\frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{(y-\mu)^{2}}{2\sigma^{2}}}\) | \(\binom{n}{y} p^{y} (1-p)^{n-y}\) | \(\frac{\lambda^{y} e^{-\lambda}}{y!}\) |
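These moments are easy to verify with SciPy (the parameter values are arbitrary):

```python
from scipy.stats import binom, norm, poisson

print(norm.stats(loc=2, scale=3, moments="mv"))   # (2.0, 9.0): mu, sigma^2
print(binom.stats(n=10, p=0.3, moments="mv"))     # (3.0, 2.1): np, np(1 - p)
print(poisson.stats(mu=4, moments="mv"))          # (4.0, 4.0): lambda, lambda
```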
Other Issues
For power and reliability of estimates, the limiting factor is often the sample size at the highest level of analysis. For example, having 500 patients from each of ten doctors would give you a reasonable total number of observations, but not enough to get stable estimates of doctor effects nor of the doctor-to-doctor variation. Ten patients from each of 500 doctors (leading to the same total number of observations) would be preferable.
For parameter estimation, because there are no closed-form solutions for GLMMs, you must use some approximation. Three approaches are fairly common.
- Quasi-likelihood approaches use a Taylor series expansion to approximate the likelihood; parameters are then estimated to maximize the quasi-likelihood. That is, they are not true maximum likelihood estimates. A Taylor series approximates a function near a point using a finite number of the function’s derivatives at that point. With each additional term used, the approximation error decreases (in the limit, the Taylor series equals the function), but the complexity of the Taylor polynomial also increases; the first sketch after this list demonstrates this. Early quasi-likelihood methods tended to use a first order expansion; more recently, a second order expansion is more common. In general, quasi-likelihood approaches are the fastest (although they can still be quite complex), which makes them useful for exploratory purposes and for large datasets. Because of the bias associated with them, quasi-likelihood estimates are not preferred for final models or statistical inference.
- The true likelihood can also be approximated using numerical integration. Quadrature methods are common; perhaps the most common of these use the Gaussian quadrature rule, frequently with the Gauss-Hermite weighting function. It is also common to incorporate adaptive algorithms that vary the step size near points with high error. The accuracy increases as the number of integration points increases; using a single integration point is equivalent to the so-called Laplace approximation. Each additional integration point increases the number of computations and thus the time to convergence, although it also increases the accuracy (the second sketch after this list shows the trade-off). Adaptive Gauss-Hermite quadrature might sound very appealing, and is in many ways. However, the number of function evaluations required grows exponentially as the number of dimensions increases: a random intercept is one dimension, and adding a random slope would make two. For three-level models with random intercepts and slopes, it is easy to create problems that are intractable with Gaussian quadrature. Consequently, it is a useful method when a high degree of accuracy is desired, but it performs poorly for high dimensional integrals, for large datasets, or when speed is a concern.
- A final set of methods, particularly useful for multidimensional integrals, are Monte Carlo methods, including the famous Metropolis-Hastings algorithm and Gibbs sampling, which are types of Markov chain Monte Carlo (MCMC) algorithms. Although Monte Carlo integration can be used in classical statistics, it is more common to see this approach used in Bayesian statistics.
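To make the Taylor series point concrete, here is a small demonstration (approximating \(e^{x}\), an arbitrary choice) of the error shrinking as terms are added:

```python
import math

x = 1.0
approx = 0.0
for order in range(6):
    approx += x**order / math.factorial(order)    # add the next Taylor term
    err = abs(math.exp(x) - approx)
    print(f"order {order}: approx = {approx:.6f}, error = {err:.2e}")
```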
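And a sketch of Gauss-Hermite quadrature itself, approximating \(E[h(u)]\) for \(u \sim \mathcal{N}(0, 1)\) with an inverse-logit integrand, the kind of integral a random-intercept logistic GLMM likelihood requires; the fixed part of \(\eta\) here is made up:

```python
import numpy as np

def h(u):
    # inverse logit evaluated at eta = 0.5 + u (0.5 is an arbitrary fixed part)
    return 1 / (1 + np.exp(-(0.5 + u)))

for n_points in (1, 3, 5, 11, 21):
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    # change of variables u = sqrt(2) * node turns the Hermite weight
    # exp(-node^2) into the standard normal density (up to 1/sqrt(pi))
    est = np.sum(weights * h(np.sqrt(2) * nodes)) / np.sqrt(np.pi)
    print(f"{n_points:2d} points: E[h(u)] ~ {est:.6f}")  # 1 point = Laplace-style
```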
Another issue that can occur during estimation is quasi or complete separation. Complete separation means that a predictor variable separates the outcome variable completely, leading to perfect prediction of the outcome by that predictor. Particularly if the outcome is skewed, there can also be problems with the random effects. For example, if one doctor only had a few patients and all of them either were in remission or were not, there would be no variability within that doctor.
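A tiny numeric illustration of why complete separation breaks estimation: on perfectly separated data, the logistic log-likelihood keeps improving as the slope grows, so no finite maximum likelihood estimate exists (the data here are invented):

```python
import numpy as np

x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])  # predictor
y = np.array([0, 0, 0, 1, 1, 1])                 # outcome: separated at x = 0
s = 2 * y - 1                                    # recode outcome as -1 / +1

for b in (1, 5, 25, 125):
    # numerically stable logistic log-likelihood: -sum log(1 + exp(-s*b*x))
    ll = -np.sum(np.logaddexp(0.0, -s * b * x))
    print(f"slope {b:>3}: log-likelihood = {ll:.4f}")  # rises toward 0, never peaks
```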