How do I interpret the coefficients of an effect-coded variable involved in an interaction in a regression model?
Table of Contents:
- Categorical predictors in regression models
- What is effect coding?
- Effect coding for a binary predictor
- Effect coding for categorical predictors with 3 or more levels
- Effect-coded predictors interacted with a continuous covariate
- Interaction of 2 effect-coded categorical predictors
- Summary
Categorical predictors in regression models
Categorical or nominal variables that are to be included as predictors in regression models must first be transformed into a set of variables (henceforth referred to as “regressors”), where each individual variable typically codes for membership to a single category. The resulting set of regressors is then entered into the model in the same way as a quantitative predictor variable. Many different coding schemes can be used for these regressors, and all will produce models with equivalent fit, but the coefficients will have different interpretations. The most commonly used coding scheme for regression is dummy coding (also known as reference or indicator coding), for which a regressor takes the value 1 for members of the category it codes and 0 otherwise; one category, the reference group, receives no regressor, and each coefficient is interpreted as the difference between its category’s mean and the reference group’s mean.
What is effect coding?
Effect coding is an alternative coding scheme that produces coefficient estimates with an interpretation analogous to effects estimated by analysis of variance (ANOVA). One popular reason to use effect coding is that in a model where an effect-coded categorical predictor is involved in an interaction, the coefficients for its regressors are interpreted as main effects. As mentioned above, the overall model fit and model predictions will be the same whether dummy coding or effect coding is used.
We again need to select one level of the categorical predictor whose regressor will be omitted from the regression model. Membership to this contrasting (omitted) group is coded with a -1 on all of the effect-coded regressors, rather than with a 0 as in dummy coding.
Effect coding for a binary predictor
With just 2 levels, we can effect code a binary predictor with a single regressor with values 1 and -1.
Imagine we have a simple data set which looks at the effects of 2 different study methods, recitation and writing, on the number of words (Words) recalled from a studied list of 20 words.
Here are the data, where we have 5 subjects in the recitation group and 3 subjects in the writing group:
Method | Words |
---|---|
recitation | 9 |
recitation | 12 |
recitation | 6 |
recitation | 6 |
recitation | 7 |
writing | 16 |
writing | 9 |
writing | 11 |
The mean of Words is 8 in the recitation group and 12 in the writing group, so the unweighted overall (grand) mean is (8 + 12)/2 = 10.
We only need a single regressor to enter the binary Method predictor into the model. We will name it M1, coding recitation with a 1 and writing with a -1:
Method | M1 | Words |
---|---|---|
recitation | 1 | 9 |
recitation | 1 | 12 |
recitation | 1 | 6 |
recitation | 1 | 6 |
recitation | 1 | 7 |
writing | -1 | 16 |
writing | -1 | 9 |
writing | -1 | 11 |
The regression model equation of Words regressed on M1 is:

$$Words_i = b_0 + b_1 M1_i + \epsilon_i$$

where $b_0$ is the intercept, $b_1$ is the coefficient for M1, and $\epsilon_i$ is the error term.

By plugging in the value of M1 for each group, we can write out the model-predicted mean of each group.

First, for the recitation group, M1 = 1:

$$\widehat{Words} = b_0 + b_1(1) = b_0 + b_1$$

The predicted number of words recalled for the recitation group is thus $b_0 + b_1$.

For the writing group, M1 = -1:

$$\widehat{Words} = b_0 + b_1(-1) = b_0 - b_1$$

The predicted number of words recalled for the writing group is thus $b_0 - b_1$.
Interpretation of intercept
The intercept $b_0$ is the unweighted average of the two group predictions: $[(b_0 + b_1) + (b_0 - b_1)]/2 = b_0$. The intercept of an effect-coded model is therefore interpreted as the unweighted grand mean of the group means.
Interpretation of coefficient
The equation for the predicted mean of the recitation group is $b_0 + b_1$, so $b_1$ is the deviation of the recitation group's mean from the grand mean $b_0$. Because the writing group's predicted mean is $b_0 - b_1$, the quantity $-b_1$ is likewise the deviation of the writing group's mean from the grand mean.
Regression model. Let’s take a look at the estimates of a regression of Words on M1:
    ------------------------------------------------------------------------------
           Words |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              M1 |         -2   1.074968    -1.86   0.112    -4.630351    .6303512
           _cons |         10   1.074968     9.30   0.000     7.369649    12.63035
    ------------------------------------------------------------------------------
Above we can see that the estimate of the intercept (labeled “_cons”) is 10, which is the unweighted grand mean of the two group means, (8 + 12)/2 = 10.
The coefficient for M1 is -2, which is the deviation of the recitation group's mean from the grand mean: 8 - 10 = -2. The writing group's deviation is the negative of this coefficient, -(-2) = 2, matching 12 - 10 = 2.
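If you would like to reproduce this output yourself, here is a minimal Stata sketch; the lowercase variable names (method, words) are hypothetical stand-ins for the columns shown above.

```
* Enter the binary-method data and effect code the predictor
clear
input str10 method words
"recitation" 9
"recitation" 12
"recitation" 6
"recitation" 6
"recitation" 7
"writing"    16
"writing"    9
"writing"    11
end

* recitation = 1, writing (the contrasting group) = -1
generate M1 = cond(method == "recitation", 1, -1)

regress words M1
```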
Effect coding for categorical predictors with 3 or more levels
Now let’s look at how to effect code a categorical predictor with 3 levels and how to interpret its regression coefficients. The extension from a binary predictor is quite straightforward.
Imagine we have an additional method for memorizing a list of 20 words, which we’ll call the imagery method, and that we have collected data from 4 more subjects who used it:
Method | Words |
---|---|
recitation | 9 |
recitation | 12 |
recitation | 6 |
recitation | 6 |
recitation | 7 |
imagery | 13 |
imagery | 16 |
imagery | 9 |
imagery | 14 |
writing | 16 |
writing | 9 |
writing | 11 |
The means for each of the 3 groups are 8 for recitation, 13 for imagery, and 12 for writing, so the unweighted grand mean is (8 + 13 + 12)/3 = 11.
Now that we have 3 conditions, we will need 2 (the number of levels minus 1) effect-coded regressors to represent Method, with writing again serving as the contrasting (omitted) group.
Remember that each effect-coded regressor takes the value 1 for members of the category it codes, -1 for members of the contrasting group, and 0 otherwise.
We will create one regressor named M1 that codes the recitation group with a 1 and a second named M2 that codes the imagery group with a 1; both code the writing group with a -1:
Method | M1 | M2 | Words |
---|---|---|---|
recitation | 1 | 0 | 9 |
recitation | 1 | 0 | 12 |
recitation | 1 | 0 | 6 |
recitation | 1 | 0 | 6 |
recitation | 1 | 0 | 7 |
imagery | 0 | 1 | 13 |
imagery | 0 | 1 | 16 |
imagery | 0 | 1 | 9 |
imagery | 0 | 1 | 14 |
writing | -1 | -1 | 16 |
writing | -1 | -1 | 9 |
writing | -1 | -1 | 11 |
The regression model equation is:

$$Words_i = b_0 + b_1 M1_i + b_2 M2_i + \epsilon_i$$

We can again get model-predicted values for each group by substituting that group's values of M1 and M2 into the equation.

The predicted value for the recitation group (M1 = 1, M2 = 0) is $b_0 + b_1$.

The predicted value for the imagery group (M1 = 0, M2 = 1) is $b_0 + b_2$.

The predicted value for the writing group (M1 = -1, M2 = -1) is $b_0 - b_1 - b_2$.
Interpretation of intercept
Averaging the three group predictions gives $[(b_0 + b_1) + (b_0 + b_2) + (b_0 - b_1 - b_2)]/3 = b_0$, so the intercept is again the unweighted grand mean of the group means.
Interpretation of coefficients
From the prediction equations, $b_1$ is the deviation of the recitation group's mean from the grand mean, and $b_2$ is the deviation of the imagery group's mean from the grand mean. Additionally, we see from the writing group's prediction, $b_0 - b_1 - b_2$, that $-(b_1 + b_2)$ is the deviation of the writing group's mean from the grand mean.
So in summary, regression coefficients for effect-coded regressors represent deviations of a particular category from the grand mean, and the sum of the regression coefficients for all effect-coded regressors is the negative of the deviation of the contrasting (omitted) group from the grand mean.
Regression model. Now let’s interpret the estimates of a regression of Words on M1 and M2:
    ------------------------------------------------------------------------------
           Words |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              M1 |         -3   1.154166    -2.60   0.029    -5.610905   -.3890955
              M2 |          2   1.215131     1.65   0.134    -.7488172    4.748817
           _cons |         11   .8685165    12.67   0.000     9.035279    12.96472
    ------------------------------------------------------------------------------
The unweighted grand mean of words recalled across all groups is (8 + 13 + 12)/3 = 11, which matches the intercept estimate. The coefficient for M1, -3, is the deviation of the recitation group's mean from the grand mean (8 - 11 = -3), and the coefficient for M2, 2, is the deviation of the imagery group's mean (13 - 11 = 2). The writing group's deviation is the negative of the sum of the coefficients, -(-3 + 2) = 1, matching 12 - 11 = 1.
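A minimal Stata sketch of this 3-level model follows (hypothetical lowercase variable names, as before):

```
* Enter the 3-method data and effect code Method with 2 regressors
clear
input str10 method words
"recitation" 9
"recitation" 12
"recitation" 6
"recitation" 6
"recitation" 7
"imagery"    13
"imagery"    16
"imagery"    9
"imagery"    14
"writing"    16
"writing"    9
"writing"    11
end

* writing is the contrasting group, coded -1 on both regressors
generate M1 = cond(method == "recitation", 1, cond(method == "writing", -1, 0))
generate M2 = cond(method == "imagery",    1, cond(method == "writing", -1, 0))

regress words M1 M2

* Group means, for comparison against the coefficients
tabstat words, by(method) statistics(mean)
```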
More than 3 categories. Extending these methods to more than 3 categories is straightforward. For a categorical predictor with $k$ levels, create $k - 1$ effect-coded regressors, each coding one category with a 1 and the contrasting group with a -1. Each coefficient is then the deviation of its category's mean from the unweighted grand mean, and the negative of the sum of the coefficients is the deviation of the contrasting group's mean from the grand mean.
The reader is encouraged to confirm these interpretations with the following data and regression model:
Method | M1 | M2 | M3 | Words |
---|---|---|---|---|
recitation | 1 | 0 | 0 | 9 |
recitation | 1 | 0 | 0 | 12 |
recitation | 1 | 0 | 0 | 6 |
recitation | 1 | 0 | 0 | 6 |
recitation | 1 | 0 | 0 | 7 |
imagery | 0 | 1 | 0 | 13 |
imagery | 0 | 1 | 0 | 16 |
imagery | 0 | 1 | 0 | 9 |
imagery | 0 | 1 | 0 | 14 |
distraction | 0 | 0 | 1 | 4 |
distraction | 0 | 0 | 1 | 2 |
writing | -1 | -1 | -1 | 16 |
writing | -1 | -1 | -1 | 9 |
writing | -1 | -1 | -1 | 11 |
    ------------------------------------------------------------------------------
           Words |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              M1 |         -1   1.200694    -0.83   0.424    -3.675313    1.675313
              M2 |          4   1.281275     3.12   0.011      1.14514     6.85486
              M3 |         -6    1.62532    -3.69   0.004     -9.62144    -2.37856
           _cons |          9    .801041    11.24   0.000     7.215169    10.78483
    ------------------------------------------------------------------------------
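To check these interpretations, a Stata sketch along the same lines (hypothetical variable names) should reproduce the output above:

```
* Enter the 4-method data; writing remains the contrasting group
clear
input str11 method words
"recitation"  9
"recitation"  12
"recitation"  6
"recitation"  6
"recitation"  7
"imagery"     13
"imagery"     16
"imagery"     9
"imagery"     14
"distraction" 4
"distraction" 2
"writing"     16
"writing"     9
"writing"     11
end

* k = 4 levels require k - 1 = 3 regressors
generate M1 = cond(method == "recitation",  1, cond(method == "writing", -1, 0))
generate M2 = cond(method == "imagery",     1, cond(method == "writing", -1, 0))
generate M3 = cond(method == "distraction", 1, cond(method == "writing", -1, 0))

regress words M1 M2 M3
```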
Effect-coded predictors interacted with a continuous covariate
One of the advantages of using effect coding in interaction models is that some of the coefficients for lower-order effects are interpreted as main (averaged) effects, instead of as simple (specific) effects (as they are when dummy/reference coding is used). Let’s take a look.
Imagine we have augmented our word recall data set (returning to the original 3 methods) with the number of hours (Hours) each subject spent studying the word list:
Method | M1 | M2 | Hours | Words |
---|---|---|---|---|
recitation | 1 | 0 | 2 | 9 |
recitation | 1 | 0 | 3.5 | 12 |
recitation | 1 | 0 | 0 | 6 |
recitation | 1 | 0 | 1 | 6 |
recitation | 1 | 0 | 1 | 7 |
imagery | 0 | 1 | 6 | 13 |
imagery | 0 | 1 | 13 | 16 |
imagery | 0 | 1 | 5 | 9 |
imagery | 0 | 1 | 8 | 14 |
writing | -1 | -1 | 5 | 16 |
writing | -1 | -1 | 2 | 9 |
writing | -1 | -1 | 3.5 | 11 |
If we believe that the effect of the number of hours on words recalled varies by the method used, we should model an interaction between Hours and the effect-coded Method regressors. The interaction regressors, M1Hours and M2Hours, are simply the products of Hours with M1 and M2, respectively:
Method | M1 | M2 | Hours | M1Hours | M2Hours | Words |
---|---|---|---|---|---|---|
recitation | 1 | 0 | 2 | 2 | 0 | 9 |
recitation | 1 | 0 | 3.5 | 3.5 | 0 | 12 |
recitation | 1 | 0 | 0 | 0 | 0 | 6 |
recitation | 1 | 0 | 1 | 1 | 0 | 6 |
recitation | 1 | 0 | 1 | 1 | 0 | 7 |
imagery | 0 | 1 | 6 | 0 | 6 | 13 |
imagery | 0 | 1 | 13 | 0 | 13 | 16 |
imagery | 0 | 1 | 5 | 0 | 5 | 9 |
imagery | 0 | 1 | 8 | 0 | 8 | 14 |
writing | -1 | -1 | 5 | -5 | -5 | 16 |
writing | -1 | -1 | 2 | -2 | -2 | 9 |
writing | -1 | -1 | 3.5 | -3.5 | -3.5 | 11 |
The model regression equation is:

$$Words_i = b_0 + b_1 M1_i + b_2 M2_i + b_3 Hours_i + b_4 M1Hours_i + b_5 M2Hours_i + \epsilon_i$$

Let’s look at the predicted number of words recalled for each Method group as a function of Hours.

For the recitation group (M1 = 1, M2 = 0): $\widehat{Words} = (b_0 + b_1) + (b_3 + b_4)Hours$

For the imagery group (M1 = 0, M2 = 1): $\widehat{Words} = (b_0 + b_2) + (b_3 + b_5)Hours$

For the writing group (M1 = -1, M2 = -1): $\widehat{Words} = (b_0 - b_1 - b_2) + (b_3 - b_4 - b_5)Hours$
Interpretation of intercept
If we set Hours = 0, the three group predictions reduce to $b_0 + b_1$, $b_0 + b_2$, and $b_0 - b_1 - b_2$, whose unweighted average is $b_0$.
The intercept estimate $b_0$ is thus the unweighted grand mean of the group means when Hours = 0; equivalently, it is the unweighted average of the three group-specific intercepts.
Interpretation of effect-coded regressor coefficients
At Hours = 0, the predicted mean for the recitation group is $b_0 + b_1$, so $b_1$ is the deviation of the recitation group from the grand mean when Hours = 0. Similarly, for the imagery group, $b_2$ is its deviation from the grand mean when Hours = 0.
The sum of the coefficients, taken negatively as $-(b_1 + b_2)$, is the deviation of the writing group from the grand mean when Hours = 0.
Interpretation of coefficient for continuous covariate
The effect of a continuous covariate, often called a “slope”, is expressed as the change in the expected value of the outcome per unit increase in the covariate. Let’s look at how a one-unit increase in hours, from some value $h$ to $h + 1$, changes the predicted number of words recalled in each group.

Starting with the recitation group at Hours = $h$:

$$\widehat{Words}_h = (b_0 + b_1) + (b_3 + b_4)h$$

Now substituting Hours = $h + 1$:

$$\widehat{Words}_{h+1} = (b_0 + b_1) + (b_3 + b_4)(h + 1)$$

The change in the outcome can be calculated by taking the difference between these two predictions:

$$\widehat{Words}_{h+1} - \widehat{Words}_h = b_3 + b_4$$

The expected change in the number of words recalled per additional hour of study is thus $b_3 + b_4$ for the recitation group.

We can do the same sort of calculations for the imagery group.

The expected change in the number of words recalled per additional hour is $b_3 + b_5$ for the imagery group.

And for the writing group.

The expected change in the number of words recalled per additional hour is $b_3 - b_4 - b_5$ for the writing group.

If we take the average of the effect of a one-unit increase in Hours across the 3 groups, we get $[(b_3 + b_4) + (b_3 + b_5) + (b_3 - b_4 - b_5)]/3 = b_3$.

The coefficient $b_3$ for Hours is thus interpreted as the unweighted average slope of Hours across the method groups, i.e., the main effect of Hours.
Interpretation of interaction coefficients
From the equation for the effect of hours in the recitation group, $b_3 + b_4$, we see that $b_4$ is the deviation of the recitation group's slope from the average slope $b_3$; likewise, $b_5$ is the deviation of the imagery group's slope from the average slope.
The sum of the coefficients, taken negatively as $-(b_4 + b_5)$, is the deviation of the writing group's slope from the average slope.
Regression model
To confirm that our interpretations are correct, we will first run regressions of Words on Hours separately within each method group, and then a single model of Words regressed on the effect-coded regressors, Hours, and their interactions.
For the recitation group:
    ------------------------------------------------------------------------------
           Words |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           Hours |   1.857143   .2973809     6.24   0.008     .9107442    2.803541
           _cons |   5.214286   .5681453     9.18   0.003     3.406194    7.022378
    ------------------------------------------------------------------------------
For the imagery group:
    ------------------------------------------------------------------------------
           Words |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           Hours |   .7105263   .2994686     2.37   0.141    -.5779831    1.999036
           _cons |   7.315789   2.567408     2.85   0.104    -3.730877    18.36246
    ------------------------------------------------------------------------------
For the writing group:
    ------------------------------------------------------------------------------
           Words |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           Hours |   2.333333   .5773503     4.04   0.154    -5.002597    9.669264
           _cons |   3.833333   2.140872     1.79   0.324    -23.36903    31.03569
    ------------------------------------------------------------------------------
The average of the three group intercepts is (5.214286 + 7.315789 + 3.833333)/3 = 5.454469, and the average of the three Hours slopes is (1.857143 + 0.7105263 + 2.333333)/3 = 1.633667.
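As a quick arithmetic check in Stata (numbers copied from the outputs above):

```
display (5.214286 + 7.315789 + 3.833333)/3   // average intercept = 5.4544693
display (1.857143 + .7105263 + 2.333333)/3   // average slope     = 1.6336674
```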
Now we will run all groups together in one regression model with interactions. The resulting estimates should be equivalent to those from the 3 separate models:
    ------------------------------------------------------------------------------
           Words |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              M1 |  -.2401838   1.154527    -0.21   0.842     -3.06521    2.584842
              M2 |    1.86132   1.459926     1.27   0.249    -1.710991    5.433631
           Hours |   1.633668   .2715401     6.02   0.001     .9692329    2.298102
         M1Hours |   .2234754   .3930287     0.57   0.590    -.7382313    1.185182
         M2Hours |  -.9231412   .2976688    -3.10   0.021    -1.651511   -.1947719
           _cons |    5.45447   1.018941     5.35   0.002      2.96121    7.947729
    ------------------------------------------------------------------------------
The estimate of the intercept, 5.45447, matches the average of the three group intercepts, and the coefficient for Hours, 1.633668, matches the average of the three group slopes (up to rounding). The coefficients for M1 and M2 are the deviations of the recitation and imagery group intercepts from the average intercept, and the coefficients for M1Hours and M2Hours are the deviations of the recitation and imagery group slopes from the average slope.
We can recover the intercepts and Hours slopes of the separate group models from the interaction model's coefficients. For example, the recitation group's intercept is 5.45447 + (-0.2401838) = 5.214286 and its slope is 1.633668 + 0.2234754 = 1.857143, while the writing group's slope is 1.633668 - 0.2234754 + 0.9231412 = 2.333334 (equal to its separate estimate up to rounding).
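A Stata sketch of the fully interacted model (hypothetical variable names), with lincom commands to recover the group-specific estimates:

```
* Enter the 3-method data with study hours
clear
input str10 method hours words
"recitation" 2    9
"recitation" 3.5  12
"recitation" 0    6
"recitation" 1    6
"recitation" 1    7
"imagery"    6    13
"imagery"    13   16
"imagery"    5    9
"imagery"    8    14
"writing"    5    16
"writing"    2    9
"writing"    3.5  11
end

* Effect-coded regressors and their products with hours
generate M1      = cond(method == "recitation", 1, cond(method == "writing", -1, 0))
generate M2      = cond(method == "imagery",    1, cond(method == "writing", -1, 0))
generate M1Hours = M1 * hours
generate M2Hours = M2 * hours

regress words M1 M2 hours M1Hours M2Hours

* Recover the recitation group's own intercept and slope
lincom _cons + M1       // 5.45447  + (-0.2401838) = 5.214286
lincom hours + M1Hours  // 1.633668 + 0.2234754    = 1.857143
```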
Interaction of 2 effect-coded categorical predictors
Interpreting a regression model where 2 effect-coded categorical predictors are interacted will be very similar to interpreting a 2-way ANOVA with interactions.
To demonstrate, we will add a 2-level categorical predictor, Deprived, to our data that codes whether the subject was sleep-deprived while studying the word list (we will not be modeling the continuous covariate Hours here):
Method | M1 | M2 | Deprived | Words |
---|---|---|---|---|
recitation | 1 | 0 | No | 9 |
recitation | 1 | 0 | No | 12 |
recitation | 1 | 0 | Yes | 6 |
recitation | 1 | 0 | Yes | 6 |
recitation | 1 | 0 | Yes | 7 |
imagery | 0 | 1 | No | 13 |
imagery | 0 | 1 | Yes | 16 |
imagery | 0 | 1 | Yes | 9 |
imagery | 0 | 1 | No | 14 |
writing | -1 | -1 | No | 16 |
writing | -1 | -1 | Yes | 9 |
writing | -1 | -1 | Yes | 11 |
We will transform the Deprived predictor into a single effect-coded regressor, D1, coding “Yes” with a 1 and “No” with a -1, and we will form the interaction regressors M1D1 and M2D1 as the products of D1 with M1 and M2:
Method | M1 | M2 | Deprived | D1 | M1D1 | M2D1 | Words |
---|---|---|---|---|---|---|---|
recitation | 1 | 0 | No | -1 | -1 | 0 | 9 |
recitation | 1 | 0 | No | -1 | -1 | 0 | 12 |
recitation | 1 | 0 | Yes | 1 | 1 | 0 | 6 |
recitation | 1 | 0 | Yes | 1 | 1 | 0 | 6 |
recitation | 1 | 0 | Yes | 1 | 1 | 0 | 7 |
imagery | 0 | 1 | No | -1 | 0 | -1 | 13 |
imagery | 0 | 1 | Yes | 1 | 0 | 1 | 16 |
imagery | 0 | 1 | Yes | 1 | 0 | 1 | 9 |
imagery | 0 | 1 | No | -1 | 0 | -1 | 14 |
writing | -1 | -1 | No | -1 | 1 | 1 | 16 |
writing | -1 | -1 | Yes | 1 | -1 | -1 | 9 |
writing | -1 | -1 | Yes | 1 | -1 | -1 | 11 |
For a model where we regress Words on M1, M2, D1, and their interactions, the regression equation is:

$$Words_i = b_0 + b_1 M1_i + b_2 M2_i + b_3 D1_i + b_4 M1D1_i + b_5 M2D1_i + \epsilon_i$$

There are 6 possible groups (cells) formed by crossing the 3 Method groups with the 2 Deprived groups. Let's write out the model-predicted mean for each cell.

First, for (recitation, yes), where M1 = 1, D1 = 1, and M1D1 = 1: $b_0 + b_1 + b_3 + b_4$

Then, for (recitation, no), where M1 = 1, D1 = -1, and M1D1 = -1: $b_0 + b_1 - b_3 - b_4$

Then, for (imagery, yes), where M2 = 1, D1 = 1, and M2D1 = 1: $b_0 + b_2 + b_3 + b_5$

Then, for (imagery, no), where M2 = 1, D1 = -1, and M2D1 = -1: $b_0 + b_2 - b_3 - b_5$

Then, for (writing, yes), where M1 = M2 = -1, D1 = 1, and M1D1 = M2D1 = -1: $b_0 - b_1 - b_2 + b_3 - b_4 - b_5$

Finally, for (writing, no), where M1 = M2 = -1, D1 = -1, and M1D1 = M2D1 = 1: $b_0 - b_1 - b_2 - b_3 + b_4 + b_5$
Interpretation of intercept
The mean of the predictions for the 6 groups is:

$$\frac{1}{6}\left[(b_0 + b_1 + b_3 + b_4) + (b_0 + b_1 - b_3 - b_4) + (b_0 + b_2 + b_3 + b_5) + (b_0 + b_2 - b_3 - b_5) + (b_0 - b_1 - b_2 + b_3 - b_4 - b_5) + (b_0 - b_1 - b_2 - b_3 + b_4 + b_5)\right] = b_0$$

Every term other than $b_0$ cancels, so we see that the intercept $b_0$ is the unweighted grand mean of the 6 cell means.
Interpretation of lower-order coefficients
One of the reasons to use effect coding is that the lower-order coefficients (those not attached to interaction regressors) are interpreted as averaged or main effects.
To understand how to interpret $b_1$, average the predictions for the two recitation cells: $[(b_0 + b_1 + b_3 + b_4) + (b_0 + b_1 - b_3 - b_4)]/2 = b_0 + b_1$.
We know that $b_0$ is the grand mean, so $b_1$ is the deviation from the grand mean of the recitation group's mean averaged over both levels of Deprived, i.e., the main effect of recitation.
To interpret $b_2$, average the predictions for the two imagery cells: $[(b_0 + b_2 + b_3 + b_5) + (b_0 + b_2 - b_3 - b_5)]/2 = b_0 + b_2$.
Using the same logic, we see that $b_2$ is the deviation from the grand mean of the imagery group's mean averaged over Deprived.
The sum of the coefficients, taken negatively as $-(b_1 + b_2)$, is the corresponding averaged deviation of the writing group from the grand mean.
Finally, to interpret $b_3$, average the predictions for the three sleep-deprived cells: $[(b_0 + b_1 + b_3 + b_4) + (b_0 + b_2 + b_3 + b_5) + (b_0 - b_1 - b_2 + b_3 - b_4 - b_5)]/3 = b_0 + b_3$.
We see that $b_3$ is the deviation from the grand mean of the sleep-deprived group's mean averaged over methods (the main effect of sleep deprivation), while $-b_3$ is the corresponding deviation of the non-deprived group.
Interpretation of interaction coefficients
The interpretation of the interaction coefficients can be gleaned by looking at the prediction equations again. First we look at the equation for the (recitation, yes) cell: $b_0 + b_1 + b_3 + b_4$.
We know that $b_0$ is the grand mean, $b_1$ is the main effect of recitation, and $b_3$ is the main effect of sleep deprivation.
The coefficient $b_4$ is therefore the additional effect of being simultaneously in the recitation and sleep-deprived groups, beyond what the grand mean and the two main effects predict. Equivalently, it is the change in the effect of recitation among sleep-deprived subjects compared to the average effect of recitation.
The coefficient $b_5$ has an analogous interpretation for the (imagery, yes) cell, whose prediction is $b_0 + b_2 + b_3 + b_5$.
The sums of the interaction coefficients describe the cells involving the contrasting writing group.
First, for the (writing, yes) cell, the prediction is $b_0 - b_1 - b_2 + b_3 - b_4 - b_5$; since $-(b_1 + b_2)$ is the main effect of writing and $b_3$ is the main effect of sleep deprivation, $-(b_4 + b_5)$ is the additional effect of being simultaneously in the writing and sleep-deprived groups.
The sum of the coefficients $b_4 + b_5$ is then the additional effect for the (writing, no) cell.
Regression Model. Let’s see if we can use the coefficients of a fully-interacted regression model to recover the means of the 6 groups.
Here are the means:
Method, Deprived | mean Words |
---|---|
recitation, yes | 6.3333 |
recitation, no | 10.5 |
imagery, yes | 12.5 |
imagery, no | 13.5 |
writing, yes | 10 |
writing, no | 16 |
Now the regression of Words on M1, M2, D1, M1D1, and M2D1:
    ------------------------------------------------------------------------------
           Words |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              M1 |  -3.055556     .93204    -3.28   0.017    -5.336175   -.7749358
              M2 |   1.527778   .9711634     1.57   0.167    -.8485736    3.904129
              D1 |  -1.861111    .704556    -2.64   0.038    -3.585098   -.1371247
            M1D1 |  -.2222222     .93204    -0.24   0.819    -2.502842    2.058397
            M2D1 |   1.361111   .9711634     1.40   0.211     -1.01524    3.737462
           _cons |   11.47222    .704556    16.28   0.000     9.748236    13.19621
    ------------------------------------------------------------------------------
The estimate for the group (recitation, yes) cell mean can be recovered from the coefficients as _cons + M1 + D1 + M1D1 = 11.47222 + (-3.055556) + (-1.861111) + (-0.2222222) = 6.33333, matching the observed cell mean. Note also that the intercept, 11.47222, equals the unweighted mean of the 6 cell means; the reader is encouraged to verify the remaining 5 cells in the same way.
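A final Stata sketch for the two interacted effect-coded predictors (hypothetical variable names again):

```
* Enter the 3-method data with the sleep-deprivation indicator
clear
input str10 method str3 deprived words
"recitation" "no"  9
"recitation" "no"  12
"recitation" "yes" 6
"recitation" "yes" 6
"recitation" "yes" 7
"imagery"    "no"  13
"imagery"    "yes" 16
"imagery"    "yes" 9
"imagery"    "no"  14
"writing"    "no"  16
"writing"    "yes" 9
"writing"    "yes" 11
end

* Effect codes and their products
generate M1   = cond(method == "recitation", 1, cond(method == "writing", -1, 0))
generate M2   = cond(method == "imagery",    1, cond(method == "writing", -1, 0))
generate D1   = cond(deprived == "yes", 1, -1)
generate M1D1 = M1 * D1
generate M2D1 = M2 * D1

regress words M1 M2 D1 M1D1 M2D1

* Recover the (recitation, yes) cell mean
lincom _cons + M1 + D1 + M1D1   // 6.333333
```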
Summary
When using effect coding for categorical predictors in a regression model without interactions:
- The intercept is interpreted as the estimate of the unweighted grand mean of the dependent variable across all groups comprising the categorical predictor(s) (and at zero for all other predictors in the model)
- a coefficient for an effect-coded regressor is interpreted as a deviation from the grand mean for the category coded as 1 on that regressor
- the sum of the coefficients for all effect-coded regressors representing a single categorical predictor is the negative of the deviation from the grand mean of the contrasting group (the group coded as -1 on the regressors)
When using effect coding for categorical predictors in regression models with interactions:
- the intercept is still interpreted the same as in a model without interactions
- if the effect-coded regressors are interacted with a continuous predictor:
  - the lower-order coefficients for the effect-coded regressors are interpreted as deviations from the grand mean for that group (the group coded as 1 on the regressor) when the interacting continuous predictor is zero
  - the lower-order coefficient for the continuous predictor is interpreted as the unweighted average slope (i.e. main effect) of the continuous predictor, averaged across groups of the interacting categorical predictor
  - the interaction coefficients are interpreted as deviations of a group's slope from the average slope (or, equivalently, as the change in the group effect per unit increase in the continuous predictor)
- if the effect-coded regressor is interacted with another effect-coded predictor:
  - the lower-order coefficient of an effect-coded regressor is interpreted as the average effect (deviation from the grand mean) of the group coded as 1 on the regressor, across all levels of the interacting categorical predictor
  - the interaction coefficients are interpreted as the additional effect of being simultaneously in the two groups defined by the interaction regressor (or, alternatively, if A and B are interacted, the change in the average effect of A for group B compared to the average effect of A)