Regression through the origin is a technique used in some disciplines when theory suggests that the regression line must pass through the origin, i.e., the point (0, 0).
Example
We have a dataset that has standardized test scores for writing and reading ability. The tests are normed to have a mean of 50 and standard deviation of 10. Here is what the OLS regression for predicting the writing score from the reading score looks like.
      F(1, 198) = 109.52    P = 0.0000    R-squared = 0.36
-------------------------------------------------
   write |     Coef.   Std. Err.       t    P>|t|
---------+---------------------------------------
    read |  .5517051   .0527178    10.47    0.000
constant |  23.95944   2.805744     8.54    0.000
-------------------------------------------------
Here is how this model is interpreted. The coefficient of .55 for read indicates that for each one-point increase in reading, the expected writing score increases by .55. The constant is approximately 24, meaning that a person with a reading score of zero would have a predicted writing score of about 24.
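The interpretation above can be sketched as a prediction function. This is not part of the original analysis; it simply plugs reading scores into the fitted equation, using the intercept and slope reported in the output (the reading scores supplied are hypothetical).

```python
# Sketch: forming predictions from the fitted model reported above.
# The intercept (23.95944) and slope (.5517051) come from the output table.

def predict_write(read, intercept=23.95944, slope=0.5517051):
    """Predicted writing score for a given reading score under the fitted model."""
    return intercept + slope * read

print(predict_write(0))    # a reading score of zero -> a predicted write of about 24
print(predict_write(50))   # prediction at the mean reading score of 50
```

Setting `read = 0` returns the constant itself, which is exactly the interpretation given above.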
An educator argues that the model doesn’t make sense because a person with zero reading ability should not be able to have a score of nearly 24 on writing ability. In fact, the person should have a score of zero on writing. The argument is that the regression line should go through the origin, i.e., the regression model should be run without a constant. Here is what that model would look like.
      F(1, 199) = 7064.58    P = 0.0000    R-squared = 0.97
-------------------------------------------------
   write |     Coef.   Std. Err.       t    P>|t|
---------+---------------------------------------
    read |  .9934953   .0118201    84.05    0.000
-------------------------------------------------
This model is interpreted to mean that for every one point increase in reading ability there is a .99 (nearly one) point increase in writing ability and that when the reading score is zero the predicted writing score is also zero.
In fact, almost everything about this model looks great: the F-ratio is huge and the R-squared is .97. Be careful, though. When the constant is omitted, R-squared is computed from sums of squares about zero rather than about the mean of write, so it is not comparable to the R-squared from the model that includes a constant.
Most writers on the topic suggest that you test whether the constant is zero (test H0: β0 = 0). Clearly, in this example the intercept is significantly different from zero (t = 8.54, p < 0.001), so running the regression through the origin is not justified here.
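Mechanically, "running the regression without a constant" just means dropping the column of ones from the design matrix. The sketch below shows both fits with NumPy least squares rather than Stata, on small hypothetical data (not the test-score dataset above).

```python
# Sketch: OLS with and without a constant, via NumPy least squares.
# Hypothetical data generated exactly as y = 1 + 2x.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# With a constant: the design matrix includes a column of ones.
X_const = np.column_stack([np.ones_like(x), x])
b_const, *_ = np.linalg.lstsq(X_const, y, rcond=None)

# Through the origin: drop the column of ones.
X_origin = x.reshape(-1, 1)
b_origin, *_ = np.linalg.lstsq(X_origin, y, rcond=None)

print(b_const)    # recovers intercept 1 and slope 2 exactly
print(b_origin)   # a single slope; forced through (0, 0), it is pulled away from 2
```

Because the true line here has a nonzero intercept, forcing the fit through the origin distorts the slope, which mirrors what happened to the read coefficient above.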
What if you used group 1 as the reference group? That is, what if group 1 were the group coded with all -1’s? In that case, the value of the constant would still be the grand mean. The coefficients would have different values, but in all other respects the models are identical, with the same F-ratio and R-squared, regardless of which group is selected as the reference group.
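This invariance is easy to demonstrate. The sketch below (in Python/NumPy rather than Stata, with small hypothetical balanced data) builds effect-coded design matrices with two different reference groups and shows that the constant is the grand mean either way.

```python
# Sketch: effect coding a balanced three-group design and switching the
# reference (all -1) group. Data are hypothetical: two observations per
# group, group means 2, 5, and 8, so the grand mean is 5.
import numpy as np

y = np.array([1.0, 3.0, 4.0, 6.0, 7.0, 9.0])
group = np.array([1, 1, 2, 2, 3, 3])

def effect_design(group, reference):
    """Effect-coded design matrix with `reference` as the all -1 group."""
    levels = [g for g in sorted(set(group)) if g != reference]
    cols = [np.where(group == reference, -1.0, (group == g).astype(float))
            for g in levels]
    return np.column_stack([np.ones(len(group))] + cols)

for ref in (1, 3):
    b, *_ = np.linalg.lstsq(effect_design(group, ref), y, rcond=None)
    print(ref, b[0])   # the constant is the grand mean (5.0) for either reference
```

The slope coefficients differ across the two codings (each is a group mean minus the grand mean for whichever groups get their own column), but the constant, fitted values, F-ratio, and R-squared are unchanged.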
Unbalanced data
By unbalanced data we mean unequal group sizes. There are a couple of differences when using effect coding with unbalanced designs. Consider the following four group design:
+-----------------------------------+
| group |  g1  |  g2  |  g3  |  g4  |
|-------+------+------+------+------|
|       |   1  |      |      |  10  |
|       |   3  |   3  |   6  |  10  |
|       |   2  |   4  |      |   9  |
|       |   2  |      |   5  |  11  |
|-------+------+------+------+------|
| mean  |   2  |  3.5 |  5.5 |  10  |
+-----------------------------------+

Grand mean = 5.5  --  Unweighted grand mean = 5.25
In the table above, the grand mean (5.5) is the overall mean of the 12 observations while the unweighted grand mean (5.25) is just the simple average of the four group means.
Now if we do the standard effect coding and run the regression we get the following summary table.
      F(3, 12) = 73.60    P = 0.0000    R-squared = 0.965
-------------------------------------------------
       y |     Coef.   Std. Err.       t    P>|t|
---------+---------------------------------------
      e1 |     -3.25    .369755    -8.79    0.000
      e2 |     -1.75   .4635124    -3.78    0.005
      e3 |       .25   .4635124     0.54    0.604
constant |      5.25   .2420615    21.69    0.000
-------------------------------------------------
Now the constant for the model is the unweighted grand mean, i.e., the mean of the group means. In all other respects the coefficients are interpreted in the same way, except that you replace the grand mean with the unweighted grand mean. Of course, what is really going on is that for balanced groups the weighted and unweighted grand means are the same.
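The coefficients above can be reproduced directly from the 12 observations in the table. The sketch below does so with NumPy least squares instead of Stata, using effect coding with group 4 as the -1 group (which is what the e1-e3 columns imply).

```python
# Sketch: reproducing the unbalanced-design coefficients with least squares,
# using the 12 observations from the table above.
import numpy as np

y = np.array([1, 3, 2, 2,       # group 1 (mean 2)
              3, 4,             # group 2 (mean 3.5)
              6, 5,             # group 3 (mean 5.5)
              10, 10, 9, 11],   # group 4 (mean 10)
             dtype=float)
group = np.array([1]*4 + [2]*2 + [3]*2 + [4]*4)

# Effect coding with group 4 as the all -1 (reference) group.
e1 = np.where(group == 4, -1.0, (group == 1).astype(float))
e2 = np.where(group == 4, -1.0, (group == 2).astype(float))
e3 = np.where(group == 4, -1.0, (group == 3).astype(float))
X = np.column_stack([np.ones(12), e1, e2, e3])

b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)   # constant 5.25 (the unweighted grand mean), then -3.25, -1.75, .25
```

Each slope is a group mean minus the unweighted grand mean, e.g. e1 = 2 - 5.25 = -3.25, matching the output table.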
Why use effect coding?
Here’s a good question: why use effect coding instead of dummy coding? If you have several categorical variables in a model, it often doesn’t make much difference whether you use effect coding or dummy coding. However, if you have an interaction of two categorical variables, then effect coding may provide some benefits. The primary benefit is that you get reasonable estimates of both the main effects and the interaction using effect coding. With dummy coding, the estimate of the interaction is fine, but the main effects are not “true” main effects but rather what are called simple effects, i.e., the effect of one variable at one level of the other variable. This is why most analysis of variance programs use some type of effect coding when estimating the various effects in an ANOVA model.
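The simple-effect versus main-effect distinction can be made concrete with a small worked example. The sketch below uses a hypothetical 2x2 design (one observation per cell, cell means chosen to include an interaction) and fits it both ways with NumPy least squares.

```python
# Sketch: dummy coding vs. effect coding in a 2x2 design with an interaction.
# Hypothetical cell means: A1B1=10, A1B2=20, A2B1=30, A2B2=60, so the
# effect of A is 20 at B1 but 40 at B2 (average main effect: 30).
import numpy as np

y = np.array([10.0, 20.0, 30.0, 60.0])
A = np.array([1, 1, 2, 2])
B = np.array([1, 2, 1, 2])

# Dummy coding: a = 1 for A2, b = 1 for B2, plus their product.
a_d = (A == 2).astype(float)
b_d = (B == 2).astype(float)
Xd = np.column_stack([np.ones(4), a_d, b_d, a_d * b_d])
bd, *_ = np.linalg.lstsq(Xd, y, rcond=None)
# bd[1] is the simple effect of A at B1 only: 30 - 10 = 20.

# Effect coding: a = -1 for A1, +1 for A2 (likewise for B), plus the product.
a_e = np.where(A == 2, 1.0, -1.0)
b_e = np.where(B == 2, 1.0, -1.0)
Xe = np.column_stack([np.ones(4), a_e, b_e, a_e * b_e])
be, *_ = np.linalg.lstsq(Xe, y, rcond=None)
# 2 * be[1] is the average effect of A across both levels of B: 30.

print(bd[1], 2 * be[1])
```

With the interaction present, the dummy-coded "A" coefficient (20) only describes the effect of A when B is at its reference level, while the effect-coded version recovers the averaged main effect (30).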
See Also