FAQ: How can I determine the correct error term in an ANOVA?

<!- 12/7/07 — pbe –>

One method for determining correct denominators in analysis of variance is the Cornfield-Tukey method. This FAQ presents a modified version of the Cornfield-Tukey method for manually deriving the symbolic values for the expected mean squares. It is from these expected mean squares that one can determine appropriate error terms.

Please note that this approach to deriving expected mean squares assumes that the interactions of the fixed and random effects sum to zero over the levels of the fixed effect. This approach can be found in a number of classical ANOVA textbooks, such as those by Kirk, Winer, and Keppel. This assumption, however, is not universal and is not used in most mixed-model programs (proc mixed in SAS, mixed in Stata, etc.).

If you would like to try a program that automates much of the computation for this algorithm, see the FAQ "How can I determine the correct term in an anova using Stata?".

Steps in deriving expected mean squares

Step  1 - Write the linear model for the design.
Step  2 - Construct a table with three parts.
Step  3 - The row headings in part 1 contain each of the terms from the linear 
          model including their subscripts but leaving out μ.
Step  4 - The column headings in part 2 contain the subscripts from the linear
          model, the symbol for the number of levels, and the sampling
          coefficient.  Sampling coefficients are coded 1 for random variables 
          and 0 for fixed.
Step  5 - If a column heading appears as a row subscript in parentheses,
          enter a 1 in part 2.
Step  6 - If a column heading appears as a row subscript, not in parentheses,
          enter the appropriate sampling coefficient (0 or 1).
Step  7 - If a column heading does not appear as a row subscript, enter the
          letter for the number of levels.
Step  8 - In part 3 list a variance for each term in the linear model that
          contains all the row subscripts.
Step  9 - The coefficient for each variance is obtained by covering the columns
          headed by subscripts that appear in the row heading (not including
          subscripts in parentheses) and multiplying the remaining entries in
          the row of the term that the variance belongs to.  Terms with zero
          coefficients drop out.
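
The steps above can be sketched in code. The following is a minimal illustration, not a general-purpose implementation: it hard-codes the three-way example used in this FAQ (A and C fixed, B random, subjects nested in cells), and the names SUBSCRIPTS, LEVELS, SAMPLING, TERMS, part2_row and expected_mean_square are hypothetical, chosen only for this sketch. Coefficients are kept as symbolic strings such as "nqm".

```python
# Hypothetical encoding of the example design: A and C fixed, B random,
# subjects (subscript i) random and nested within the A*B*C cells.
SUBSCRIPTS = ["i", "j", "k", "l"]
LEVELS = {"i": "n", "j": "p", "k": "q", "l": "m"}  # symbols for number of levels
SAMPLING = {"i": 1, "j": 0, "k": 1, "l": 0}        # 1 = random, 0 = fixed

# term -> (subscripts outside parentheses, subscripts inside parentheses)
TERMS = {
    "A": ("j", ""),
    "B": ("k", ""),
    "C": ("l", ""),
    "A*B": ("jk", ""),
    "A*C": ("jl", ""),
    "B*C": ("kl", ""),
    "A*B*C": ("jkl", ""),
    "residual": ("i", "jkl"),
}


def part2_row(free, nested):
    """Build one row of part 2 of the table (Steps 5 to 7)."""
    row = {}
    for s in SUBSCRIPTS:
        if s in nested:
            row[s] = 1             # Step 5: subscript in parentheses
        elif s in free:
            row[s] = SAMPLING[s]   # Step 6: sampling coefficient
        else:
            row[s] = LEVELS[s]     # Step 7: letter for the number of levels
    return row


def expected_mean_square(name):
    """Return {variance term: symbolic coefficient} for one effect (Steps 8 and 9)."""
    free, nested = TERMS[name]
    own = set(free) | set(nested)
    ems = {}
    for other, (ofree, onested) in TERMS.items():
        # Step 8: keep only terms whose subscripts include all row subscripts.
        if not own <= set(ofree) | set(onested):
            continue
        row = part2_row(ofree, onested)
        # Step 9: cover the columns of the effect's subscripts that are
        # not in parentheses, then multiply what remains in the row.
        kept = [row[s] for s in SUBSCRIPTS if s not in free]
        if 0 in kept:
            continue               # a zero coefficient drops the term
        ems[other] = "".join(str(v) for v in kept if v != 1) or "1"
    return ems


print(expected_mean_square("A"))   # {'A': 'nqm', 'A*B': 'nm', 'residual': '1'}
```

A coefficient of "1" means the variance enters the expected mean square unweighted, as the error variance does in every row.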

Example three-way factorial design

In this example, A and C are fixed and B is random. A, B, and C have p, q, and m levels, respectively, with n subjects per cell. The subscript for ε is i(jkl) because the subjects are nested in the A*B*C cells. The subjects themselves are also random. The term ε_i(jkl) is known as the error, within-cell, or residual term.

Step 1 - Y_ijkl = μ + α_j + β_k + γ_l + αβ_jk + αγ_jl + βγ_kl + αβγ_jkl + ε_i(jkl)


Part 1           Part 2               Part 3
subscript        i    j    k    l
levels           n    p    q    m
sampling coef    1    0    1    0
-------------------------------------------------------------------

α_j              n    0    q    m     σ²_ε + 0σ²_αβγ + 0σ²_αγ + nmσ²_αβ + nqmσ²_α
                                    = σ²_ε + nmσ²_αβ + nqmσ²_α

β_k              n    p    1    m     σ²_ε + 0σ²_αβγ + 0σ²_βγ + 0σ²_αβ + npmσ²_β
                                    = σ²_ε + npmσ²_β

γ_l              n    p    q    0     σ²_ε + 0σ²_αβγ + npσ²_βγ + 0σ²_αγ + npqσ²_γ
                                    = σ²_ε + npσ²_βγ + npqσ²_γ

αβ_jk            n    0    1    m     σ²_ε + 0σ²_αβγ + nmσ²_αβ
                                    = σ²_ε + nmσ²_αβ

αγ_jl            n    0    q    0     σ²_ε + nσ²_αβγ + nqσ²_αγ

βγ_kl            n    p    1    0     σ²_ε + 0σ²_αβγ + npσ²_βγ
                                    = σ²_ε + npσ²_βγ

αβγ_jkl          n    0    1    0     σ²_ε + nσ²_αβγ

ε_i(jkl)         1    1    1    1     σ²_ε
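
As a check on Step 9, the row for γ_l can be worked by hand. Covering column l leaves the entries in columns i, j and k; multiplying them for each term whose subscripts include l gives the coefficients below. The notation E[MS(C)] for the expected mean square is an assumption of this sketch and is not used elsewhere in this FAQ.

```latex
\begin{aligned}
\varepsilon_{i(jkl)}    &: \; 1 \cdot 1 \cdot 1 = 1 \\
\alpha\beta\gamma_{jkl} &: \; n \cdot 0 \cdot 1 = 0 \\
\beta\gamma_{kl}        &: \; n \cdot p \cdot 1 = np \\
\alpha\gamma_{jl}       &: \; n \cdot 0 \cdot q = 0 \\
\gamma_{l}              &: \; n \cdot p \cdot q = npq \\[4pt]
E[MS(C)] &= \sigma^2_{\varepsilon} + np\,\sigma^2_{\beta\gamma} + npq\,\sigma^2_{\gamma}
\end{aligned}
```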

A correctly formed F-ratio will have exactly one more term in the expected mean square of the numerator than in that of the denominator. The additional term in the numerator is the effect of interest. Thus, the F-ratio for the A main effect would look something like this:

        σ²_ε + nmσ²_αβ + nqmσ²_α        MS(A)
F(A) = ------------------------  =  ---------
        σ²_ε + nmσ²_αβ                 MS(A*B)

Here are the terms that go into each of the F-ratios for the above model:

Effect      Error Term
numerator   denominator
MS(A)       MS(A*B)
MS(B)       MS(residual)
MS(C)       MS(B*C)
MS(A*B)     MS(residual)
MS(A*C)     MS(A*B*C)
MS(B*C)     MS(residual)
MS(A*B*C)   MS(residual)
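
The rule for choosing a denominator can also be sketched in code. The following minimal example copies the simplified expected mean squares from the table above into a dictionary and, for each effect, looks for the mean square whose expectation equals the effect's expectation with the effect's own variance removed. The names EMS and error_term are hypothetical and exist only for this illustration.

```python
# Simplified expected mean squares from the table above, written as
# {variance term: symbolic coefficient}. "residual" stands for the error variance.
EMS = {
    "A": {"residual": "1", "A*B": "nm", "A": "nqm"},
    "B": {"residual": "1", "B": "npm"},
    "C": {"residual": "1", "B*C": "np", "C": "npq"},
    "A*B": {"residual": "1", "A*B": "nm"},
    "A*C": {"residual": "1", "A*B*C": "n", "A*C": "nq"},
    "B*C": {"residual": "1", "B*C": "np"},
    "A*B*C": {"residual": "1", "A*B*C": "n"},
    "residual": {"residual": "1"},
}


def error_term(effect):
    """Return the mean square whose expectation matches the effect's
    expectation minus the effect's own variance, or None if there is none."""
    target = {term: coef for term, coef in EMS[effect].items() if term != effect}
    for candidate, components in EMS.items():
        if candidate != effect and components == target:
            return candidate
    return None


for effect in EMS:
    if effect != "residual":
        print(f"MS({effect}) / MS({error_term(effect)})")
```

The function returns None when no single mean square has the required expectation; that does not happen for any effect in this design.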

Reference

Kirk, Roger E. (1998) Experimental Design: Procedures for the Behavioral Sciences, Third Edition. Monterey, California: Brooks/Cole Publishing. ISBN 0-534-25092-0