Categorical variables require special attention in regression analysis because,
unlike dichotomous or continuous variables, they cannot be entered into the
regression equation just as they are. For example, if you have a
variable called **race** that is coded 1 = Hispanic, 2 = Asian, 3 = Black, 4 = White,
then entering **race** in your regression will estimate the linear
effect of race, which is rarely meaningful for a nominal variable. Instead, categorical variables like this need to be
recoded into a series of variables which can then be
entered into the regression model. There are a variety of coding systems that can be used when
coding categorical
variables. Ideally, you would choose a
coding system that reflects the comparisons that you want to make. In Chapter
3 of the Regression with SPSS Web Book
we covered the use of categorical variables in regression analysis focusing on
the use of dummy variables, but that is not the only coding scheme that you can
use. For example,
you may want to compare each level to the next higher
level, in which case you would want to use "repeated" coding, or you
might want to compare each level to the mean of the previous levels of the
variable, in which case you would want to use "difference" coding. By
deliberately choosing a coding system, you can obtain comparisons that are most
meaningful for testing your hypotheses. Regardless of the coding system you choose, the
test of the overall effect
of the categorical variable (e.g. the overall effect of **race**) will remain the same. Below is a table listing various types of contrasts and the
comparison that they make.

Name of contrast | Comparison made |
---|---|
Simple Coding | Compares each level of a variable to the reference level |
Deviation Coding | Compares deviations from the grand mean |
Difference Coding | Compares levels of a variable with the mean of the previous levels of the variable |
Helmert Coding | Compares levels of a variable with the mean of the subsequent levels of the variable |
Orthogonal Polynomial Coding | Orthogonal polynomial contrasts |
Repeated Coding | Compares adjacent levels of a variable |
Special User-Defined Coding | User-defined contrast |

There are a couple of notes to be made about the coding systems listed
above. The first is that they represent planned comparisons and not post
hoc comparisons. In other words, they are comparisons that you plan to do
before analyzing your data, not comparisons that you think of once you have seen
the results of preliminary analyses. Also, some forms of coding
make more sense with ordinal categorical variables than with nominal categorical
variables. Below we will show examples using **race** as a categorical
variable, which is a nominal variable. Because simple effect coding compares the mean of the
dependent variable for each level of the categorical variable to the mean of the
dependent variable for the reference level, it makes sense with a nominal
variable.
However, it may not make as much sense to use a coding scheme that tests the linear
effect of **race**. As we describe each type of coding system, we note
those coding systems with which it does not make as much sense to use a nominal
variable.

This page will illustrate three ways that you can conduct analyses using
these coding schemes: 1) using the **glm** command with **/lmatrix** to
specify "contrast" coefficients that specify the groups that are to be
compared, 2) using the **glm** command with **/contrast** to
specify one of the SPSS predefined coding schemes, or 3) using **regression**
but first creating k-1 new variables (where k is the number of
levels of the categorical variable) and using
these new variables as predictors in your regression model. While methods 1
and 3 both involve manually specifying "contrasts", method 1 uses a
type of coding we will call "contrast coding" that specifies which
groups are to be compared. By comparison, method 3 uses a type of coding
we will call "regression coding". There are benefits and
drawbacks of each of these 3 methods. For example, methods 1 and 3 allow
you to
manually code the contrasts and give you absolute control over the coding, but
the drawback is that it is relatively easy to make
an error in the coding. By contrast, method 2 automates the process by
letting SPSS do the coding for you, but you are limited to just the pre-defined
coding schemes that SPSS has created.

The examples on this page use a dataset called hsb2.sav,
and we will focus on the categorical variable **race**, which has four levels (1 =
Hispanic, 2 = Asian, 3 = African American and 4 = white). We will use **write**
as our dependent variable. Although our
example uses a variable with four levels, these coding systems work with
variables that have more categories or fewer categories. No matter which coding system you select, you will always have one fewer recoded variable
than levels of the original variable. In our example, our categorical
variable has four levels, so we will have three new variables (a variable corresponding to the final level of the categorical
variable would be redundant and therefore unnecessary).

Before considering any analyses, let’s look at the mean of the dependent
variable, **write**, for each level of **race**. This will help in interpreting
the output from the analyses.

means tables = write by race.

 | Included N | Included Percent | Excluded N | Excluded Percent | Total N | Total Percent |
---|---|---|---|---|---|---|
writing score * RACE | 200 | 100.0% | 0 | .0% | 200 | 100.0% |

RACE | Mean | N |
---|---|---|
hispanic | 46.4583 | 24 |
asian | 58.0000 | 11 |
african-amer | 48.2000 | 20 |
white | 54.0552 | 145 |
Total | 52.7750 | 200 |

## SIMPLE EFFECT CODING

The results of simple effect coding are very similar to those of dummy coding in that each level is compared to the reference level. In the example below, level 4 is the reference level: the first comparison compares level 1 to level 4, the second comparison compares level 2 to level 4, and the third comparison compares level 3 to level 4.

**Method 1: GLM with /LMATRIX**

Contrast coding is more straightforward than regression coding, so we will begin with it. Simple effect coding follows the rule for effect coding that the values in each new variable sum to zero. The first contrast compares level 1 to level 4, and level 1 is coded "1" and level 4 is coded "-1". Likewise, the second contrast compares level 2 to level 4 by coding level 2 "1" and level 4 "-1". As you can see with contrast coding, you can discern the meaning of the comparisons simply by inspecting the contrast coefficients. For example, looking at the contrast coefficients for c3 you can see that this compares level 3 to level 4.

SIMPLE effect contrast coding

Level of race | New variable 1 (c1) | New variable 2 (c2) | New variable 3 (c3) |
---|---|---|---|
1 (Hispanic) | 1 | 0 | 0 |
2 (Asian) | 0 | 1 | 0 |
3 (African American) | 0 | 0 | 1 |
4 (white) | -1 | -1 | -1 |

Below we illustrate how to form these comparisons using the **GLM**
command with **/lmatrix**. As you see, a separate **/lmatrix**
statement is used for each contrast.

glm write by race
  /lmatrix "level 1 versus level 4" race 1 0 0 -1
  /lmatrix "level 2 versus level 4" race 0 1 0 -1
  /lmatrix "level 3 versus level 4" race 0 0 1 -1.

Each of the above **/lmatrix** statements produces two of the tables shown below,
"Contrast Results (K Matrix)" and "Test Results". The
contrast estimate for the first contrast, comparing the mean of the dependent
variable, **write**, for levels 1 and 4, is -7.597 and
is statistically significant (p < .001; SPSS displays this as .000). The F-value associated with this test
is given in the "Test Results" table and is 14.590. The p-value given
in the "Contrast Results (K Matrix)" table and the p-value in the
"Test Results" table are necessarily the same because they both refer
to the same test of the contrast estimate against zero. The second
contrast, comparing the mean of **write** for levels 2 and 4, is not
statistically significant (F = 1.953, p = .164), while the third contrast is
statistically significant (F = 7.398, p = .007).

Dependent Variable | |||
---|---|---|---|

Contrast | writing score | ||

L1 | Contrast Estimate | -7.597 | |

Hypothesized Value | 0 | ||

Difference (Estimate – Hypothesized) | -7.597 | ||

Std. Error | 1.989 | ||

Sig. | .000 | ||

95% Confidence Interval for Difference | Lower Bound | -11.519 | |

Upper Bound | -3.675 | ||

a Based on the user-specified contrast coefficients (L’) matrix: group 1 versus group 4 |

Source | Sum of Squares | df | Mean Square | F | Sig. |
---|---|---|---|---|---|

Contrast | 1188.388 | 1 | 1188.388 | 14.590 | .000 |

Error | 15964.717 | 196 | 81.453 |

Dependent Variable | |||
---|---|---|---|

Contrast | writing score | ||

L1 | Contrast Estimate | 3.945 | |

Hypothesized Value | 0 | ||

Difference (Estimate – Hypothesized) | 3.945 | ||

Std. Error | 2.823 | ||

Sig. | .164 | ||

95% Confidence Interval for Difference | Lower Bound | -1.622 | |

Upper Bound | 9.511 | ||

a Based on the user-specified contrast coefficients (L’) matrix: group 2 versus group 4 |

Source | Sum of Squares | df | Mean Square | F | Sig. |
---|---|---|---|---|---|

Contrast | 159.108 | 1 | 159.108 | 1.953 | .164 |

Error | 15964.717 | 196 | 81.453 |

Dependent Variable | |||
---|---|---|---|

Contrast | writing score | ||

L1 | Contrast Estimate | -5.855 | |

Hypothesized Value | 0 | ||

Difference (Estimate – Hypothesized) | -5.855 | ||

Std. Error | 2.153 | ||

Sig. | .007 | ||

95% Confidence Interval for Difference | Lower Bound | -10.101 | |

Upper Bound | -1.610 | ||

a Based on the user-specified contrast coefficients (L’) matrix: group 3 versus group 4 |

Source | Sum of Squares | df | Mean Square | F | Sig. |
---|---|---|---|---|---|

Contrast | 602.550 | 1 | 602.550 | 7.398 | .007 |

Error | 15964.717 | 196 | 81.453 |
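The three contrast estimates in the tables above can be reproduced by hand from the group means reported by the **means** command. The following is a minimal Python sketch of that arithmetic, not SPSS syntax; the names `means`, `simple_contrasts` and `contrast_estimate` are hypothetical and chosen only for this illustration.

```python
# Group means of write for race levels 1..4, copied from the means table.
means = [46.4583, 58.0000, 48.2000, 54.0552]

# One row of contrast coefficients per /lmatrix statement.
simple_contrasts = {
    "level 1 versus level 4": [1, 0, 0, -1],
    "level 2 versus level 4": [0, 1, 0, -1],
    "level 3 versus level 4": [0, 0, 1, -1],
}


def contrast_estimate(weights, group_means):
    """Return the weighted sum of group means for one contrast."""
    return sum(w * m for w, m in zip(weights, group_means))


estimates = {
    label: contrast_estimate(weights, means)
    for label, weights in simple_contrasts.items()
}

for label, estimate in estimates.items():
    print(f"{label}: {estimate:.3f}")
```

Each printed value should agree with the corresponding "Contrast Estimate" row to three decimals, which makes this a quick way to check hand-written coefficients before running the analysis.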

**Method 2: GLM with /CONTRAST**

Instead of using the **/lmatrix** statement, we can achieve the same
results using the **/contrast** statement with the **glm** command.
Instead of specifying the numbers to be used in the contrast as we did above, we
can simply type in the name of the contrast that we wish to use, and SPSS will
do the coding for us. We will use the **/print = test(lmatrix)** statement to
have SPSS print out the coding scheme that it used to make the contrasts. You
will notice that the table entitled "Contrast Coefficients (L’
Matrix)" is the same as the table we used in Method 1 above.

glm write by race /contrast (race)=simple /print = test(lmatrix).

As you see in the output below, the table titled "Contrast Coefficients (L’ Matrix)" shows
the coding scheme that was used for each comparison. The table entitled
"Contrast Results (K Matrix)" shows the results of the various
contrasts. In our example, the difference between level 1 of **race** and
level 4 of **race** is statistically significant. You will notice that the
contrast estimate is the mean of the dependent variable
for the first level minus the mean of the dependent variable for the reference level,
in other words, the mean for group 1 minus the mean for group 4, which is 46.4583 - 54.0552 = -7.597.
The row labeled "Sig." is .000 (that is, p < .001), indicating that this difference is
significant, and this is followed by a confidence interval for the difference.
The next part of the table compares level 2 of **race** and level 4 of
**race** and shows that this difference is not statistically significant, and
the next part of the table shows that the difference between level 3 of
**race** and level 4 of **race** is statistically significant. You
might note that while the significance ("Sig.") is given for each of
these tests, there is no "t" value, but you could obtain this by
dividing the "Contrast Estimate" by the "Std. Error", e.g.
-7.597 / 1.989 = -3.820.

The table entitled "Test Results" is a test of all of the
contrasts taken together, that is, a test of the overall effect of **race**.
This overall effect is statistically significant.

RACE Simple Contrast(a) | |||
---|---|---|---|

Parameter | Level 1 vs. Level 4 | Level 2 vs. Level 4 | Level 3 vs. Level 4 |

Intercept | 0 | 0 | 0 |

[RACE=1.00] | 1 | 0 | 0 |

[RACE=2.00] | 0 | 1 | 0 |

[RACE=3.00] | 0 | 0 | 1 |

[RACE=4.00] | -1 | -1 | -1 |

The default display of this matrix is the transpose of the corresponding L matrix. | |||

a Reference category = 4 |

Dependent Variable | |||
---|---|---|---|

RACE Simple Contrast(a) | writing score | ||

Level 1 vs. Level 4 | Contrast Estimate | -7.597 | |

Hypothesized Value | 0 | ||

Difference (Estimate – Hypothesized) | -7.597 | ||

Std. Error | 1.989 | ||

Sig. | .000 | ||

95% Confidence Interval for Difference | Lower Bound | -11.519 | |

Upper Bound | -3.675 | ||

Level 2 vs. Level 4 | Contrast Estimate | 3.945 | |

Hypothesized Value | 0 | ||

Difference (Estimate – Hypothesized) | 3.945 | ||

Std. Error | 2.823 | ||

Sig. | .164 | ||

95% Confidence Interval for Difference | Lower Bound | -1.622 | |

Upper Bound | 9.511 | ||

Level 3 vs. Level 4 | Contrast Estimate | -5.855 | |

Hypothesized Value | 0 | ||

Difference (Estimate – Hypothesized) | -5.855 | ||

Std. Error | 2.153 | ||

Sig. | .007 | ||

95% Confidence Interval for Difference | Lower Bound | -10.101 | |

Upper Bound | -1.610 | ||

a Reference category = 4 |

Source | Sum of Squares | df | Mean Square | F | Sig. |
---|---|---|---|---|---|

Contrast | 1914.158 | 3 | 638.053 | 7.833 | .000 |

Error | 15964.717 | 196 | 81.453 |
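Because the "Contrast Results (K Matrix)" table reports no t value, it can help to compute one from the estimate and its standard error and compare its square with the single-contrast F values from Method 1. The sketch below does this in Python using only numbers copied from the tables above; the variable names are hypothetical, and small discrepancies are expected because the inputs are rounded to three decimals.

```python
# (contrast estimate, standard error, F from the Method 1 "Test Results" table)
reported = {
    "Level 1 vs. Level 4": (-7.597, 1.989, 14.590),
    "Level 2 vs. Level 4": (3.945, 2.823, 1.953),
    "Level 3 vs. Level 4": (-5.855, 2.153, 7.398),
}

t_values = {}
for label, (estimate, std_error, f_value) in reported.items():
    # t is the estimate divided by its standard error.
    t = estimate / std_error
    t_values[label] = t
    # For a single-degree-of-freedom contrast, t squared should match F.
    print(f"{label}: t = {t:.3f}, t squared = {t * t:.3f}, reported F = {f_value:.3f}")
```

The squared t values should match the reported F values to roughly two decimal places.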

**Method 3: Regression**

The regression coding is a bit more complex than contrast coding. In our example below, level 4 is the reference level and x1 compares level 1 to level 4, x2 compares level 2 to level 4, and x3 compares level 3 to level 4. For x1 the coding is 3/4 (.75) for level 1, and -1/4 (-.25) for all other levels. Likewise, for x2 the coding is 3/4 (.75) for level 2, and -1/4 (-.25) for all other levels, and for x3 the coding is 3/4 (.75) for level 3, and -1/4 (-.25) for all other levels. Note that each new variable must sum to zero.

SIMPLE regression coding

Level of race | New variable 1 (x1) | New variable 2 (x2) | New variable 3 (x3) |
---|---|---|---|
1 (Hispanic) | .75 | -.25 | -.25 |
2 (Asian) | -.25 | .75 | -.25 |
3 (African American) | -.25 | -.25 | .75 |
4 (white) | -.25 | -.25 | -.25 |

Below we illustrate how to create **x1** **x2** and **x3** and enter
these new variables into the regression model using the **regression**
command.

if race = 1 x1 = .75.
if any(race,2,3,4) x1 = -.25.
if race = 2 x2 = .75.
if any(race,1,3,4) x2 = -.25.
if race = 3 x3 = .75.
if any(race,1,2,4) x3 = -.25.
execute.

regression /dependent = write /method = enter x1 x2 x3.

You will notice that the regression coefficients in the table below are the same
as the contrast estimates that we saw using the **glm** command. Both the regression coefficient for x1 and the contrast estimate for
c1 are the mean of **write** for level 1 of **race** (Hispanic) minus the mean of
**write**
for level 4 (white). Likewise, the
regression coefficient for x2 and the contrast estimate for c2 are the mean of **write** for level 2 (Asian) minus the mean of
**write**
for level 4 (white). The F-value shown in the **glm** output is the square of
the t-value shown in the regression table below (for x1, (-3.820)² is approximately 14.59). The results that were statistically
significant in the previous analyses are significant in this analysis, and the
one that was not statistically significant is not significant in this analysis
either.

Model | Term | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. |
---|---|---|---|---|---|---|
1 | (Constant) | 51.678 | .982 | | 52.619 | .000 |
 | X1 | -7.597 | 1.989 | -.261 | -3.820 | .000 |
 | X2 | 3.945 | 2.823 | .095 | 1.398 | .164 |
 | X3 | -5.855 | 2.153 | -.186 | -2.720 | .007 |

a Dependent Variable: writing score

## DEVIATION EFFECT CODING

This coding system compares the mean of the dependent variable for a
given level to the grand mean of the dependent variable, where the grand mean
is the unweighted mean of the level means. In our example below, the first comparison compares level 1
(Hispanics)
to the grand mean over all four levels of **race**, the second comparison compares level 2 (Asians) to the
grand mean, and the third comparison compares level 3 (African Americans) to
the grand mean.

As you can see, the logic of the contrast coding is fairly straightforward. The first comparison compares level 1 to levels 2, 3 and 4: a value of 3/4 (.75) is assigned to level 1 and a value of -1/4 (-.25) is assigned to levels 2, 3 and 4. These weights give the mean for level 1 minus the average of all four level means, which is why the output labels this comparison "Level 1 vs. Mean". Likewise, the second comparison compares level 2 to levels 1, 3 and 4: a value of 3/4 (.75) is assigned to level 2 and a value of -1/4 (-.25) is assigned to levels 1, 3 and 4. A similar pattern is followed for assigning values for the third comparison. Note that you could substitute 3 for 3/4 and -1 for -1/4 and you would get the same test of significance, but the contrast estimate would be four times as large.

**Method 1: GLM with /LMATRIX**

DEVIATION contrast coding

Level of race | New variable 1 (c1): Level 1 v. Mean | New variable 2 (c2): Level 2 v. Mean | New variable 3 (c3): Level 3 v. Mean |
---|---|---|---|
1 (Hispanic) | .75 | -.25 | -.25 |
2 (Asian) | -.25 | .75 | -.25 |
3 (African American) | -.25 | -.25 | .75 |
4 (white) | -.25 | -.25 | -.25 |

Below we illustrate how to form these comparisons using the **GLM**
command with **/lmatrix**. As you see, a separate **/lmatrix**
statement is used for each contrast.

glm write by race
  /lmatrix "level 1 versus levels 2 3 and 4" race .75 -.25 -.25 -.25
  /lmatrix "level 2 versus levels 1 3 and 4" race -.25 .75 -.25 -.25
  /lmatrix "level 3 versus levels 1 2 and 4" race -.25 -.25 .75 -.25.

The first two tables in the output below are generated by the first **/lmatrix**
statement; the second two tables are generated by the second **/lmatrix**
statement, and so on. In the first "Contrast Results (K Matrix)"
table, the contrast estimate is the mean for group 1 minus the grand mean.
However, this grand mean is not the mean of the dependent variable that is listed in the
output of the **means** command above. Rather, it is the mean of the means of the
dependent variable at each level of the categorical variable: (46.4583 +
58 + 48.2 + 54.0552) / 4 = 51.678375. This contrast estimate is then 46.4583 - 51.678375 =
-5.220.
The difference between this value and zero (the null hypothesis that the
contrast estimate is zero) is statistically significant (p = .002), and the
"Test Results" table below that shows the F value for this test,
10.238. The results for the next two contrasts were computed in a similar
manner.

Dependent Variable | |||
---|---|---|---|

Contrast | writing score | ||

L1 | Contrast Estimate | -5.220 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | -5.220 | ||

Std. Error | 1.631 | ||

Sig. | .002 | ||

95% Confidence Interval for Difference | Lower Bound | -8.437 | |

Upper Bound | -2.003 | ||

a Based on the user-specified contrast coefficients (L') matrix: group 1 versus groups 1 2 and 3 |

Source | Sum of Squares | df | Mean Square | F | Sig. |
---|---|---|---|---|---|

Contrast | 833.927 | 1 | 833.927 | 10.238 | .002 |

Error | 15964.717 | 196 | 81.453 |

Dependent Variable | |||
---|---|---|---|

Contrast | writing score | ||

L1 | Contrast Estimate | 6.322 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | 6.322 | ||

Std. Error | 2.160 | ||

Sig. | .004 | ||

95% Confidence Interval for Difference | Lower Bound | 2.061 | |

Upper Bound | 10.582 | ||

a Based on the user-specified contrast coefficients (L') matrix: group 2 versus groups 1 3 and 4 |

Source | Sum of Squares | df | Mean Square | F | Sig. |
---|---|---|---|---|---|

Contrast | 697.475 | 1 | 697.475 | 8.563 | .004 |

Error | 15964.717 | 196 | 81.453 |

Dependent Variable | |||
---|---|---|---|

Contrast | writing score | ||

L1 | Contrast Estimate | -3.478 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | -3.478 | ||

Std. Error | 1.732 | ||

Sig. | .046 | ||

95% Confidence Interval for Difference | Lower Bound | -6.895 | |

Upper Bound | -6.203E-02 | ||

a Based on the user-specified contrast coefficients (L') matrix: group 3 versus groups 1 2 and 4 |

Source | Sum of Squares | df | Mean Square | F | Sig. |
---|---|---|---|---|---|

Contrast | 328.405 | 1 | 328.405 | 4.032 | .046 |

Error | 15964.717 | 196 | 81.453 |
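The earlier remark that 3 and -1 could be substituted for 3/4 and -1/4 can be checked numerically. The Python sketch below assumes the usual one-way ANOVA formula for the standard error of a contrast, sqrt(MSE × Σ wᵢ² / nᵢ), with the mean square error (81.453) taken from the "Error" row above and the group sizes from the means table. That SPSS computes the standard error this way is an assumption of the sketch, although it does reproduce the reported value of 1.631. The function name `contrast_test` is hypothetical.

```python
import math

# Inputs copied from the output on this page.
means = [46.4583, 58.0000, 48.2000, 54.0552]  # write by race level 1..4
sizes = [24, 11, 20, 145]                      # N per level
mse = 81.453                                   # Mean Square for Error


def contrast_test(weights):
    """Return (estimate, standard error, t) for one contrast."""
    estimate = sum(w * m for w, m in zip(weights, means))
    std_error = math.sqrt(mse * sum(w * w / n for w, n in zip(weights, sizes)))
    return estimate, std_error, estimate / std_error


fractional = contrast_test([0.75, -0.25, -0.25, -0.25])
integer = contrast_test([3, -1, -1, -1])

print("fractional weights: est=%.3f se=%.3f t=%.3f" % fractional)
print("integer weights:    est=%.3f se=%.3f t=%.3f" % integer)
```

Multiplying every weight by 4 multiplies both the estimate and its standard error by 4, so the t value, and therefore the significance test, is unchanged.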

**Method 2: GLM with /CONTRAST**

Now
let's conduct the same analysis using the **/contrast** statement instead of
the **/lmatrix** statement. Instead of providing the values for the
contrasts that we want to perform, we can have SPSS provide those for us by
indicating the type of effect coding that we wish to use, in this case,
deviation effect coding.

glm write by race /contrast (race)=deviation /print = test(lmatrix).

The contrast estimates in the table entitled "Contrast Results (K Matrix)"
are the mean of the particular level minus the grand (unweighted) mean. This grand mean is not the mean of the dependent variable that is listed in the
output of the **means** command above. Rather, it is the mean of the means of the
dependent variable at each level of the categorical variable: (46.4583 +
58 + 48.2 + 54.0552) / 4 = 51.678375. The contrast estimate for level 1
versus the mean is then 46.4583 - 51.678375 = -5.220. The difference between this value and zero (the null hypothesis that the
contrast estimate is zero) is statistically significant (p = .002). The
contrast estimates for the other comparisons are calculated in the same
manner.

RACE Deviation Contrast(a) | |||
---|---|---|---|

Parameter | Level 1 vs. Mean | Level 2 vs. Mean | Level 3 vs. Mean |

Intercept | .000 | .000 | .000 |

[RACE=1.00] | .750 | -.250 | -.250 |

[RACE=2.00] | -.250 | .750 | -.250 |

[RACE=3.00] | -.250 | -.250 | .750 |

[RACE=4.00] | -.250 | -.250 | -.250 |

The default display of this matrix is the transpose of the corresponding L matrix. | |||

a Omitted category = 4 |

Dependent Variable | |||
---|---|---|---|

RACE Deviation Contrast(a) | writing score | ||

Level 1 vs. Mean | Contrast Estimate | -5.220 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | -5.220 | ||

Std. Error | 1.631 | ||

Sig. | .002 | ||

95% Confidence Interval for Difference | Lower Bound | -8.437 | |

Upper Bound | -2.003 | ||

Level 2 vs. Mean | Contrast Estimate | 6.322 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | 6.322 | ||

Std. Error | 2.160 | ||

Sig. | .004 | ||

95% Confidence Interval for Difference | Lower Bound | 2.061 | |

Upper Bound | 10.582 | ||

Level 3 vs. Mean | Contrast Estimate | -3.478 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | -3.478 | ||

Std. Error | 1.732 | ||

Sig. | .046 | ||

95% Confidence Interval for Difference | Lower Bound | -6.895 | |

Upper Bound | -6.203E-02 | ||

a Omitted category = 4 |
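The distinction between the two grand means matters because the groups are very unequal in size. The following Python sketch, using only the means and counts from the **means** output, computes both versions and the deviation of each level from the unweighted one; the variable names are hypothetical.

```python
# Means and counts of write by race, copied from the means table.
means = {"hispanic": 46.4583, "asian": 58.0000, "african-amer": 48.2000, "white": 54.0552}
counts = {"hispanic": 24, "asian": 11, "african-amer": 20, "white": 145}

# Unweighted grand mean: every level counts equally (used by deviation coding).
unweighted = sum(means.values()) / len(means)

# Weighted grand mean: every case counts equally (the "Total" row of the means table).
weighted = sum(means[g] * counts[g] for g in means) / sum(counts.values())

# Deviation of each level mean from the unweighted grand mean.
deviations = {group: mean - unweighted for group, mean in means.items()}

print(f"unweighted grand mean: {unweighted:.3f}")
print(f"weighted grand mean:   {weighted:.3f}")
for group, deviation in deviations.items():
    print(f"{group}: {deviation:.3f}")
```

The first three deviations correspond to the three contrast estimates above. The deviation for the omitted level (white) is not reported by SPSS, but it is the negative of the sum of the other three, since the deviations from the unweighted mean must sum to zero.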

**Method 3: Regression**

As you see in the example below, the regression coding is accomplished by assigning "1" to level 1 for the first comparison (because level 1 is the level to be compared to all others), a "1" to level 2 for the second comparison (because level 2 is to be compared to all others), and "1" to level 3 for the third comparison (because level 3 is to be compared to all others). Note that a "-1" is assigned to level 4 for all 3 comparisons (because it is the level that is never compared to the other levels) and all other values are assigned a 0. This regression coding scheme yields the comparisons described above.

DEVIATION regression coding

Level of race | New variable 1 (x1): Level 1 v. Mean | New variable 2 (x2): Level 2 v. Mean | New variable 3 (x3): Level 3 v. Mean |
---|---|---|---|
1 (Hispanic) | 1 | 0 | 0 |
2 (Asian) | 0 | 1 | 0 |
3 (African American) | 0 | 0 | 1 |
4 (white) | -1 | -1 | -1 |

Below we illustrate how to create **x1** **x2** and **x3** and enter
these new variables into the regression model using the **regression**
command.

if race = 1 x1 = 1.
if any(race,2,3) x1 = 0.
if race = 4 x1 = -1.
if race = 2 x2 = 1.
if any(race,1,3) x2 = 0.
if race = 4 x2 = -1.
if race = 3 x3 = 1.
if any(race,1,2) x3 = 0.
if race = 4 x3 = -1.
execute.

regression /dep write /method = enter x1 x2 x3.

In this example, both the
regression coefficient for x1 and the contrast estimate for c1
are the mean of **write** for level 1 (Hispanic) minus the unweighted grand mean of
**write**
over all four levels. Likewise, the
regression coefficient for x2 and the contrast estimate for c2
are the mean of **write** for level 2 (Asian) minus the unweighted grand mean of **write**.
As we saw in the previous analyses, all
three contrasts are statistically significant.

Model | Term | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. |
---|---|---|---|---|---|---|
1 | (Constant) | 51.678 | .982 | | 52.619 | .000 |
 | X1 | -5.220 | 1.631 | -.382 | -3.200 | .002 |
 | X2 | 6.322 | 2.160 | .385 | 2.926 | .004 |
 | X3 | -3.478 | 1.732 | -.242 | -2.008 | .046 |

a Dependent Variable: writing score

## DIFFERENCE CODING

In this coding system, each level is compared to the mean of the previous
levels. In our example, the first contrast compares the mean of the
dependent variable for level 2 of **race** to the mean of the dependent variable for
level 1 of **race**. The second comparison compares the mean of the
dependent variable for level 3 of **race** with the mean of levels 1 and 2, and the third comparison compares the
mean of the dependent variable for level 4 of **race** with the mean of levels 1, 2 and 3. Clearly, this coding system does not make much sense with our
example of **race** because it is a nominal variable. However, this system is
useful when the levels of the categorical variable are ordered in a meaningful
way. For example, if we had a categorical variable in which work-related
stress was coded as low, medium or high, then comparing each level with the mean of the
previous levels of the variable would make more sense.

For contrast coding, the first comparison codes levels 1 and 2 as "-1" and "1" and the remaining levels as "0". The second comparison, of levels 1 and 2 with level 3, is coded "-.5", "-.5", "1" and "0", and the last comparison, of levels 1, 2 and 3 with level 4, is coded "-.333", "-.333", "-.333" and "1".

**Method 1: GLM with /LMATRIX**

DIFFERENCE contrast coding

Level of race | New variable 1 (c1): Level 2 v. Level 1 | New variable 2 (c2): Level 3 v. Previous | New variable 3 (c3): Level 4 v. Previous |
---|---|---|---|
1 (Hispanic) | -1 | -.5 | -.333 |
2 (Asian) | 1 | -.5 | -.333 |
3 (African American) | 0 | 1 | -.333 |
4 (white) | 0 | 0 | 1 |

Below we illustrate how to form these comparisons using the **GLM**
command with **/lmatrix**. Note the use of fractions on the **/lmatrix** statement.
As mentioned above, you need to use coefficients that sum to zero, such as
-1/3 - 1/3 - 1/3 + 1. You cannot use .333 instead of 1/3: SPSS will
give an error message and fail to calculate the contrast estimate. The
problem is that -.333 - .333 - .333 + 1 is not sufficiently close to
zero.

glm write by race
  /lmatrix "level 2 versus level 1" race -1 1 0 0
  /lmatrix "level 3 versus levels 1 and 2" race -.5 -.5 1 0
  /lmatrix "level 4 versus levels 1 2 and 3" race -1/3 -1/3 -1/3 1.
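The sum-to-zero requirement is easy to check before running the analysis. The Python sketch below compares the rounded coefficients with exact fractions; the helper `sums_to_zero` and its tolerance are hypothetical and are not meant to reproduce whatever threshold SPSS applies internally.

```python
from fractions import Fraction

# Rounded coefficients, as one might be tempted to type them.
rounded = [-0.333, -0.333, -0.333, 1]

# Exact coefficients, equivalent to writing -1/3 on the /lmatrix statement.
exact = [Fraction(-1, 3), Fraction(-1, 3), Fraction(-1, 3), Fraction(1)]


def sums_to_zero(weights, tolerance=1e-9):
    """Return True when the weights sum to zero within the given tolerance."""
    return abs(sum(weights)) <= tolerance


rounded_ok = sums_to_zero(rounded)
exact_ok = sums_to_zero(exact)

print("rounded sum:", sum(rounded), "sums to zero:", rounded_ok)
print("exact sum:  ", sum(exact), "sums to zero:", exact_ok)
```

The rounded weights leave a remainder of about .001, while the exact fractions sum to zero.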

The contrast estimate for the first comparison shown in this output was
calculated by subtracting the mean of the dependent variable for level 1 of the
categorical variable from the mean of the dependent variable for level 2: 58 - 46.4583 = 11.542.
This result is statistically significant. The
contrast estimate for the second comparison (between level 3 and the previous
levels) was calculated by subtracting the mean of the dependent variable for
levels 1 and 2 from that of level 3: 48.2 - [(46.4583 + 58) / 2] =
-4.029. This result is not statistically significant, meaning that there
is not a reliable difference between the mean of **write** for level 3 of **race**
and the mean of **write** for levels 1 and 2 (Hispanics and Asians).
As noted above, this type of coding system does not make much sense for a
nominal variable such as **race**. For the comparison of level 4 and the
previous levels, you take the mean of the dependent variable for those
levels and subtract it from the mean of the dependent variable for level
4: 54.0552 - [(46.4583 + 58 + 48.2) / 3] = 3.169. This result is
statistically significant.

Dependent Variable | |||
---|---|---|---|

Contrast | writing score | ||

L1 | Contrast Estimate | 11.542 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | 11.542 | ||

Std. Error | 3.286 | ||

Sig. | .001 | ||

95% Confidence Interval for Difference | Lower Bound | 5.061 | |

Upper Bound | 18.022 | ||

a Based on the user-specified contrast coefficients (L') matrix: group 2 versus group 1 |

Source | Sum of Squares | df | Mean Square | F | Sig. |
---|---|---|---|---|---|

Contrast | 1004.785 | 1 | 1004.785 | 12.336 | .001 |

Error | 15964.717 | 196 | 81.453 |

Dependent Variable | |||
---|---|---|---|

Contrast | writing score | ||

L1 | Contrast Estimate | -4.029 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | -4.029 | ||

Std. Error | 2.602 | ||

Sig. | .123 | ||

95% Confidence Interval for Difference | Lower Bound | -9.161 | |

Upper Bound | 1.103 | ||

a Based on the user-specified contrast coefficients (L') matrix: group 3 versus groups 1 and 2 |

Source | Sum of Squares | df | Mean Square | F | Sig. |
---|---|---|---|---|---|

Contrast | 195.254 | 1 | 195.254 | 2.397 | .123 |

Error | 15964.717 | 196 | 81.453 |

Dependent Variable | |||
---|---|---|---|

Contrast | writing score | ||

L1 | Contrast Estimate | 3.169 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | 3.169 | ||

Std. Error | 1.488 | ||

Sig. | .034 | ||

95% Confidence Interval for Difference | Lower Bound | .235 | |

Upper Bound | 6.104 | ||

a Based on the user-specified contrast coefficients (L') matrix: group 4 versus groups 1 2 and 3 |

Source | Sum of Squares | df | Mean Square | F | Sig. |
---|---|---|---|---|---|

Contrast | 369.460 | 1 | 369.460 | 4.536 | .034 |

Error | 15964.717 | 196 | 81.453 |
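The pattern of difference contrasts generalizes to any number of levels. As an illustration only, the Python sketch below builds the coefficient rows for k levels with exact fractions and applies them to the group means from this page; `difference_contrasts` is a hypothetical helper, not an SPSS facility.

```python
from fractions import Fraction


def difference_contrasts(k):
    """Build k-1 rows; row j compares level j+1 with the mean of levels 1..j."""
    rows = []
    for j in range(1, k):
        previous = [Fraction(-1, j)] * j          # equal negative weights on earlier levels
        later = [Fraction(0)] * (k - j - 1)       # levels after the one being compared
        rows.append(previous + [Fraction(1)] + later)
    return rows


# Group means of write for race levels 1..4, kept exact by parsing decimal strings.
means = [Fraction("46.4583"), Fraction("58"), Fraction("48.2"), Fraction("54.0552")]

contrasts = difference_contrasts(4)
estimates = [float(sum(w * m for w, m in zip(row, means))) for row in contrasts]

for row, estimate in zip(contrasts, estimates):
    print([str(w) for w in row], f"{estimate:.3f}")
```

For four levels this yields the same coefficients as the **/lmatrix** statements above, and the estimates should match the three "Contrast Estimate" values to three decimals.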

**Method 2: GLM with /CONTRAST**

As
with the previous examples, we will conduct the analysis above again, this time
using the **/contrast** statement.

glm write by race /contrast (race)=difference /print = test(lmatrix).

These contrasts are interpreted in the same way as the contrasts obtained using Method 1. Again, we see that the first and third contrasts are statistically significant, while the second one is not.

RACE Difference Contrast | |||
---|---|---|---|

Parameter | Level 2 vs. Level 1 | Level 3 vs. Previous | Level 4 vs. Previous |

Intercept | .000 | .000 | .000 |

[RACE=1.00] | -1.000 | -.500 | -.333 |

[RACE=2.00] | 1.000 | -.500 | -.333 |

[RACE=3.00] | .000 | 1.000 | -.333 |

[RACE=4.00] | .000 | .000 | 1.000 |

The default display of this matrix is the transpose of the corresponding L matrix. |

Dependent Variable | |||
---|---|---|---|

RACE Difference Contrast | writing score | ||

Level 2 vs. Level 1 | Contrast Estimate | 11.542 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | 11.542 | ||

Std. Error | 3.286 | ||

Sig. | .001 | ||

95% Confidence Interval for Difference | Lower Bound | 5.061 | |

Upper Bound | 18.022 | ||

Level 3 vs. Previous | Contrast Estimate | -4.029 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | -4.029 | ||

Std. Error | 2.602 | ||

Sig. | .123 | ||

95% Confidence Interval for Difference | Lower Bound | -9.161 | |

Upper Bound | 1.103 | ||

Level 4 vs. Previous | Contrast Estimate | 3.169 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | 3.169 | ||

Std. Error | 1.488 | ||

Sig. | .034 | ||

95% Confidence Interval for Difference | Lower Bound | .235 | |

Upper Bound | 6.104 |

**Method 3: Regression**

The regression coding for difference coding is shown below. For the first comparison, where the
first and second levels are compared, **x1** is coded -1/2 (-.5) for level 1, 1/2
(.5) for level 2, and 0 for the remaining levels. For the second comparison, **x2**
is coded -1/3 (-.333) for levels 1 and 2, 2/3 (.667) for level 3, and 0 for level 4.
Finally, for the third comparison, **x3** is coded -1/4 (-.25) for levels 1, 2 and 3,
and 3/4 (.75) for level 4.

DIFFERENCE regression coding

Level of race | New variable 1 (x1) | New variable 2 (x2) | New variable 3 (x3) |
---|---|---|---|
 | Level 2 v. Level 1 | Level 3 v. Previous | Level 4 v. Previous |
1 (Hispanic) | -.5 | -.333 | -.25 |
2 (Asian) | .5 | -.333 | -.25 |
3 (African American) | 0 | .667 | -.25 |
4 (white) | 0 | 0 | .75 |

Below we illustrate how to create **x1** **x2** and **x3** and enter
these new variables into the regression model using the **regression**
command.

    if race = 1 x1 = -.5.
    if race = 2 x1 = .5.
    if any(race,3,4) x1 = 0.
    if any(race,1,2) x2 = -.333.
    if race = 3 x2 = .667.
    if race = 4 x2 = 0.
    if any(race,1,2,3) x3 = -.25.
    if race = 4 x3 = .75.
    execute.
    regression /dep write /method = enter x1 x2 x3.

Model | Term | Unstandardized B | Std. Error | Standardized Beta | t | Sig. |
---|---|---|---|---|---|---|
1 | (Constant) | 51.679 | .982 | | 52.616 | .000 |
 | X1 | 11.542 | 3.286 | .252 | 3.512 | .001 |
 | X2 | -4.029 | 2.602 | -.108 | -1.548 | .123 |
 | X3 | 3.168 | 1.488 | .150 | 2.129 | .035 |

a Dependent Variable: writing score

In the above examples, both the
regression coefficient for x1 and the contrast estimate for c1
are the mean of **write** for level 2 (Asian) minus the mean of **write**
for level 1 (Hispanic). Likewise, the
regression coefficient for x2 and the contrast estimate for c2
are the mean of **write** for level 3 minus the mean of **write**
for levels 1 and 2 (the unweighted average of the two level means). Finally, the
regression coefficient for x3 and the contrast estimate for c3
are the mean of **write** for level 4 minus the mean of **write**
for levels 1, 2 and 3 (again the unweighted average of the level means).
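The arithmetic behind the difference contrasts can be checked with a few lines of Python (used here instead of SPSS syntax only so the numbers can be verified independently). This is a minimal sketch: the group means of **write** are the ones quoted elsewhere on this page, and the function name `difference_estimates` is hypothetical.

```python
# Group means of write for race levels 1-4, as quoted elsewhere on this page.
means = [46.4583, 58.0, 48.2, 54.0552]

def difference_estimates(group_means):
    """Each level (from the second on) minus the unweighted mean of all previous levels."""
    estimates = []
    for j in range(1, len(group_means)):
        previous = group_means[:j]
        estimates.append(group_means[j] - sum(previous) / len(previous))
    return estimates

diff = [round(e, 3) for e in difference_estimates(means)]
print(diff)  # [11.542, -4.029, 3.169]
```

These values agree with the contrast estimates in the **glm** output above.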

## HELMERT EFFECT CODING

Helmert coding is just the opposite of difference coding: instead of
comparing each level of the categorical variable to the mean of the previous levels,
each level is compared to the mean of the subsequent levels. Hence, the first
contrast compares the mean of
the dependent variable for level 1 of **race** with the mean of all of the subsequent levels of
**race** (levels 2, 3 and 4), the second contrast compares the mean of
the dependent variable for level 2 of **race** with the mean of all of the subsequent levels of
**race** (levels 3 and 4), and the third contrast compares the mean of
the dependent variable for level 3 of **race** with the mean of all of the subsequent levels of
**race** (level 4). This type of coding is most useful in
situations where the levels of the categorical variable are ordered, say, from
lowest to highest or smallest to largest. Because **race** is nominal, these
comparisons are probably not very meaningful here.

For contrast coding, we see that the first comparison comparing level 1 with levels 2, 3 and 4 is coded 1, -.333, -.333 and -.333, reflecting the comparison of level 1 versus all other levels. The second comparison is coded 0, 1, -.5 and -.5, reflecting that it compares level 2 with levels 3 and 4. The third comparison is coded 0, 0, 1 and -1, reflecting that level 3 is compared to level 4.

**Method 1: GLM with /LMATRIX**

HELMERT contrast coding

Level of race | New variable 1 (c1) | New variable 2 (c2) | New variable 3 (c3) |
---|---|---|---|
 | Level 1 v. Later | Level 2 v. Later | Level 3 v. Later |
1 (Hispanic) | 1 | 0 | 0 |
2 (Asian) | -.333 | 1 | 0 |
3 (African American) | -.333 | -.5 | 1 |
4 (white) | -.333 | -.5 | -1 |

Below we illustrate how to form these comparisons using the **GLM**
command with **/lmatrix**.
Note the use of fractions on the first **/lmatrix** statement. As mentioned above, you need to use numbers that sum to zero, such as
1 - 1/3 - 1/3 - 1/3. You cannot use .333 instead of 1/3: SPSS will
give an error message and fail to calculate the contrast estimate. The
problem is that 1 - .333 - .333 - .333 is not sufficiently close to
zero.

    glm write by race
      /lmatrix "level 1 versus levels 2 3 and 4" race 1 -1/3 -1/3 -1/3
      /lmatrix "level 2 versus levels 3 and 4" race 0 1 -.5 -.5
      /lmatrix "level 3 versus level 4" race 0 0 1 -1.
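The point about writing 1/3 rather than .333 can be illustrated with exact fractions. The sketch below is in Python rather than SPSS and only shows the size of the discrepancy; the tolerance that SPSS actually applies is not stated in the source, so no threshold is assumed here.

```python
from fractions import Fraction

# Coefficients for "level 1 versus levels 2, 3 and 4", written two ways.
exact = [Fraction(1), Fraction(-1, 3), Fraction(-1, 3), Fraction(-1, 3)]
rounded = [1, -0.333, -0.333, -0.333]

exact_sum = sum(exact)      # exactly zero
rounded_sum = sum(rounded)  # about 0.001, so the coefficients do not sum to zero

print(exact_sum)
print(round(rounded_sum, 3))
```

With the fractional form the coefficients sum to exactly zero, while the three-decimal form leaves a remainder of roughly .001.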

The contrast estimate for the comparison between level 1 and the remaining
levels is calculated by taking the mean of the dependent variable for level 1
and subtracting the
mean of the dependent variable for levels 2, 3 and 4: 46.4583 - [(58 + 48.2 + 54.0552) / 3] =
-6.960, which is statistically significant. This means that the mean of
**write** for level 1 of **race** is statistically significantly different from the mean
of **write** for levels 2 through 4. As noted above, this comparison probably
is not meaningful because the variable **race** is nominal. This type of
comparison would be more meaningful if the categorical variable were
ordinal. To calculate the contrast estimate for the comparison between
level 2 and the later levels, you subtract the mean of the dependent variable
for levels 3 and 4 from the mean of the dependent variable for level 2: 58
- [(48.2 + 54.0552) / 2] = 6.872, which is statistically significant. The contrast estimate for the comparison between level 3 and level 4 is the
difference between the means of the dependent variable for the two levels: 48.2 - 54.0552 = -5.855, which is also statistically significant.
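The three Helmert calculations can be reproduced with a short Python sketch (Python is used here only as a calculator; the function name `helmert_estimates` is hypothetical, and the group means are the ones quoted in the text).

```python
# Group means of write for race levels 1-4, as quoted in the text.
means = [46.4583, 58.0, 48.2, 54.0552]

def helmert_estimates(group_means):
    """Each level (except the last) minus the unweighted mean of all later levels."""
    estimates = []
    for j in range(len(group_means) - 1):
        later = group_means[j + 1:]
        estimates.append(group_means[j] - sum(later) / len(later))
    return estimates

helmert = [round(e, 3) for e in helmert_estimates(means)]
print(helmert)  # [-6.96, 6.872, -5.855]
```

These agree with the contrast estimates reported in the output tables that follow.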

Dependent Variable | |||
---|---|---|---|

Contrast | writing score | ||

L1 | Contrast Estimate | -6.960 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | -6.960 | ||

Std. Error | 2.175 | ||

Sig. | .002 | ||

95% Confidence Interval for Difference | Lower Bound | -11.250 | |

Upper Bound | -2.670 | ||

a Based on the user-specified contrast coefficients (L') matrix: group 1 versus groups 2 3 and 4 |

Source | Sum of Squares | df | Mean Square | F | Sig. |
---|---|---|---|---|---|

Contrast | 833.927 | 1 | 833.927 | 10.238 | .002 |

Error | 15964.717 | 196 | 81.453 |

Dependent Variable | |||
---|---|---|---|

Contrast | writing score | ||

L1 | Contrast Estimate | 6.872 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | 6.872 | ||

Std. Error | 2.926 | ||

Sig. | .020 | ||

95% Confidence Interval for Difference | Lower Bound | 1.101 | |

Upper Bound | 12.644 | ||

a Based on the user-specified contrast coefficients (L') matrix: group 2 versus groups 3 and 4 |

Source | Sum of Squares | df | Mean Square | F | Sig. |
---|---|---|---|---|---|

Contrast | 449.240 | 1 | 449.240 | 5.515 | .020 |

Error | 15964.717 | 196 | 81.453 |

Dependent Variable | |||
---|---|---|---|

Contrast | writing score | ||

L1 | Contrast Estimate | -5.855 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | -5.855 | ||

Std. Error | 2.153 | ||

Sig. | .007 | ||

95% Confidence Interval for Difference | Lower Bound | -10.101 | |

Upper Bound | -1.610 | ||

a Based on the user-specified contrast coefficients (L') matrix: group 3 versus group 4 |

Source | Sum of Squares | df | Mean Square | F | Sig. |
---|---|---|---|---|---|

Contrast | 602.550 | 1 | 602.550 | 7.398 | .007 |

Error | 15964.717 | 196 | 81.453 |

**Method 2: GLM with /CONTRAST**

As
with the previous examples, we will conduct the analysis above again, this time
using the **/contrast** statement.

glm write by race /contrast (race)=helmert /print = test(lmatrix).

This output shows the three comparisons: the mean of **write** for level 1
of **race** versus the mean of **write** for the three later levels (called
"later" in this output), the mean of **write** for level 2 of **race**
versus the mean of **write** for the two later levels, and so on. Again, all
three comparisons are statistically significant.

RACE Helmert Contrast | |||
---|---|---|---|

Parameter | Level 1 vs. Later | Level 2 vs. Later | Level 3 vs. Level 4 |

Intercept | .000 | .000 | .000 |

[RACE=1.00] | 1.000 | .000 | .000 |

[RACE=2.00] | -.333 | 1.000 | .000 |

[RACE=3.00] | -.333 | -.500 | 1.000 |

[RACE=4.00] | -.333 | -.500 | -1.000 |

The default display of this matrix is the transpose of the corresponding L matrix. |

Dependent Variable | |||
---|---|---|---|

RACE Helmert Contrast | writing score | ||

Level 1 vs. Later | Contrast Estimate | -6.960 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | -6.960 | ||

Std. Error | 2.175 | ||

Sig. | .002 | ||

95% Confidence Interval for Difference | Lower Bound | -11.250 | |

Upper Bound | -2.670 | ||

Level 2 vs. Later | Contrast Estimate | 6.872 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | 6.872 | ||

Std. Error | 2.926 | ||

Sig. | .020 | ||

95% Confidence Interval for Difference | Lower Bound | 1.101 | |

Upper Bound | 12.644 | ||

Level 3 vs. Level 4 | Contrast Estimate | -5.855 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | -5.855 | ||

Std. Error | 2.153 | ||

Sig. | .007 | ||

95% Confidence Interval for Difference | Lower Bound | -10.101 | |

Upper Bound | -1.610 |

**Method 3: Regression**

Below we see an example of regression coding, and you can see that the coding is simply the mirror image of the difference coding we saw above. For the first comparison (comparing 1 with 2, 3 and 4) the codes are 3/4 and -1/4 -1/4 -1/4. The second comparison compares levels 2 with 3 and 4 and is coded 0 2/3 -1/3 -1/3. The third comparison compares levels 3 and 4 and is coded 0 0 1/2 -1/2.

HELMERT regression coding

Level of race | New variable 1 (x1) | New variable 2 (x2) | New variable 3 (x3) |
---|---|---|---|
 | Level 1 v. Later | Level 2 v. Later | Level 3 v. Later |
1 (Hispanic) | .75 | 0 | 0 |
2 (Asian) | -.25 | .667 | 0 |
3 (African American) | -.25 | -.333 | .5 |
4 (white) | -.25 | -.333 | -.5 |
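The "mirror image" relationship between the two regression codings can be made concrete. The following Python sketch (hypothetical helper names, exact fractions instead of rounded decimals) builds the difference regression coding for k levels and then obtains the Helmert regression coding by reversing both the rows and the columns.

```python
from fractions import Fraction

def difference_coding(k):
    """Rows are levels, columns are comparisons (level j+1 vs. the previous levels)."""
    rows = [[Fraction(0)] * (k - 1) for _ in range(k)]
    for j in range(1, k):
        for level in range(j):
            rows[level][j - 1] = Fraction(-1, j + 1)   # each previous level
        rows[j][j - 1] = Fraction(j, j + 1)            # the level being compared
    return rows

def helmert_coding(k):
    """Mirror image of the difference coding: reverse the rows and the columns."""
    return [list(reversed(row)) for row in reversed(difference_coding(k))]

for row in helmert_coding(4):
    print([str(value) for value in row])
```

For four levels this prints 3/4, 0, 0 for level 1 down to -1/4, -1/3, -1/2 for level 4, matching the table above.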

Below we illustrate how to create **x1**, **x2** and **x3** and enter
these new variables into the regression model using the **regression**
command.

    if race = 1 x1 = .75.
    if any(race,2,3,4) x1 = -.25.
    if race = 1 x2 = 0.
    if race = 2 x2 = .667.
    if any(race,3,4) x2 = -.333.
    if any(race,1,2) x3 = 0.
    if race = 3 x3 = .5.
    if race = 4 x3 = -.5.
    execute.
    regression /dep write /method = enter x1 x2 x3.

In the above examples, both the
regression coefficient for x1 and the contrast estimate for c1
are the mean of **write** for level 1 (Hispanic) minus the mean of **write**
for all subsequent levels (levels 2, 3 and 4). Likewise, the
regression coefficient for x2 and the contrast estimate for c2
are the mean of **write** for level 2 minus the mean of **write**
for levels 3 and 4. Finally, the
regression coefficient for x3 and the contrast estimate for c3
are the mean of **write** for level 3 minus the mean of **write**
for level 4.

Model | Term | Unstandardized B | Std. Error | Standardized Beta | t | Sig. |
---|---|---|---|---|---|---|
1 | (Constant) | 51.677 | .982 | | 52.635 | .000 |
 | X1 | -6.958 | 2.175 | -.239 | -3.199 | .002 |
 | X2 | 6.872 | 2.926 | .177 | 2.348 | .020 |
 | X3 | -5.855 | 2.153 | -.204 | -2.720 | .007 |

a Dependent Variable: writing score

## ORTHOGONAL POLYNOMIAL CODING

Orthogonal polynomial coding is a form of trend analysis in that it is looking for the linear, quadratic and cubic trends in the categorical variable. This type of coding system should be used only with an ordinal variable in which the levels are equally spaced. Examples of such a variable might be income or education. The table below shows the contrast coefficients for the linear, quadratic and cubic trends for 4 groups. These could be obtained from most statistics books on linear models.

POLYNOMIAL

Level of race | Linear (x1) | Quadratic (x2) | Cubic (x3) |
---|---|---|---|
1 (Hispanic) | -.671 | .5 | -.224 |
2 (Asian) | -.224 | -.5 | .671 |
3 (African American) | .224 | -.5 | -.671 |
4 (white) | .671 | .5 | .224 |

**Method 1: GLM with /LMATRIX**

    glm write by race
      /lmatrix "linear" race -.671 -.224 .224 .671
      /lmatrix "quadratic" race .5 -.5 -.5 .5
      /lmatrix "cubic" race -.224 .671 -.671 .224.

Dependent Variable | |||
---|---|---|---|

Contrast | writing score | ||

L1 | Contrast Estimate | 2.902 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | 2.902 | ||

Std. Error | 1.535 | ||

Sig. | .060 | ||

95% Confidence Interval for Difference | Lower Bound | -.125 | |

Upper Bound | 5.930 | ||

a Based on the user-specified contrast coefficients (L') matrix: linear |

Source | Sum of Squares | df | Mean Square | F | Sig. |
---|---|---|---|---|---|

Contrast | 291.104 | 1 | 291.104 | 3.574 | .060 |

Error | 15964.717 | 196 | 81.453 |

Dependent Variable | |||
---|---|---|---|

Contrast | writing score | ||

L1 | Contrast Estimate | -2.843 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | -2.843 | ||

Std. Error | 1.964 | ||

Sig. | .149 | ||

95% Confidence Interval for Difference | Lower Bound | -6.717 | |

Upper Bound | 1.031 | ||

a Based on the user-specified contrast coefficients (L') matrix: quadratic |

Source | Sum of Squares | df | Mean Square | F | Sig. |
---|---|---|---|---|---|

Contrast | 170.665 | 1 | 170.665 | 2.095 | .149 |

Error | 15964.717 | 196 | 81.453 |

Dependent Variable | |||
---|---|---|---|

Contrast | writing score | ||

L1 | Contrast Estimate | 8.277 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | 8.277 | ||

Std. Error | 2.316 | ||

Sig. | .000 | ||

95% Confidence Interval for Difference | Lower Bound | 3.709 | |

Upper Bound | 12.846 | ||

a Based on the user-specified contrast coefficients (L') matrix: cubic |

Source | Sum of Squares | df | Mean Square | F | Sig. |
---|---|---|---|---|---|

Contrast | 1040.029 | 1 | 1040.029 | 12.769 | .000 |

Error | 15964.717 | 196 | 81.453 |

To calculate the contrast estimates for these comparisons, you need to
multiply the code used in the new variable by the mean of the dependent
variable for each level of the categorical variable, and then sum the
products. For example, the code used in x1 for level 1 of **race** is -.671 and
the mean of **write** for level 1 is 46.4583. Hence, you would multiply -.671
by 46.4583 and add that to the product of the code for level 2 of x1 and its
mean, and so on. To obtain the contrast estimate for the linear contrast,
you would do the following: -.671*46.4583 + -.224*58 + .224*48.2 +
.671*54.0552 = 2.902, which matches the contrast estimate in the output above. This result is not
statistically significant at the .05 alpha level, but it is close. The
quadratic component is also not statistically significant, but the cubic one
is. This suggests that, if the mean of the dependent variable were plotted
against **race**, the line would tend to have two bends. As noted earlier,
this type of coding system does not make much sense with a nominal variable such
as **race**.
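The multiply-and-sum recipe described above can be written out for all three trends. This is a small Python sketch using the rounded coefficients from the table and the group means quoted in the text; the helper name `contrast_estimate` is hypothetical. Results are rounded to two decimals because the inputs themselves are rounded.

```python
# Group means of write for race levels 1-4, as quoted in the text.
means = [46.4583, 58.0, 48.2, 54.0552]

# Rounded orthogonal polynomial coefficients from the table above.
contrasts = {
    "linear": [-.671, -.224, .224, .671],
    "quadratic": [.5, -.5, -.5, .5],
    "cubic": [-.224, .671, -.671, .224],
}

def contrast_estimate(coefs, group_means):
    """Sum of (coefficient x group mean) over the levels."""
    return sum(c * m for c, m in zip(coefs, group_means))

poly = {name: round(contrast_estimate(c, means), 2) for name, c in contrasts.items()}
print(poly)  # {'linear': 2.9, 'quadratic': -2.84, 'cubic': 8.28}
```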

**Method 2: GLM with /CONTRAST**

glm write by race /contrast (race)=polynomial /print = test(lmatrix).

RACE Polynomial Contrast(a) | |||
---|---|---|---|

Parameter | Linear | Quadratic | Cubic |

Intercept | .000 | .000 | .000 |

[RACE=1.00] | -.671 | .500 | -.224 |

[RACE=2.00] | -.224 | -.500 | .671 |

[RACE=3.00] | .224 | -.500 | -.671 |

[RACE=4.00] | .671 | .500 | .224 |

The default display of this matrix is the transpose of the corresponding L matrix. | |||

a Metric = 1.000, 2.000, 3.000, 4.000 |

Dependent Variable | |||
---|---|---|---|

RACE Polynomial Contrast(a) | writing score | ||

Linear | Contrast Estimate | 2.905 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | 2.905 | ||

Std. Error | 1.534 | ||

Sig. | .060 | ||

95% Confidence Interval for Difference | Lower Bound | -.121 | |

Upper Bound | 5.931 | ||

Quadratic | Contrast Estimate | -2.843 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | -2.843 | ||

Std. Error | 1.964 | ||

Sig. | .149 | ||

95% Confidence Interval for Difference | Lower Bound | -6.717 | |

Upper Bound | 1.031 | ||

Cubic | Contrast Estimate | 8.273 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | 8.273 | ||

Std. Error | 2.316 | ||

Sig. | .000 | ||

95% Confidence Interval for Difference | Lower Bound | 3.706 | |

Upper Bound | 12.840 | ||

a Metric = 1.000, 2.000, 3.000, 4.000 |

Again, we see that only the cubic effect is statistically significant.
In other words, if the mean of **write** was plotted
against **race**, the line would tend to have two bends.
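The **/contrast** output reports 2.905 and 8.273 for the linear and cubic trends, where the **/lmatrix** output reported 2.902 and 8.277. One plausible explanation, offered here as an assumption rather than something the source states, is that the displayed coefficients (.671, .224) are rounded versions of the orthonormal coefficients 3/sqrt(20) and 1/sqrt(20), and that **/contrast** works with the unrounded values. The Python sketch below recomputes the estimates on that assumption.

```python
import math

# Group means of write for race levels 1-4, as quoted in the text.
means = [46.4583, 58.0, 48.2, 54.0552]

# Integer trend coefficients for four equally spaced levels.
raw_trends = {
    "linear": [-3, -1, 1, 3],
    "quadratic": [1, -1, -1, 1],
    "cubic": [-1, 3, -3, 1],
}

def normalize(coefs):
    """Scale a contrast so that its squared coefficients sum to 1."""
    length = math.sqrt(sum(c * c for c in coefs))
    return [c / length for c in coefs]

exact_coefs = {name: normalize(c) for name, c in raw_trends.items()}
exact_estimates = {
    name: round(sum(c * m for c, m in zip(coefs, means)), 3)
    for name, coefs in exact_coefs.items()
}
print(exact_estimates)  # {'linear': 2.905, 'quadratic': -2.843, 'cubic': 8.273}
```

Rounding the normalized linear coefficients to three decimals gives -.671, -.224, .224 and .671, the values shown in the coefficient table.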

**Method 3: Regression**

    if race = 1 x1 = -.671.
    if race = 2 x1 = -.224.
    if race = 3 x1 = .224.
    if race = 4 x1 = .671.
    if race = 1 x2 = .5.
    if race = 2 x2 = -.5.
    if race = 3 x2 = -.5.
    if race = 4 x2 = .5.
    if race = 1 x3 = -.224.
    if race = 2 x3 = .671.
    if race = 3 x3 = -.671.
    if race = 4 x3 = .224.
    execute.
    regression /dep write /method = enter x1 x2 x3.

Model | Term | Unstandardized B | Std. Error | Standardized Beta | t | Sig. |
---|---|---|---|---|---|---|
1 | (Constant) | 51.678 | .982 | | 52.619 | .000 |
 | X1 | 2.900 | 1.534 | .142 | 1.890 | .060 |
 | X2 | -2.843 | 1.964 | -.109 | -1.448 | .149 |
 | X3 | 8.271 | 2.315 | .278 | 3.573 | .000 |

a Dependent Variable: writing score

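The regression coefficients obtained from this analysis are essentially the same as the
contrast estimates obtained using the **glm** command with either the **/lmatrix**
or the **/contrast** statements. The small differences in the third decimal place
(for example, 2.900 here versus 2.902 and 2.905 for the linear term) most likely
reflect the rounding of the codes to three decimal places.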

## REPEATED EFFECT CODING

In this coding system, the mean of the dependent variable for one level
of the categorical variable is compared to the mean of the dependent variable
for the adjacent level. In our example below, the first comparison
is the mean of **write** for level 1 minus the mean of **write** for level 2 of
**race** (Hispanics minus Asians). The second comparison is the mean of
**write** for level 2 minus that for level 3, and the third comparison is the mean of
**write** for level 3 minus that for level 4. This type of
coding may be useful with either a nominal or an ordinal
variable.

**Method 1: GLM with /LMATRIX**

REPEATED contrast coding

Level of race | New variable 1 (c1) | New variable 2 (c2) | New variable 3 (c3) |
---|---|---|---|
 | Level 1 v. Level 2 | Level 2 v. Level 3 | Level 3 v. Level 4 |
1 (Hispanic) | 1 | 0 | 0 |
2 (Asian) | -1 | 1 | 0 |
3 (African American) | 0 | -1 | 1 |
4 (white) | 0 | 0 | -1 |

    glm write by race
      /lmatrix "level 1 versus level 2" race 1 -1 0 0
      /lmatrix "level 2 versus level 3" race 0 1 -1 0
      /lmatrix "level 3 versus level 4" race 0 0 1 -1.

Dependent Variable | |||
---|---|---|---|

Contrast | writing score | ||

L1 | Contrast Estimate | -11.542 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | -11.542 | ||

Std. Error | 3.286 | ||

Sig. | .001 | ||

95% Confidence Interval for Difference | Lower Bound | -18.022 | |

Upper Bound | -5.061 | ||

a Based on the user-specified contrast coefficients (L') matrix: group 1 versus group 2 |

Source | Sum of Squares | df | Mean Square | F | Sig. |
---|---|---|---|---|---|

Contrast | 1004.785 | 1 | 1004.785 | 12.336 | .001 |

Error | 15964.717 | 196 | 81.453 |

Dependent Variable | |||
---|---|---|---|

Contrast | writing score | ||

L1 | Contrast Estimate | 9.800 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | 9.800 | ||

Std. Error | 3.388 | ||

Sig. | .004 | ||

95% Confidence Interval for Difference | Lower Bound | 3.119 | |

Upper Bound | 16.481 | ||

a Based on the user-specified contrast coefficients (L') matrix: group 2 versus group 3 |

Source | Sum of Squares | df | Mean Square | F | Sig. |
---|---|---|---|---|---|

Contrast | 681.574 | 1 | 681.574 | 8.368 | .004 |

Error | 15964.717 | 196 | 81.453 |

Dependent Variable | |||
---|---|---|---|

Contrast | writing score | ||

L1 | Contrast Estimate | -5.855 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | -5.855 | ||

Std. Error | 2.153 | ||

Sig. | .007 | ||

95% Confidence Interval for Difference | Lower Bound | -10.101 | |

Upper Bound | -1.610 | ||

a Based on the user-specified contrast coefficients (L') matrix: group 3 versus group 4 |

Source | Sum of Squares | df | Mean Square | F | Sig. |
---|---|---|---|---|---|

Contrast | 602.550 | 1 | 602.550 | 7.398 | .007 |

Error | 15964.717 | 196 | 81.453 |

With this coding system, adjacent levels of the categorical variable are
compared. Hence, the mean of the dependent variable at level 1 is compared
to the mean of the dependent variable at level 2: 46.4583 - 58 = -11.542,
which is statistically significant. For the comparison between levels 2
and 3, the contrast estimate is 58 - 48.2 = 9.8,
which is also statistically significant. Finally, comparing levels 3 and
4, 48.2 - 54.0552 = -5.855, a statistically significant difference. One
would conclude from this that each pair of adjacent levels of **race** is statistically
significantly different.
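Because repeated contrasts are simply differences between neighbouring group means, the three estimates can be reproduced in one line. A minimal Python sketch, using the group means quoted in the text:

```python
# Group means of write for race levels 1-4, as quoted in the text.
means = [46.4583, 58.0, 48.2, 54.0552]

# Each level minus the next level: (1 - 2), (2 - 3), (3 - 4).
repeated = [round(a - b, 3) for a, b in zip(means, means[1:])]
print(repeated)  # [-11.542, 9.8, -5.855]
```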

**Method 2: GLM with /CONTRAST**

glm write by race /contrast (race)=repeated /print = test(lmatrix).

RACE Repeated Contrast | |||
---|---|---|---|

Parameter | Level 1 vs. Level 2 | Level 2 vs. Level 3 | Level 3 vs. Level 4 |

Intercept | 0 | 0 | 0 |

[RACE=1.00] | 1 | 0 | 0 |

[RACE=2.00] | -1 | 1 | 0 |

[RACE=3.00] | 0 | -1 | 1 |

[RACE=4.00] | 0 | 0 | -1 |

The default display of this matrix is the transpose of the corresponding L matrix. |

Dependent Variable | |||
---|---|---|---|

RACE Repeated Contrast | writing score | ||

Level 1 vs. Level 2 | Contrast Estimate | -11.542 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | -11.542 | ||

Std. Error | 3.286 | ||

Sig. | .001 | ||

95% Confidence Interval for Difference | Lower Bound | -18.022 | |

Upper Bound | -5.061 | ||

Level 2 vs. Level 3 | Contrast Estimate | 9.800 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | 9.800 | ||

Std. Error | 3.388 | ||

Sig. | .004 | ||

95% Confidence Interval for Difference | Lower Bound | 3.119 | |

Upper Bound | 16.481 | ||

Level 3 vs. Level 4 | Contrast Estimate | -5.855 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | -5.855 | ||

Std. Error | 2.153 | ||

Sig. | .007 | ||

95% Confidence Interval for Difference | Lower Bound | -10.101 | |

Upper Bound | -1.610 |

Again, we see that the results are the same as those obtained using the **/lmatrix**
statement: all three comparisons are statistically significant.

**Method 3: Regression**

For the first
comparison, where the first and second levels are compared, x1 is coded 3/4 for level 1 and
-1/4 for the remaining levels. For the second comparison, where level
2 is compared with level 3, x2 is coded 1/2, 1/2, -1/2 and
-1/2. For the
third comparison, where level 3 is compared with level 4, x3 is
coded 1/4, 1/4, 1/4 and -3/4.

REPEATED regression coding

Level of race | New variable 1 (x1) | New variable 2 (x2) | New variable 3 (x3) |
---|---|---|---|
 | Level 1 v. Level 2 | Level 2 v. Level 3 | Level 3 v. Level 4 |
1 (Hispanic) | .75 | .5 | .25 |
2 (Asian) | -.25 | .5 | .25 |
3 (African American) | -.25 | -.5 | .25 |
4 (white) | -.25 | -.5 | -.75 |

    if race = 1 x1 = .75.
    if any(race,2,3,4) x1 = -.25.
    if any(race,1,2) x2 = .5.
    if any(race,3,4) x2 = -.5.
    if any(race,1,2,3) x3 = .25.
    if race = 4 x3 = -.75.
    execute.
    regression /dep write /method = enter x1 x2 x3.

Model | Term | Unstandardized B | Std. Error | Standardized Beta | t | Sig. |
---|---|---|---|---|---|---|
1 | (Constant) | 51.678 | .982 | | 52.619 | .000 |
 | X1 | -11.542 | 3.286 | -.397 | -3.512 | .001 |
 | X2 | 9.800 | 3.388 | .394 | 2.893 | .004 |
 | X3 | -5.855 | 2.153 | -.277 | -2.720 | .007 |

a Dependent Variable: writing score

In the above examples, both the
regression coefficient for x1 and the contrast estimate for c1
are the mean of **write** for level 1 (Hispanic) minus the mean of **write**
for level 2 (Asian). Likewise, the
regression coefficient for x2 and the contrast estimate for c2
are the mean of **write** for level 2 (Asian) minus the mean of **write**
for level 3 (African American), and the
regression coefficient for x3 and the contrast estimate for c3
are the mean of **write** for level 3 (African American) minus the mean
of **write** for level 4 (white).

## SPECIAL USER-DEFINED CODING SYSTEM

SPSS allows users to define their own effect coding systems. Remember when doing this that the values within each contrast must sum to zero. For our example, we will make the following three comparisons:

1) level 1 to level 3,

2) level 2 to levels 1 and 4, and

3) levels 1 and 2 to levels 3 and 4.

**Method 1: GLM with /LMATRIX**

In order to
compare level 1 to level 3, we use the contrast coefficients 1 0 -1 0. To
compare level 2 to levels 1 and 4 we use the contrast coefficients -.5 1 0 -.5.
Finally, to compare levels 1 and 2 with levels 3 and 4 we use the
coefficients .5 .5 -.5 -.5. These coefficients are used in the **/lmatrix**
statements below.

    glm write by race
      /lmatrix "compare level 1 to level 3" race 1 0 -1 0
      /lmatrix "compare level 2 to levels 1 and 4" race -.5 1 0 -.5
      /lmatrix "compare levels 1 and 2 to levels 3 and 4" race .5 .5 -.5 -.5.

The first "contrast results" table shows the results of comparing group 1 to group 3. The contrast estimate for this comparison is the mean of group 1 minus the mean of group 3 (-1.742), and it is not statistically significant (p = .525). The second "contrast results" table shows a contrast estimate of 7.743, which is the mean of group 2 minus the average of the means of groups 1 and 4, and this difference is statistically significant (p = .008). The final contrast estimate is 1.102, which is the average of the means of groups 1 and 2 minus the average of the means of groups 3 and 4, and this contrast is not statistically significant (p = .576).
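The three user-defined estimates can be checked by applying the same coefficients to the group means. The Python sketch below does this with the means quoted earlier on this page; the variable names are hypothetical.

```python
# Group means of write for race levels 1-4, as quoted earlier on this page.
means = [46.4583, 58.0, 48.2, 54.0552]

# The user-defined contrasts from the /lmatrix statements above.
special = [
    ("level 1 vs. level 3", [1, 0, -1, 0]),
    ("level 2 vs. levels 1 and 4", [-.5, 1, 0, -.5]),
    ("levels 1 and 2 vs. levels 3 and 4", [.5, .5, -.5, -.5]),
]

special_estimates = []
for label, coefs in special:
    estimate = round(sum(c * m for c, m in zip(coefs, means)), 3)
    special_estimates.append(estimate)
    print(label, estimate)
# level 1 vs. level 3 -1.742
# level 2 vs. levels 1 and 4 7.743
# levels 1 and 2 vs. levels 3 and 4 1.102
```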

Dependent Variable | |||
---|---|---|---|

Contrast | writing score | ||

L1 | Contrast Estimate | -1.742 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | -1.742 | ||

Std. Error | 2.732 | ||

Sig. | .525 | ||

95% Confidence Interval for Difference | Lower Bound | -7.131 | |

Upper Bound | 3.647 | ||

a Based on the user-specified contrast coefficients (L') matrix: compare group 1 to group 3 |

Source | Sum of Squares | df | Mean Square | F | Sig. |
---|---|---|---|---|---|

Contrast | 33.092 | 1 | 33.092 | .406 | .525 |

Error | 15964.717 | 196 | 81.453 |

Dependent Variable | |||
---|---|---|---|

Contrast | writing score | ||

L1 | Contrast Estimate | 7.743 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | 7.743 | ||

Std. Error | 2.897 | ||

Sig. | .008 | ||

95% Confidence Interval for Difference | Lower Bound | 2.030 | |

Upper Bound | 13.457 | ||

a Based on the user-specified contrast coefficients (L') matrix: compare group 2 to groups 1 and 4 |

Source | Sum of Squares | df | Mean Square | F | Sig. |
---|---|---|---|---|---|

Contrast | 581.833 | 1 | 581.833 | 7.143 | .008 |

Error | 15964.717 | 196 | 81.453 |

Dependent Variable | |||
---|---|---|---|

Contrast | writing score | ||

L1 | Contrast Estimate | 1.102 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | 1.102 | ||

Std. Error | 1.964 | ||

Sig. | .576 | ||

95% Confidence Interval for Difference | Lower Bound | -2.772 | |

Upper Bound | 4.975 | ||

a Based on the user-specified contrast coefficients (L') matrix: compare groups 1 and 2 to groups 3 and 4 |

Source | Sum of Squares | df | Mean Square | F | Sig. |
---|---|---|---|---|---|

Contrast | 25.618 | 1 | 25.618 | .315 | .576 |

Error | 15964.717 | 196 | 81.453 |

**Method 2: GLM with /CONTRAST**

When using **glm**
with the **/contrast** statement, you can specify your own contrast
coefficients with the **special** keyword, followed by the contrasts you
would like to test. As before, we use the coefficients 1 0 -1 0 to compare
level 1 to level 3, -.5 1 0 -.5 to compare level 2 to levels 1 and 4, and
.5 .5 -.5 -.5 to compare levels 1 and 2 with levels 3 and 4.

glm write by race /contrast (race)=special(1 0 -1 0, -.5 1 0 -.5, .5 .5 -.5 -.5) /print = test(lmatrix).

As you can see, the **glm** results below correspond to the **glm**
results above using method 1.

RACE Special Contrast | |||
---|---|---|---|

Parameter | L1 | L2 | L3 |

Intercept | .000 | .000 | .000 |

[RACE=1.00] | 1.000 | -.500 | .500 |

[RACE=2.00] | .000 | 1.000 | .500 |

[RACE=3.00] | -1.000 | .000 | -.500 |

[RACE=4.00] | .000 | -.500 | -.500 |

The default display of this matrix is the transpose of the corresponding L matrix. |

Dependent Variable | |||
---|---|---|---|

RACE Special Contrast | writing score | ||

L1 | Contrast Estimate | -1.742 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | -1.742 | ||

Std. Error | 2.732 | ||

Sig. | .525 | ||

95% Confidence Interval for Difference | Lower Bound | -7.131 | |

Upper Bound | 3.647 | ||

L2 | Contrast Estimate | 7.743 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | 7.743 | ||

Std. Error | 2.897 | ||

Sig. | .008 | ||

95% Confidence Interval for Difference | Lower Bound | 2.030 | |

Upper Bound | 13.457 | ||

L3 | Contrast Estimate | 1.102 | |

Hypothesized Value | 0 | ||

Difference (Estimate - Hypothesized) | 1.102 | ||

Std. Error | 1.964 | ||

Sig. | .576 | ||

95% Confidence Interval for Difference | Lower Bound | -2.772 | |

Upper Bound | 4.975 |

**Method 3: Regression**

As in the prior examples, we will make the following three comparisons:

1) level 1 to level 3,

2) level 2 to levels 1 and 4 and

3) levels 1 and 2 to levels 3 and 4.

For methods 1 and 2 it was quite easy to translate the comparisons we wanted to make
into contrast codings, but it is not as easy to translate the comparisons we
want into a regression coding scheme. If we know the contrast coding system, then
we can convert that into a regression coding system using the SPSS
program shown below. As you can see, we place the three contrast codings we want into the matrix **c**
and then perform a set of matrix operations on **c,** yielding the matrix **x**.
We then display **x** using the **print** command.

    matrix.
    compute c = { 1, -.5, .5 ; 0, 1, .5 ; -1, 0, -.5 ; 0, -.5, -.5 }.
    compute x = c*inv( t(c)*c ).
    print x.
    end matrix.

Below we see the output from this program showing the regression coding scheme we would use.

    X
       -.500000000  -1.000000000   1.500000000
        .500000000   1.000000000   -.500000000
      -1.500000000  -1.000000000   1.500000000
       1.500000000   1.000000000  -2.500000000

This converts the contrast coding into the regression
coding that we need for running this analysis with the **regression**
command. Below, we use **if** commands to create **x1**, **x2** and **x3**
according to the coding shown above and then enter them into the regression
analysis.

    if race = 1 x1 = -0.5.
    if race = 2 x1 = .5.
    if race = 3 x1 = -1.5.
    if race = 4 x1 = 1.5.
    if race = 1 x2 = -1.
    if race = 2 x2 = 1.
    if race = 3 x2 = -1.
    if race = 4 x2 = 1.
    if race = 1 x3 = 1.5.
    if race = 2 x3 = -.5.
    if race = 3 x3 = 1.5.
    if race = 4 x3 = -2.5.
    execute.
    regression /dep write /method = enter x1 x2 x3.
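The conversion used above, x = c*inv(t(c)*c), can also be checked outside SPSS. The following Python sketch uses exact fractions and a small Gauss-Jordan inverse; the helper names (`transpose`, `matmul`, `inverse`, `regression_coding`) are hypothetical and are not part of SPSS.

```python
from fractions import Fraction

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverse(m):
    """Gauss-Jordan inverse of a small non-singular square matrix of Fractions."""
    n = len(m)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def regression_coding(c):
    """Contrast coding (levels x contrasts) -> regression coding, x = c (c'c)^-1."""
    c = [[Fraction(v) for v in row] for row in c]
    return matmul(c, inverse(matmul(transpose(c), c)))

half = Fraction(1, 2)
c = [[1, -half, half],
     [0, 1, half],
     [-1, 0, -half],
     [0, -half, -half]]

x = [[float(v) for v in row] for row in regression_coding(c)]
for row in x:
    print(row)
# [-0.5, -1.0, 1.5]
# [0.5, 1.0, -0.5]
# [-1.5, -1.0, 1.5]
# [1.5, 1.0, -2.5]
```

The same function applied to the repeated contrast coding (1 -1 0 0, 0 1 -1 0, 0 0 1 -1, entered as columns) returns the .75/-.25, .5/-.5 and .25/-.75 regression coding shown in the repeated coding section.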

Here is a shortcut to save typing all of the **if** commands. This
assumes that **race** is coded 1 2 3 4.

    get file = "d:spsshsb2.sav".
    sort cases by race.
    save outfile = "c:temprace.sav".
    matrix.
    compute c = { 1, -.5, .5 ; 0, 1, .5 ; -1, 0, -.5 ; 0, -.5, -.5 }.
    compute x = c*inv( t(c)*c ).
    save x /outfile=* /var=x1 x2 x3.
    end matrix.
    compute race = $CASENUM.
    execute.
    match files /table=* /file="c:temprace.sav" /by race.
    execute.

regression /dep write /method = enter x1 x2 x3.

The first comparison of the mean of the dependent variable for level 1 to level 3 of the categorical variable was not statistically significant, while the comparison of the mean of the dependent variable for level 2 to that of levels 1 and 4 was. The comparison of the mean of the dependent variable for levels 1 and 2 to that of levels 3 and 4 was not statistically significant.

Model | Term | Unstandardized B | Std. Error | Standardized Beta | t | Sig. |
---|---|---|---|---|---|---|
1 | (Constant) | 51.678 | .982 | | 52.619 | .000 |
 | X1 | -1.742 | 2.732 | -.192 | -.637 | .525 |
 | X2 | 7.743 | 2.897 | .679 | 2.673 | .008 |
 | X3 | 1.102 | 1.964 | .194 | .561 | .576 |

a Dependent Variable: writing score


For the example using
difference coding, we also include the **parameter** option on the **/print**
statement. This causes SPSS to print the coding system used for the
regression analysis along with the results of the analysis.
This illustrates how the two coding systems differ and shows that the
overall fit of the regression is the same as when dummy coding is used.

additional notes

Whether you code the variables manually or have SPSS do it for you, there is one "rule" for creating effect codes: the values within each newly created variable must sum to zero. Which level of the categorical variable is assigned the positive value and which the negative value is not terribly important: 0 1 -1 0 and 0 -1 1 0 both compare the second and third levels of the variable, and only the sign of the coefficient changes.
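The sum-to-zero rule can be checked for a set of contrasts before they are used. The sketch below is an illustration rather than part of the original example: it reuses the contrast matrix **c** from this section, and **colsums** is a name introduced here for the row vector of column sums returned by the matrix function **csum**.

```spss
matrix.
compute c = { 1, -.5, .5 ; 0, 1, .5 ; -1, 0, -.5 ; 0, -.5, -.5 }.
compute colsums = csum(c).
print colsums /title = "Column sums of c (each should be 0)".
end matrix.
```

Each column of **c** is one contrast, so all three sums should be zero.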

When doing any sort of effect coding, there are three approaches to coding
the variables. The first approach is to use **glm** with **/lmatrix**
statements. You need one **/lmatrix** statement for
each contrast. Hence, in our example, because we have a four-level
categorical variable, we need three **/lmatrix** statements
(all of which are part of the same **glm** command). The second approach is to use
**glm** with a **/contrast () =** statement, placing the name of the categorical
variable in the parentheses and the name of the contrast after the
equal sign. The third approach, shown in Method 3, is to compute the
variables manually: you create a new variable, set it equal to one of
the values that it will assume, and then use **if** commands to change
the value according to the values of the original (categorical) variable.
If you use this approach, you can use either **regression** or **glm**.
Below are examples of all three approaches. In Method 1, we include a **/print** statement with
the **test(lmatrix)** option so that SPSS
prints out the coding system used for the contrasts. In
the interest of conserving space, we have included only the relevant output. We have interspersed explanations into the
output to aid in the interpretation of the results. We have developed a separate
page with the full annotated output (insert link).
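As a rough sketch of what the first two approaches might look like for the variables used in this section, assuming the dependent variable **write** and the four-level factor **race**: the labels in quotation marks are free text, the **/lmatrix** coefficients are the three contrasts from the matrix **c** used earlier, and **difference** is chosen only as an example of a named contrast.

```spss
* Approach 1: one /lmatrix subcommand per contrast.
glm write by race
  /lmatrix "level 1 versus level 3" race 1 0 -1 0
  /lmatrix "level 2 versus levels 1 and 4" race -.5 1 0 -.5
  /lmatrix "levels 1 and 2 versus levels 3 and 4" race .5 .5 -.5 -.5
  /print = test(lmatrix).

* Approach 2: a named contrast requested with /contrast.
glm write by race
  /contrast (race) = difference
  /print = parameter test(lmatrix).
```

With the first approach the contrasts are entered directly, so no conversion to regression coding is needed.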