Coding Systems for Categorical Variables in Regression Analysis

Categorical variables require special attention in regression analysis because, unlike dichotomous or continuous variables, they cannot be entered into the regression equation just as they are. Instead, they need to be recoded into a series of variables which can then be entered into the regression model. There are a variety of coding systems that can be used when recoding categorical variables, and which one you select depends on the comparisons that you want to make. For example, you may want to compare each level of the categorical variable to the lowest level (or any given level). In that case you would use a system called simple coding. Or you may want to compare each level to the next higher level, in which case you would want to use repeated coding. We will discuss two general types of coding and when to use them: dummy coding and effect coding.

The examples on this page use the dataset at https://stats.idre.ucla.edu/wp-content/uploads/2016/02/hsb2-2.sav. We will focus on the categorical variable race, which has four levels (1 = Hispanic, 2 = Asian, 3 = African American and 4 = white), and we will use write as our dependent variable. Although our example uses a variable with four levels, these coding systems work with variables that have more categories or fewer categories. No matter which coding system you select, you will always have one fewer recoded variables than levels of the original variable. In our example, our categorical variable has four levels. We will therefore have three new variables. (A variable corresponding to the final level of the categorical variable would be redundant and therefore unnecessary.)

DUMMY CODING

Perhaps the simplest and most common coding system is called dummy coding. It is a way to make the categorical variable into a series of dichotomous variables (variables that can have a value of zero or one only). For all but one of the levels of the categorical variable, a new variable will be created that has a value of one for each observation at that level and zero for all others. In our example using the variable race, the first new variable (x1) will have a value of one for each observation in which race is Hispanic, and zero for all other observations. Likewise, we create x2 to be 1 when the person is Asian, and 0 otherwise, and x3 to be 1 when the person is African American, and 0 otherwise. The level of the categorical variable that is coded as zero in all of the new variables is the reference level, or the level to which all of the other levels are compared. In our example, white is the reference level. You can select any level of the categorical variable as the reference level.

DUMMY CODING

Level of race	New variable 1 (x1)	New variable 2 (x2)	New variable 3 (x3)
1 (Hispanic)	1	0	0
2 (Asian)	0	1	0
3 (African American)	0	0	1
4 (white)	0	0	0

After the new variables are created, they are entered into the regression in place of the original variable: we would enter x1, x2 and x3 instead of race, and the regression output will include a coefficient for each of these variables. The coefficient for x1 is the mean of the dependent variable for group 1 minus the mean of the dependent variable for the omitted group. In our example, the coefficient for x1 would be the mean of write for the Hispanic group minus the mean of write for the white group. Likewise, the coefficient for x2 would be the mean of write for the Asian group minus the mean of write for the white group, and the coefficient for x3 would be the mean of write for the African American group minus the mean of write for the white group.
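The dummy-coding rule and the meaning of the coefficients can be sketched in a few lines of Python. This is only an illustration and not part of the SPSS workflow: the function names dummy_code and group_means are hypothetical, and the small list of scores is invented for the example rather than taken from the dataset.

```python
from statistics import mean

def dummy_code(level, levels=(1, 2, 3, 4), reference=4):
    """Return the dummy variables (x1, x2, x3) for one observation.

    Every level except the reference level gets its own 0/1 indicator.
    """
    return [1 if level == lv else 0 for lv in levels if lv != reference]

# Hypothetical (race, write) pairs, invented purely for illustration.
data = [(1, 40), (1, 50), (2, 60), (2, 56), (3, 45), (3, 51), (4, 52), (4, 58)]

def group_means(rows):
    """Mean of the dependent variable within each level."""
    levels = sorted({lv for lv, _ in rows})
    return {lv: mean(y for g, y in rows if g == lv) for lv in levels}

means = group_means(data)

# With dummy coding, each coefficient is a group mean minus the reference mean.
coefficients = {f"x{lv}": means[lv] - means[4] for lv in (1, 2, 3)}

print(dummy_code(1))   # codes for a Hispanic observation
print(dummy_code(4))   # codes for the reference level
print(coefficients)
```

The reference level is a parameter here to mirror the point made above that any level can serve as the reference.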

EFFECT CODING

Other coding systems use more values than just zero and one, and therefore allow you to make other types of comparisons. Unlike dummy coding, effect coding allows you to assign different weights to the various levels of the categorical variable. While the “rule” in dummy coding is that only values of zero and one are valid, the “rule” in effect coding is that all of the values in any new variable must sum to zero. Which level is assigned a positive or negative value is not very important: 0 1 -1 0 is the same as 0 -1 1 0 in that both of these codings compare the second and the third levels of the variable; only the sign of the coefficient would change.

Another point to consider is that while you can use dummy coding with any type of categorical variable, some forms of effect coding make more sense with ordinal categorical variables than with nominal categorical variables. For our example we use the variable race, which is a nominal categorical variable. Because dummy coding compares the mean of the dependent variable for each level of the categorical variable to the mean of the dependent variable for the reference group, it makes sense with a nominal variable. However, it may not make as much sense to use a coding scheme that tests the linear effect of race. As we describe each type of coding system, we note those coding systems with which it does not make as much sense to use a nominal variable.

Within SPSS there are two general commands that you can use for analyzing data with a continuous dependent variable and one or more categorical predictors, the regression command and the glm command (that replaced the manova command, not discussed in this page). If using the regression command, you would create one fewer new variables than there are levels in your categorical variable and use these new variables as predictors in your regression model. The values for these new variables will depend on how many levels are in your categorical variable and the coding system you choose. From this point we will refer to the coding scheme as used in the regression command as regression coding. Another method for analyzing categorical data would be to use the glm command and then you could use the lmatrix or the contrast subcommands to perform comparisons among the groups. We will refer to this coding scheme as contrast coding. So, if you are using the regression command, be sure to choose the regression coding scheme and if you are using the glm command be sure to choose the contrast coding scheme.

Below is a table listing various types of contrasts and the comparison that they make.

Name of contrast	Comparison made
Simple	Compares each level of a variable to the first level (or whichever level is specified)
Deviation	Compares deviations from the grand mean
Difference	Compares levels of a variable with the mean of the previous levels of the variable; also known as reverse-Helmert; this is an orthogonal contrast
Helmert	Compares levels of a variable with the mean of the subsequent levels of the variable; this is an orthogonal contrast
Polynomial	Orthogonal polynomial contrasts; the first degree of freedom contains the linear effect across the levels of the factor, the second degree of freedom contains the quadratic effect, and so on. In a balanced design, polynomial contrasts are orthogonal.
Repeated	Compares adjacent levels of a variable; this is not an orthogonal contrast
Special	User-defined contrast

SIMPLE EFFECT CODING

The results of simple effect coding are very similar to those of dummy coding in that each group is compared to the reference group. In the example below, group 4 is the reference group and the first comparison compares group 1 to group 4, the second comparison compares group 2 to group 4, and the third comparison compares group 3 to group 4.

The regression coding is a bit more complex than dummy coding. In our example below, group 4 is the reference group and x1 compares group 1 to group 4, x2 compares group 2 to group 4, and x3 compares group 3 to group 4. For x1 the coding is 3/4 (.75) for group 1, and -1/4 (-.25) for all other groups. Likewise, for x2 the coding is 3/4 (.75) for group 2, and -1/4 (-.25) for all other groups, and for x3 the coding is 3/4 (.75) for group 3, and -1/4 (-.25) for all other groups. Note that the values of each new variable sum to 0 across the four levels.

SIMPLE regression coding

Level of race	New variable 1 (x1)	New variable 2 (x2)	New variable 3 (x3)
1 (Hispanic)	.75	-.25	-.25
2 (Asian)	-.25	.75	-.25
3 (African American)	-.25	-.25	.75
4 (white)	-.25	-.25	-.25
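One way to see what the SIMPLE regression codes estimate is to plug a candidate set of coefficients back into the codes and check that every group mean is reproduced. The Python sketch below does this with the group means of write reported in the means table in the SYNTAX section of this page; the variable names are hypothetical, and the check relies on the assumption that a model with four parameters for four groups fits the group means exactly.

```python
# Group means of write by race, as reported in the means table on this page.
means = [46.4583, 58.0, 48.2, 54.0552]

# SIMPLE regression coding: one row per level, columns x1, x2, x3.
coding = [
    [ 0.75, -0.25, -0.25],
    [-0.25,  0.75, -0.25],
    [-0.25, -0.25,  0.75],
    [-0.25, -0.25, -0.25],
]

# Each new variable sums to zero over the four levels.
column_sums = [sum(row[j] for row in coding) for j in range(3)]

# Candidate interpretation: the intercept is the unweighted mean of the group
# means, and each coefficient is a group mean minus the reference (level 4).
intercept = sum(means) / len(means)
coefs = [m - means[3] for m in means[:3]]

# Plugging the codes back in should return the four group means.
fitted = [intercept + sum(x * b for x, b in zip(row, coefs)) for row in coding]

print(column_sums)
print([round(b, 3) for b in coefs])
print([round(f, 4) for f in fitted])
```

The coefficients are the same group differences that dummy coding gives; only the intercept changes meaning, from the reference-group mean to the unweighted mean of the group means.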

The contrast coding, shown below, is more straightforward. It also follows the rule for effect coding that the values in each new variable sum to zero. The first contrast compares group 1 to group 4, and group 1 is coded “1” and group 4 is coded “-1”. Likewise, the second contrast compares group 2 to group 4 by coding group 2 “1” and group 4 “-1”. As you can see with contrast coding, you can discern the meaning of the comparisons simply by inspecting the contrast coefficients. For example, looking at the contrast coefficients for c3 you can see that this compares group 3 to group 4.

SIMPLE effect contrast coding

Level of race	New variable 1 (c1)	New variable 2 (c2)	New variable 3 (c3)
1 (Hispanic)	1	0	0
2 (Asian)	0	1	0
3 (African American)	0	0	1
4 (white)	-1	-1	-1
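A contrast estimate is simply the sum of the group means weighted by the contrast coefficients. The short Python sketch below applies the three SIMPLE contrasts to the group means of write reported in the means table on this page; the helper name contrast_estimate is hypothetical.

```python
# Group means of write for levels 1 to 4 of race (from the means table).
means = [46.4583, 58.0, 48.2, 54.0552]

# SIMPLE contrast coding, one list of weights per contrast.
contrasts = {
    "c1": [1, 0, 0, -1],
    "c2": [0, 1, 0, -1],
    "c3": [0, 0, 1, -1],
}

def contrast_estimate(weights, group_means):
    """Weighted sum of the group means; the weights must sum to zero."""
    assert sum(weights) == 0, "contrast weights should sum to zero"
    return sum(w * m for w, m in zip(weights, group_means))

estimates = {name: round(contrast_estimate(w, means), 3)
             for name, w in contrasts.items()}
print(estimates)
```

These values can be compared with the contrast estimates in the “Contrast Results (K Matrix)” table shown later on this page.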

DEVIATION EFFECT CODING

This coding system compares the mean of the dependent variable for a given level to the grand mean of the dependent variable. In our example below, the first comparison compares level 1 (Hispanic) to all 3 other groups, the second comparison compares level 2 (Asian) to the 3 other groups, and the third comparison compares level 3 (African American) to the 3 other groups.

As you see in the example below, the regression coding is accomplished by assigning “1” to group 1 for the first comparison (since group 1 is the group to be compared to all others), a “1” to group 2 for the second comparison (since group 2 is to be compared to all others), and “1” to group 3 for the third comparison (since group 3 is to be compared to all others). Note that a “-1” is assigned to group 4 for all 3 comparisons (since it is the group that is never compared to the other groups) and all other values are assigned a 0. This regression coding scheme yields the comparisons described above.

DEVIATION regression coding

Level of race	New variable 1 (x1)	New variable 2 (x2)	New variable 3 (x3)
	Level 1 v. Mean	Level 2 v. Mean	Level 3 v. Mean
1 (Hispanic)	1	0	0
2 (Asian)	0	1	0
3 (African American)	0	0	1
4 (white)	-1	-1	-1

The contrast coding states each comparison directly. The first comparison that compares group 1 to groups 2,3,4 assigns 3/4 (.75) to group 1 and -1/4 (-.25) to groups 2,3,4. Likewise, the second comparison that compares group 2 to groups 1,3,4 assigns 3/4 (.75) to group 2 and -1/4 (-.25) to groups 1,3,4 and so forth for the third comparison. Note that you could substitute 3 for 3/4 and -1 for -1/4 and you would get the same test of significance, but the contrast estimate would be four times as large.

DEVIATION contrast coding

Level of race	New variable 1 (c1)	New variable 2 (c2)	New variable 3 (c3)
	Level 1 v. Mean	Level 2 v. Mean	Level 3 v. Mean
1 (Hispanic)	.75	-.25	-.25
2 (Asian)	-.25	.75	-.25
3 (African American)	-.25	-.25	.75
4 (white)	-.25	-.25	-.25

In the above examples, both the regression coefficient for x1 and the contrast estimate for c1 would be the mean of write for level 1 (Hispanic) minus the grand mean, that is, the unweighted mean of the four group means. (This equals three quarters of the difference between the level 1 mean and the average of the means for levels 2, 3 and 4.) Likewise, the regression coefficient for x2 and the contrast estimate for c2 would be the mean of write for level 2 (Asian) minus the same grand mean.
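As a quick numerical check of what the DEVIATION codes estimate, the Python sketch below applies the first contrast to the group means of write from the means table on this page. The names are hypothetical. Note that the grand mean used here is the unweighted mean of the four group means (about 51.68), which differs from the overall mean of 52.775 in the means table because the groups are of unequal size.

```python
# Group means of write for levels 1 to 4 of race (from the means table).
means = [46.4583, 58.0, 48.2, 54.0552]
grand_mean = sum(means) / len(means)   # unweighted mean of the group means

c1 = [0.75, -0.25, -0.25, -0.25]       # DEVIATION contrast coding, first column
c1_scaled = [3, -1, -1, -1]            # the same comparison multiplied by 4

def estimate(weights):
    """Contrast estimate: the group means weighted by the contrast codes."""
    return sum(w * m for w, m in zip(weights, means))

print(round(estimate(c1), 4))            # level 1 versus the grand mean
print(round(means[0] - grand_mean, 4))   # the same quantity computed directly
print(round(estimate(c1_scaled), 4))     # four times as large
```

Rescaling the weights changes the size of the estimate but not the comparison being tested, which is why the test of significance is unaffected.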

DIFFERENCE CODING

In this coding system, each level is compared to the mean of the previous levels. In our example, the first comparison codes the comparison of the mean of the dependent variable for level 1 of race to the mean of the dependent variable for level 2 of race. The second comparison compares the mean of the dependent variable for both levels 1 and 2 of race with the mean of the dependent variable for level 3 of race, and the third comparison compares the mean of the dependent variable for levels 1,2 and 3 of race with the 4th level of race. Clearly, this coding system does not make much sense with our example of race because it is a nominal variable. However, this system is useful when the levels of the categorical variable are ordered in a meaningful way. For example, if we had a categorical variable in which work-related stress was coded as low, medium or high, then comparing the means of the previous levels of the variable would make more sense.

Below we see an example of regression coding. For the first comparison, where the first and second levels are compared, x1 is coded -1/2 (-.5) and 1/2 (.5) and the rest 0. For the second comparison, the values of x2 are coded -1/3 (-.333) then -1/3 (-.333) then 2/3 (.666) and then 0. Finally, for the third comparison, the values of x3 are coded -1/4, -1/4, -1/4 and then 3/4.

DIFFERENCE regression coding

	New variable 1 (x1)	New variable 2 (x2)	New variable 3 (x3)
	Level 2 v. Level 1	Level 3 v. Previous	Level 4 v. Previous
1 (Hispanic)	-.5	-.333	-.25
2 (Asian)	.5	-.333	-.25
3 (African American)	0	.666	-.25
4 (white)	0	0	.75

For contrast coding, we see that the first comparison, which compares groups 1 and 2, is coded -1 and 1 for these groups, and 0 otherwise. The second comparison, which compares groups 1 and 2 with group 3, is coded -.5 -.5 1 and 0, and the last comparison, which compares groups 1, 2 and 3 with group 4, is coded -.333 -.333 -.333 and 1.

DIFFERENCE contrast coding

	New variable 1 (c1)	New variable 2 (c2)	New variable 3 (c3)
	Level 2 v. Level 1	Level 3 v. Previous	Level 4 v. Previous
1 (Hispanic)	-1	-.5	-.333
2 (Asian)	1	-.5	-.333
3 (African American)	0	1	-.333
4 (white)	0	0	1

In the above examples, both the regression coefficient for x1 and the contrast estimate for c1 would be the mean of write for level 2 (Asian) minus the mean of write for level 1 (Hispanic), because level 2 carries the positive code. Likewise, the regression coefficient for x2 and the contrast estimate for c2 would be the mean of write for level 3 minus the average of the means of write for levels 1 and 2. Finally, the regression coefficient for x3 and the contrast estimate for c3 would be the mean of write for level 4 minus the average of the means of write for levels 1, 2 and 3.
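The pattern in the DIFFERENCE contrast codes generalizes to any number of levels. The Python sketch below generates the codes with exact fractions and applies them to the group means of write from the means table on this page. The function name difference_contrasts is hypothetical, and the estimates use the unweighted average of the earlier group means.

```python
from fractions import Fraction

def difference_contrasts(k):
    """Contrast codes comparing level j+1 with the mean of levels 1..j.

    Returns k-1 lists of length k, one list per comparison.
    """
    rows = []
    for j in range(1, k):
        earlier = [Fraction(-1, j)] * j       # earlier levels share weight -1/j
        rows.append(earlier + [Fraction(1)] + [Fraction(0)] * (k - j - 1))
    return rows

# Group means of write for levels 1 to 4 of race (from the means table).
means = [Fraction("46.4583"), Fraction("58.0"),
         Fraction("48.2"), Fraction("54.0552")]

for row in difference_contrasts(4):
    est = sum(w * m for w, m in zip(row, means))
    print([str(w) for w in row], round(float(est), 3))
```

Using Fraction avoids the rounding that appears in the tables above, where 1/3 is shown as .333.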

HELMERT EFFECT CODING

Helmert coding is just the opposite of difference coding: instead of comparing each level of the categorical variable to the mean of the previous levels, each level is compared to the mean of the subsequent levels. Hence, the first contrast compares the mean of the dependent variable for level 1 of race with the mean of all of the subsequent levels of race (levels 2, 3, and 4), the second contrast compares the mean of the dependent variable for level 2 of race with the mean of all of the subsequent levels of race (levels 3 and 4), and the third contrast compares the mean of the dependent variable for level 3 of race with the mean of all of the subsequent levels of race (level 4). As with difference coding, this type of coding is most useful in situations where the levels of the categorical variable are ordered, say, from lowest to highest, or smallest to largest.

Below we see an example of regression coding, and you can see that the coding is simply the mirror image of the difference coding. For the first comparison (comparing level 1 with levels 2, 3, and 4) the codes are 3/4 and -1/4 -1/4 -1/4. The second comparison compares level 2 with levels 3 and 4 and is coded 0 2/3 -1/3 -1/3. The third comparison compares levels 3 and 4 and is coded 0 0 1/2 -1/2.

HELMERT regression coding

Level of race	New variable 1 (x1)	New variable 2 (x2)	New variable 3 (x3)
	Level 1 v. Later	Level 2 v. Later	Level 3 v. Later
1 (Hispanic)	.75	0	0
2 (Asian)	-.25	.666	0
3 (African American)	-.25	-.333	.5
4 (white)	-.25	-.333	-.5

For contrast coding, we see that the first comparison comparing group 1 with groups 2, 3 and 4 is coded 1 -.333 -.333 -.333 reflecting the comparison of group 1 versus all other groups. The second comparison is coded 0 1 -.5 -.5 reflecting that it compares group 2 with groups 3 and 4. The 3rd comparison is coded 0 0 1 -1 reflecting that group 3 is compared to group 4.

HELMERT contrast coding

Level of race	New variable 1 (c1)	New variable 2 (c2)	New variable 3 (c3)
	Level 1 v. Later	Level 2 v. Later	Level 3 v. Later
1 (Hispanic)	1	0	0
2 (Asian)	-.333	1	0
3 (African American)	-.333	-.5	1
4 (white)	-.333	-.5	-1

In the above examples, both the regression coefficient for x1 and the contrast estimate for c1 would be the mean of write for level 1 (Hispanic) minus the mean of write for all subsequent levels (levels 2, 3 and 4). Likewise, the regression coefficient for x2 and the contrast estimate for c2 would be the mean of write for level 2 minus the mean of write for levels 3 and 4. Finally, the regression coefficient for x3 and the contrast estimate for c3 would be the mean of write for level 3 minus the mean of write for level 4.
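The statement that the Helmert regression codes are the mirror image of the difference regression codes can be checked mechanically. The Python sketch below builds both sets of codes for k levels with exact fractions; the function names helmert_codes and difference_codes are hypothetical, and “mirror image” is taken to mean reversing both the order of the levels and the order of the new variables.

```python
from fractions import Fraction

def helmert_codes(k):
    """Helmert regression codes, one list per new variable, for k levels."""
    columns = []
    for j in range(1, k):
        later = k - j                       # number of levels after level j
        col = [Fraction(0)] * (j - 1)       # earlier levels are not involved
        col.append(Fraction(later, later + 1))
        col += [Fraction(-1, later + 1)] * later
        columns.append(col)
    return columns

def difference_codes(k):
    """Difference (reverse Helmert) regression codes for k levels."""
    columns = []
    for j in range(1, k):
        col = [Fraction(-1, j + 1)] * j     # the j earlier levels
        col.append(Fraction(j, j + 1))      # level j + 1
        col += [Fraction(0)] * (k - j - 1)  # later levels are not involved
        columns.append(col)
    return columns

helmert = helmert_codes(4)
difference = difference_codes(4)

for col in helmert:
    print([str(v) for v in col])

# Mirror image: reverse the level order and the order of the new variables.
mirrored = [col[::-1] for col in reversed(difference)]
print(mirrored == helmert)
```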

ORTHOGONAL POLYNOMIAL CODING

Orthogonal polynomial coding is a form of trend analysis in that it is looking for the linear, quadratic and cubic trends in the categorical variable. This type of coding system should be used only with an ordinal variable in which the levels are equally spaced. An example of such a variable might be income, or education.

POLYNOMIAL

Level of race	Linear (x1)	Quadratic (x2)	Cubic (x3)
1 (Hispanic)	-.671	.5	-.224
2 (Asian)	-.224	-.5	.671
3 (African American)	.224	-.5	-.671
4 (white)	.671	.5	.224
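The values in the POLYNOMIAL table can be reproduced by orthogonalizing the powers of the level scores. The Python sketch below is one way to do this, assuming equally spaced scores 1 to k and using Gram-Schmidt orthogonalization; the function name orthogonal_polynomials is hypothetical and this is not necessarily how SPSS computes the codes internally.

```python
from math import sqrt

def orthogonal_polynomials(k):
    """Orthonormal polynomial contrasts for k equally spaced levels.

    Built by Gram-Schmidt on the powers 0..k-1 of the level scores 1..k;
    the constant column is dropped from the result.
    """
    scores = list(range(1, k + 1))
    basis = []
    for degree in range(k):
        col = [float(x ** degree) for x in scores]
        for b in basis:                       # remove lower-order trends
            proj = sum(c * v for c, v in zip(col, b))
            col = [c - proj * v for c, v in zip(col, b)]
        norm = sqrt(sum(c * c for c in col))
        basis.append([c / norm for c in col])
    return basis[1:]                          # linear, quadratic, cubic, ...

linear, quadratic, cubic = orthogonal_polynomials(4)
print([round(v, 3) for v in linear])
print([round(v, 3) for v in quadratic])
print([round(v, 3) for v in cubic])
```

Each column sums to zero and is uncorrelated with the others, which is what makes the linear, quadratic and cubic trends separable.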

REPEATED EFFECT CODING

In this coding system, the mean of the dependent variable for one level of the categorical variable is compared to the mean of the dependent variable for the adjacent level. In our example below, the first comparison compares the mean of write for level 1 with the mean of write for level 2 of race (Hispanic minus Asian). The second comparison compares the mean of write for level 2 with the mean for level 3 (level 2 minus level 3), and the third comparison compares the mean of write for level 3 with the mean for level 4 (level 3 minus level 4). This type of coding may be useful with either a nominal or an ordinal variable.

Below we see an example of regression coding. For the first comparison, where the first and second levels are compared, x1 is coded 3/4 for level 1 and -1/4 for the rest. For the second comparison, where level 2 is compared with level 3, x2 is coded 1/2 1/2 -1/2 -1/2, and for the third comparison, where level 3 is compared with level 4, x3 is coded 1/4 1/4 1/4 and -3/4.

REPEATED regression coding

Level of race	New variable 1 (x1)	New variable 2 (x2)	New variable 3 (x3)
	Level 1 v. Level 2	Level 2 v. Level 3	Level 3 v. Level 4
1 (Hispanic)	.75	.5	.25
2 (Asian)	-.25	.5	.25
3 (African American)	-.25	-.5	.25
4 (white)	-.25	-.5	-.75

For contrast coding, the coding more naturally reflects the comparisons being made. The first comparison is coded 1 -1 0 0 reflecting that group 1 is compared to group 2. The second comparison is coded 0 1 -1 0 reflecting that group 2 is compared to group 3, and the third comparison is coded 0 0 1 -1 reflecting that group 3 is compared with group 4.

REPEATED contrast coding

Level of race	New variable 1 (c1)	New variable 2 (c2)	New variable 3 (c3)
	Level 1 v. Level 2	Level 2 v. Level 3	Level 3 v. Level 4
1 (Hispanic)	1	0	0
2 (Asian)	-1	1	0
3 (African American)	0	-1	1
4 (white)	0	0	-1

In the above examples, both the regression coefficient for x1 and the contrast estimate for c1 would be the mean of write for level 1 (Hispanic) minus the mean of write for level 2 (Asian). Likewise, the regression coefficient for x2 and the contrast estimate for c2 would be the mean of write for level 2 (Asian) minus the mean of write for level 3 (African American), and the regression coefficient for x3 and the contrast estimate for c3 would be the mean of write for level 3 (African American) minus the mean of write for level 4 (white).
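The link between the regression coding and the contrast coding can be made explicit: if a column of ones (for the intercept) is placed in front of the regression codes and the resulting square matrix is inverted, each row of the inverse shows how one coefficient is built from the group means. The Python sketch below does this for the REPEATED regression codes using exact fractions. The helper name invert is hypothetical, and the sketch assumes a model with one parameter per group, so that the group means determine the coefficients.

```python
from fractions import Fraction as F

def invert(matrix):
    """Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(row) + [F(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# REPEATED regression coding with a leading column of ones for the intercept.
design = [
    [F(1), F(3, 4),  F(1, 2),  F(1, 4)],
    [F(1), F(-1, 4), F(1, 2),  F(1, 4)],
    [F(1), F(-1, 4), F(-1, 2), F(1, 4)],
    [F(1), F(-1, 4), F(-1, 2), F(-3, 4)],
]

# Row i of the inverse shows how coefficient i is built from the group means.
inverse = invert(design)
for name, row in zip(["intercept", "x1", "x2", "x3"], inverse):
    print(name, [str(v) for v in row])
```

The rows for x1, x2 and x3 can be compared with the columns of the REPEATED contrast coding table above, and the intercept row gives equal weight to each group mean.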

SYNTAX

For most coding systems, there are two ways to code categorical variables: manually coding them and having SPSS code them for you. There are benefits and drawbacks to both approaches. The benefit of manually coding variables is that you have absolute control over how they are coded. The drawback to this approach is that it is relatively easy to make an error in writing the syntax. An error may be difficult to find, particularly if the error is a logic error instead of a syntax error. (SPSS will give you an error message in the output window if there is a syntax error, but not if there is a logic error.) One way to avoid having an error in your syntax is by allowing SPSS to code the variable(s) for you, but in doing so, you may have to give up some control over how the codes are assigned. Also, SPSS will not create certain kinds of codes for you, most notably dummy codes. Below we show two ways to create dummy codes and three ways to create each type of effect coding for our example using the four-level categorical variable race.

Before considering any analyses, let’s look at the mean of the dependent variable, write, for each level of race. This will help in interpreting the output from the analyses.

means tables = write by race.

**Case Processing Summary**
	Cases
	Included		Excluded		Total
	N	Percent	N	Percent	N	Percent
writing score * RACE	200	100.0%	0	.0%	200	100.0%

**Report**
writing score
RACE	Mean	N
hispanic	46.4583	24
asian	58.0000	11
african-amer	48.2000	20
white	54.0552	145
Total	52.7750	200

DUMMY CODING

In Method 1, we create a new variable (i.e., x1) that is set equal to zero. Then we change the value of this new variable to equal one if the level in the original (categorical) variable is one. We repeat this process for each new variable that we need to create. In Method 2, we use a “do-loop” to generate the new variables, which can be useful if your categorical variable has a large number of levels.

* Method 1 for creating dummy variables.

compute x1 = 0.
if race = 1 x1 = 1.
compute x2 = 0.
if race = 2 x2 = 1.
compute x3 = 0.
if race = 3 x3 = 1.
execute.

* Method 2 for creating dummy variables.

do repeat A=x1 x2 x3
 /B=1 2 3.
compute A=(race=B).
end repeat.
execute.

regression
 /dep write
 /method = enter x1 x2 x3.

**Variables Entered/Removed(b)**
Model	Variables Entered	Variables Removed	Method
1	X3, X2, X1(a)	.	Enter
a All requested variables entered.				b Dependent Variable: writing score

The table above shows which variables were entered into the regression equation. It also indicates that the method used was “enter”, as opposed to other possible methods that could have been specified, such as backward, forward or stepwise. The table also indicates that all of the variables listed on the /method= statement were entered into the regression equation.

**Model Summary**
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.327(a)	.107	.093	9.02511
a Predictors: (Constant), X3, X2, X1

**ANOVA(b)**
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	1914.158	3	638.053	7.833	.000(a)
	Residual	15964.717	196	81.453
	Total	17878.875	199
a Predictors: (Constant), X3, X2, X1							b Dependent Variable: writing score

The table above entitled “Model Summary” indicates that one model was tested, that 10.7% of the variance in the dependent variable is accounted for by the independent variable (R Square), and that 9.3% of the variance of the dependent variable is accounted for by the independent variable when the number of independent variables in the equation is taken into consideration (Adjusted R Square). The standard error of the estimate is also given. The table entitled “ANOVA” gives the sum of squares and the degrees of freedom (in the column labeled “df”) for the regression, the residual and the total (regression plus residual). The mean square is given for the regression and the residual, and the F-value and the associated p-value (in the column labeled Sig.) are displayed. These results indicate that the regression is statistically significant at the .05 alpha level. As you will see, the overall test of race is the same regardless of the coding system used.
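The quantities in the “Model Summary” and “ANOVA” tables are tied together by simple arithmetic. The Python sketch below recomputes them from the sums of squares and degrees of freedom printed above, using the standard formulas for R Square, Adjusted R Square, the F-value and the standard error of the estimate; the variable names are hypothetical.

```python
from math import sqrt

# Values copied from the ANOVA table above.
ss_regression, df_regression = 1914.158, 3
ss_residual, df_residual = 15964.717, 196
ss_total = ss_regression + ss_residual

r_square = ss_regression / ss_total
# Adjusted R Square penalizes for the number of predictors: (n-1)/(n-k-1).
adj_r_square = 1 - (1 - r_square) * (df_regression + df_residual) / df_residual
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_value = ms_regression / ms_residual
std_error_estimate = sqrt(ms_residual)

print(round(r_square, 3), round(adj_r_square, 3))
print(round(f_value, 3), round(std_error_estimate, 3))
```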

**Coefficients(a)**
		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
Model		B	Std. Error	Beta	t	Sig.
1	(Constant)	54.055	.749		72.122	.000
	X1	-7.597	1.989	-.261	-3.820	.000
	X2	3.945	2.823	.095	1.398	.164
	X3	-5.855	2.153	-.186	-2.720	.007
a Dependent Variable: writing score

The table above gives the unstandardized coefficients for the regression equation (in the column labeled B) and the standard error (in the column labeled Std. Error). When using dummy coding, the constant is the mean of the omitted level of the categorical variable. The coefficient for x1 is the mean of the dependent variable for level 1 of race minus the mean of the dependent variable for level 4 of race (the reference level). Likewise, the coefficients for x2 and x3 are each the mean of the dependent variable at that level of race minus the mean of the dependent variable for the reference level. The standardized coefficients are given in the column labeled Beta. The t-values and associated p-values are also given. The statistical significance of the constant is rarely of interest to researchers. The coefficients for x1 and x3 are statistically significant at the .05 (and .01) alpha level, while the coefficient for x2 is not. This indicates that level 1 of race (Hispanic) is significantly different from level 4 (white), and that level 3 (African American) is significantly different from level 4 (white).
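The coefficients and standard errors in the “Coefficients” table can be recovered from the means table and the residual mean square. The Python sketch below does so, using the usual formula for the standard error of a difference between two independent group means with a pooled residual variance; the variable names are hypothetical, and small discrepancies in the last digit are expected because the inputs are rounded.

```python
from math import sqrt

# Group means and sizes from the means table; level 4 (white) is the reference.
means = {1: 46.4583, 2: 58.0, 3: 48.2, 4: 54.0552}
sizes = {1: 24, 2: 11, 3: 20, 4: 145}
ms_residual = 15964.717 / 196          # residual mean square from the ANOVA table
reference = 4

# With dummy coding, the constant is the mean of the reference group.
constant = means[reference]
constant_se = sqrt(ms_residual / sizes[reference])
print("constant", round(constant, 3), round(constant_se, 3))

results = {}
for level in (1, 2, 3):
    b = means[level] - means[reference]
    # Standard error of a difference between two independent group means.
    se = sqrt(ms_residual * (1 / sizes[level] + 1 / sizes[reference]))
    results[f"x{level}"] = (b, se)
    print(f"x{level}", round(b, 3), round(se, 3), round(b / se, 2))
```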

EFFECT CODING

When doing any sort of effect coding, there are three approaches to the coding of the variables. The first approach is to manually compute them for use in OLS regression, which is shown in Method 1. You create a new variable, setting it equal to one of the values that it will assume, and then use “if” statements to change the value according to the values in the original (categorical) variable. If you use this approach, you can use either “regression” or “glm”. The second approach is to use “glm” with “/lmatrix” statements. You will need to use one “/lmatrix” statement for each contrast. Hence, in our example, because we have a four-level categorical variable, we will need to use three “/lmatrix” statements (all of which are part of the same “glm” command). The third approach is to use “glm” and include a “/contrast () =” statement, placing the name of the categorical variable in the parentheses and the name of the contrast to be used after the equal sign.

Below are examples of all three approaches. In Method 3, we include a “/print” statement with the “test(lmatrix)” option so that SPSS prints out the coding system used for the contrasts. For the example using difference coding, we also include the “parameter” option on the print statement. This causes SPSS to print out the coding system used for the regression analysis as well as the results of the regression analysis. This illustrates how the two coding systems are different and shows that the results of the regression are the same as when dummy coding is used.

In the interest of conserving space, we include the output only for the third method of creating the codes. However, the output from the other methods will be very similar and will contain all of the same values for parameter estimates, tests of statistical significance, etc. We have interspersed explanations into the following output. For the other types of coding systems, we omit the output that is the same and only discuss the output that changes as a result of the different coding system used.

SIMPLE EFFECT CODING

Method 1:

if race = 1 x1 = .75.
if any(race,2,3,4) x1 = -.25.

if race = 2 x2 = .75.
if any(race,1,3,4) x2 = -.25.

if race = 3 x3 = .75.
if any(race,1,2,4) x3 = -.25.
execute.

regression
 /dependent = write
 /method = enter x1 x2 x3.

< output omitted >

Method 2:

glm write by race
 /lmatrix "group 1 versus group 4" race 1 0 0 -1
 /lmatrix "group 2 versus group 4" race 0 1 0 -1
 /lmatrix "group 3 versus group 4" race 0 0 1 -1.

< output omitted >

Method 3:

glm
 write by race
 /contrast (race)=simple
 /print = parameter test(lmatrix).

**Between-Subjects Factors**
		Value Label	N
RACE	1.00	hispanic	24
	2.00	asian	11
	3.00	african-amer	20
	4.00	white	145

**Tests of Between-Subjects Effects**
Dependent Variable: writing score
Source	Type III Sum of Squares	df	Mean Square	F	Sig.
Corrected Model	1914.158(a)	3	638.053	7.833	.000
Intercept	225523.580	1	225523.580	2768.770	.000
RACE	1914.158	3	638.053	7.833	.000
Error	15964.717	196	81.453
Total	574919.000	200
Corrected Total	17878.875	199
a R Squared = .107 (Adjusted R Squared = .093)

The table above entitled “Between-Subjects Factors” shows the levels of the categorical variable, the value label associated with each level (if any) and the number of observations at each level (in the column N). The table entitled “Tests of Between-Subjects Effects” shows the source, the type III sums of squares, the degrees of freedom (called “df”), the mean square, the F-values and the corresponding p-values. The F-value for the corrected model of 7.833 and its p-value of .000 indicate that the overall model is statistically significant. The F- and p-values for race are the same as those for the corrected model because in this model, we have only one independent variable. If we had more than one independent variable, the F- and p-values for the overall model would be different from those for the independent variables. The F- and p-values for the intercept are also statistically significant, but those are rarely of interest.

**Parameter Estimates**
Dependent Variable: writing score
					95% Confidence Interval
Parameter	B	Std. Error	t	Sig.	Lower Bound	Upper Bound
Intercept	54.055	.749	72.122	.000	52.577	55.533
[RACE=1.00]	-7.597	1.989	-3.820	.000	-11.519	-3.675
[RACE=2.00]	3.945	2.823	1.398	.164	-1.622	9.511
[RACE=3.00]	-5.855	2.153	-2.720	.007	-10.101	-1.610
[RACE=4.00]	0(a)	.	.	.	.	.
a This parameter is set to zero because it is redundant.

The table above entitled “Parameter Estimates” gives the coefficients (in the column labeled B), the associated standard errors (in the column labeled Std. Error), the associated t-values, the associated p-values (in the column labeled Sig.), and the lower and upper bounds for the 95% confidence interval.

For our example, the regression equation would be: y = 54.055 - 7.597*x1 + 3.945*x2 - 5.855*x3, where x1, x2 and x3 are the indicators for levels 1, 2 and 3 of race. All of the coefficients are statistically significant at the .05 alpha level except the one for x2. In other words, the means of the dependent variable (write) for levels 1 and 3 of race are each statistically significantly different from the mean of the dependent variable for level 4 (the omitted level), but the mean for level 2 is not. Furthermore, the 95% confidence interval for the coefficient for x1 runs from -11.519 to -3.675. Likewise, the 95% confidence interval for the coefficient for x2 runs from -1.622 to 9.511, and so on.
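The confidence bounds in the “Parameter Estimates” table follow from the coefficient, its standard error and a critical value of the t distribution. The Python sketch below recomputes them. It rests on one assumption that is not part of the output above: the two-sided 95% critical value of t with 196 degrees of freedom is hard-coded as 1.972, a rounded table value. The variable names are hypothetical.

```python
# Coefficients and standard errors from the "Parameter Estimates" table.
estimates = {
    "[RACE=1.00]": (-7.597, 1.989),
    "[RACE=2.00]": (3.945, 2.823),
    "[RACE=3.00]": (-5.855, 2.153),
}

# Assumption: rounded two-sided 95% critical value of t with 196 df.
T_CRIT = 1.972

intervals = {}
for name, (b, se) in estimates.items():
    half_width = T_CRIT * se
    lower, upper = b - half_width, b + half_width
    intervals[name] = (lower, upper)
    # An interval that excludes zero corresponds to p < .05 for that coefficient.
    excludes_zero = lower > 0 or upper < 0
    print(name, round(lower, 2), round(upper, 2), excludes_zero)
```

Only the interval for level 2 contains zero, which matches the pattern of p-values in the table.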

You will notice that the values given in the “ANOVA” and “Coefficients” tables in the section on dummy coding are the same as the values given in the “Tests of Between-Subjects Effects” and “Parameter Estimates” tables. This is because, as mentioned previously, dummy coding and simple effect coding yield the same results when the same reference level is used in both coding systems.

**Intercept**
	Contrast
Parameter	L1
Intercept	1.000
[RACE=1.00]	.250
[RACE=2.00]	.250
[RACE=3.00]	.250
[RACE=4.00]	.250
The default display of this matrix is the transpose of the corresponding L matrix. Based on Type III Sums of Squares.

**RACE**
	Contrast
Parameter	L2	L3	L4
Intercept	0	0	0
[RACE=1.00]	1	0	0
[RACE=2.00]	0	1	0
[RACE=3.00]	0	0	1
[RACE=4.00]	-1	-1	-1
The default display of this matrix is the transpose of the corresponding L matrix. Based on Type III Sums of Squares.

The table above entitled “Intercept” shows the coding that SPSS used for the intercept. Each level of race was given an equal value (.250), and the sum of those is the intercept (1.000). The table entitled “Race” shows the coding for race that was used in the calculations regarding the regression above. Notice that it is simple effect coding, but that the same results would have been obtained using dummy coding. In this instance, the only difference between simple effect coding and dummy coding is the value assigned to the reference level (race = 4). Because it is the reference level, the only important point is that it have the same value in each of the new variables (called L2, L3 and L4). What that value is, either negative one in simple effect coding or zero in dummy coding, is irrelevant. Regardless of the coding system requested, SPSS will calculate the regression using simple effect coding. Which coding system you specify on the /contrast= statement will be used only in calculating the contrast estimates.

**Contrast Coefficients (L’ Matrix)**
	RACE Simple Contrast(a)
Parameter	Level 1 vs. Level 4	Level 2 vs. Level 4	Level 3 vs. Level 4
Intercept	0	0	0
[RACE=1.00]	1	0	0
[RACE=2.00]	0	1	0
[RACE=3.00]	0	0	1
[RACE=4.00]	-1	-1	-1
The default display of this matrix is the transpose of the corresponding L matrix.
a Reference category = 4

**Contrast Results (K Matrix)**
			Dependent Variable
RACE Simple Contrast(a)			writing score
Level 1 vs. Level 4	Contrast Estimate		-7.597
	Hypothesized Value		0
	Difference (Estimate – Hypothesized)		-7.597
	Std. Error		1.989
	Sig.		.000
	95% Confidence Interval for Difference	Lower Bound	-11.519
	95% Confidence Interval for Difference	Upper Bound	-3.675
Level 2 vs. Level 4	Contrast Estimate		3.945
	Hypothesized Value		0
	Difference (Estimate – Hypothesized)		3.945
	Std. Error		2.823
	Sig.		.164
	95% Confidence Interval for Difference	Lower Bound	-1.622
	95% Confidence Interval for Difference	Upper Bound	9.511
Level 3 vs. Level 4	Contrast Estimate		-5.855
	Hypothesized Value		0
	Difference (Estimate – Hypothesized)		-5.855
	Std. Error		2.153
	Sig.		.007
	95% Confidence Interval for Difference	Lower Bound	-10.101
	95% Confidence Interval for Difference	Upper Bound	-1.610
a Reference category = 4

**Test Results** Dependent Variable: writing score
Source	Sum of Squares	df	Mean Square	F	Sig.
Contrast	1914.158	3	638.053	7.833	.000
Error	15964.717	196	81.453

The table above entitled “Contrast Coefficients (L’ Matrix)” shows the coding scheme that was used for each comparison. The table entitled “Contrast Results (K Matrix)” shows the results of the various contrasts. In our example, the difference between level 1 of race and level 4 of race is statistically significant. You will notice that the contrast estimate is the mean of the dependent variable for the first level minus the mean of the dependent variable for the reference level. In other words, 46.4583 – 54.0552 = -7.597. The hypothesized value is zero (and is zero for all contrast tests). This means that the null hypothesis is that the contrast equals zero, which is almost always the null hypothesis in which researchers are interested. The row labeled Difference (Estimate – Hypothesized) gives the difference between the contrast estimate and the hypothesized value. Because the hypothesized value is always zero here, the contrast estimate and the difference are the same value. Therefore, you can refer to either the contrast estimate or the difference as being statistically significant or not. In our example, the difference between level 2 of race and level 4 of race is not statistically significant, and the difference between level 3 of race and level 4 of race is statistically significant. You will notice that the values given in this table are the same as those given in the “Parameter Estimates” table. This is because both used the same coding system and the same reference level. If a different coding system had been requested on the /contrast= statement, or if a different reference level had been specified, the two tables would not have the same numbers.

The table entitled “Test Results” gives a test of all of the contrasts taken together, and it indicates that the test of race is statistically significant. The results of this test are identical to the overall test of race because there are no other independent variables in the model. If there were, the results of the two tests would be different from one another.
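
As a hand check on the contrast estimates, the Python sketch below applies each column of the “Contrast Coefficients (L’ Matrix)” table to the group means quoted in the text. Python is used here only for the arithmetic; the helper name contrast_estimate is our own and is not an SPSS function.

```python
# Mean of write at each level of race (levels 1 to 4), as quoted in the text.
means = [46.4583, 58.0, 48.2, 54.0552]

# One list of weights per contrast, read down the columns of the L' matrix table.
simple_contrasts = {
    "Level 1 vs. Level 4": [1, 0, 0, -1],
    "Level 2 vs. Level 4": [0, 1, 0, -1],
    "Level 3 vs. Level 4": [0, 0, 1, -1],
}

def contrast_estimate(weights, group_means):
    """Weighted sum of the group means; contrast weights must sum to zero."""
    assert abs(sum(weights)) < 1e-9, "contrast weights should sum to zero"
    return sum(w * m for w, m in zip(weights, group_means))

estimates = {label: contrast_estimate(w, means) for label, w in simple_contrasts.items()}
for label, value in estimates.items():
    print(label, round(value, 3))
```

The three values agree with the “Contrast Estimate” rows to three decimal places.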

DEVIATION CODING

Method 1:

if race = 1 x1 = 1.
if any(race,2,3) x1 = 0.
if race = 4 x1 = -1.

if race = 2 x2 = 1.
if any(race,1,3) x2 = 0.
if race = 4 x2 = -1.

if race = 3 x3 = 1.
if any(race,1,2) x3 = 0.
if race = 4 x3 = -1.
execute.

regression
 /dep write
 /method = enter x1 x2 x3.

< output omitted >

Method 2:

glm write by race
 /lmatrix "group 1 versus groups 2 3 and 4" race .75 -.25 -.25 -.25
 /lmatrix "group 2 versus groups 1 3 and 4" race -.25 .75 -.25 -.25
 /lmatrix "group 3 versus groups 1 2 and 4" race -.25 -.25 .75 -.25.

< output omitted >

Method 3:

glm write by race
 /contrast (race)=deviation
 /print = parameter test(lmatrix).

**Between-Subjects Factors**
		Value Label	N
RACE	1.00	hispanic	24
	2.00	asian	11
	3.00	african-amer	20
	4.00	white	145

**Tests of Between-Subjects Effects** Dependent Variable: writing score
Source	Type III Sum of Squares	df	Mean Square	F	Sig.
Corrected Model	1914.158(a)	3	638.053	7.833	.000
Intercept	225523.580	1	225523.580	2768.770	.000
RACE	1914.158	3	638.053	7.833	.000
Error	15964.717	196	81.453
Total	574919.000	200
Corrected Total	17878.875	199
a R Squared = .107 (Adjusted R Squared = .093)

**Parameter Estimates** Dependent Variable: writing score
Parameter	B	Std. Error	t	Sig.	95% CI Lower Bound	95% CI Upper Bound
Intercept	54.055	.749	72.122	.000	52.577	55.533
[RACE=1.00]	-7.597	1.989	-3.820	.000	-11.519	-3.675
[RACE=2.00]	3.945	2.823	1.398	.164	-1.622	9.511
[RACE=3.00]	-5.855	2.153	-2.720	.007	-10.101	-1.610
[RACE=4.00]	0(a)	.	.	.	.	.
a This parameter is set to zero because it is redundant.

**Intercept**
	Contrast
Parameter	L1
Intercept	1.000
[RACE=1.00]	.250
[RACE=2.00]	.250
[RACE=3.00]	.250
[RACE=4.00]	.250
The default display of this matrix is the transpose of the corresponding L matrix. Based on Type III Sums of Squares.

**RACE**
	Contrast
Parameter	L2	L3	L4
Intercept	0	0	0
[RACE=1.00]	1	0	0
[RACE=2.00]	0	1	0
[RACE=3.00]	0	0	1
[RACE=4.00]	-1	-1	-1
The default display of this matrix is the transpose of the corresponding L matrix. Based on Type III Sums of Squares.

**Contrast Coefficients (L’ Matrix)**
	RACE Deviation Contrast(a)
Parameter	Level 1 vs. Mean	Level 2 vs. Mean	Level 3 vs. Mean
Intercept	.000	.000	.000
[RACE=1.00]	.750	-.250	-.250
[RACE=2.00]	-.250	.750	-.250
[RACE=3.00]	-.250	-.250	.750
[RACE=4.00]	-.250	-.250	-.250
The default display of this matrix is the transpose of the corresponding L matrix.
a Omitted category = 4

**Contrast Results (K Matrix)**
			Dependent Variable
RACE Deviation Contrast(a)			writing score
Level 1 vs. Mean	Contrast Estimate		-5.220
	Hypothesized Value		0
	Difference (Estimate – Hypothesized)		-5.220
	Std. Error		1.631
	Sig.		.002
	95% Confidence Interval for Difference	Lower Bound	-8.437
	95% Confidence Interval for Difference	Upper Bound	-2.003
Level 2 vs. Mean	Contrast Estimate		6.322
	Hypothesized Value		0
	Difference (Estimate – Hypothesized)		6.322
	Std. Error		2.160
	Sig.		.004
	95% Confidence Interval for Difference	Lower Bound	2.061
	95% Confidence Interval for Difference	Upper Bound	10.582
Level 3 vs. Mean	Contrast Estimate		-3.478
	Hypothesized Value		0
	Difference (Estimate – Hypothesized)		-3.478
	Std. Error		1.732
	Sig.		.046
	95% Confidence Interval for Difference	Lower Bound	-6.895
	95% Confidence Interval for Difference	Upper Bound	-6.203E-02
a Omitted category = 4

**Test Results** Dependent Variable: writing score
Source	Sum of Squares	df	Mean Square	F	Sig.
Contrast	1914.158	3	638.053	7.833	.000
Error	15964.717	196	81.453

Notice the two different coding systems that are presented in this output. In the table entitled “Race”, you see the coding system that was used to calculate the regression. In the table entitled “Contrast Coefficients (L’ Matrix)”, you see the coding system that was used to calculate the contrast coefficients. It is important to understand why two different coding systems are displayed in the output and to which analysis they refer. From now on, we will not include the “parameter” option on the print statement so that the results of the regression analysis will not be shown. These results would be the same for each example below.

The contrast estimates in the table entitled “Contrast Results (K Matrix)” are the mean of the particular level minus the grand (unweighted) mean. This grand mean is not the mean of the dependent variable that is listed in the output of the “means” command above. Rather, it is the mean of the means of the dependent variable at each level of the categorical variable: (46.4583 + 58 + 48.2 + 54.0552) / 4 = 51.678375. The contrast estimate for level 1 versus mean is then 46.4583 – 51.678375 = -5.220. The difference between this value and zero (the null hypothesis that the contrast is zero) is statistically significant (p = .002). The contrast estimates for the other comparisons are calculated in the same manner. As with the output of the code using simple effect coding, the table “Test Results” shows the test of all of the contrasts taken together. As expected, the values in this table are the same as those shown previously.
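
The distinction between the unweighted grand mean and the overall mean can be checked by hand. The Python sketch below is an illustration outside of SPSS, with variable names of our own choosing. It uses the group means quoted above and the group sizes from the “Between-Subjects Factors” table.

```python
# Group means of write and group sizes for race levels 1 to 4.
means = [46.4583, 58.0, 48.2, 54.0552]
counts = [24, 11, 20, 145]

# Unweighted grand mean: every level counts equally, regardless of its N.
unweighted_grand_mean = sum(means) / len(means)

# Mean weighted by group size, computed here only for comparison.
weighted_mean = sum(n * m for n, m in zip(counts, means)) / sum(counts)

# Deviation contrasts: each of the first three levels minus the unweighted grand mean.
deviation_estimates = [m - unweighted_grand_mean for m in means[:-1]]

print(round(unweighted_grand_mean, 6), round(weighted_mean, 3))
print([round(d, 3) for d in deviation_estimates])
```

Because level 4 is by far the largest group, the weighted mean is pulled toward its mean, which is why the two grand means differ.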

DIFFERENCE CODING

Method 1:

if race = 1 x1 = -.5.
if race = 2 x1 = .5.
if any(race,3,4) x1 = 0.

if any(race,1,2) x2 = -.333.
if race = 3 x2 = .667.
if race = 4 x2 = 0.

if any(race,1,2,3) x3 = -.25.
if race = 4 x3 = .75.
execute.

regression
 /dep write
 /method = enter x1 x2 x3.

< output omitted >

Method 2:

glm write by race
 /lmatrix "group 2 versus group 1" race -1 1 0 0
 /lmatrix "group 3 versus groups 1 and 2" race -.5 -.5 1 0
 /lmatrix "group 4 versus groups 1 2 and 3" race -1/3 -1/3 -1/3 1.

< output omitted >

Method 3:

glm write by race
 /contrast (race)=difference
 /print = test(lmatrix).

< some output omitted >

**Contrast Coefficients (L’ Matrix)**
	RACE Difference Contrast
Parameter	Level 2 vs. Level 1	Level 3 vs. Previous	Level 4 vs. Previous
Intercept	.000	.000	.000
[RACE=1.00]	-1.000	-.500	-.333
[RACE=2.00]	1.000	-.500	-.333
[RACE=3.00]	.000	1.000	-.333
[RACE=4.00]	.000	.000	1.000
The default display of this matrix is the transpose of the corresponding L matrix.

**Contrast Results (K Matrix)**
			Dependent Variable
RACE Difference Contrast			writing score
Level 2 vs. Level 1	Contrast Estimate		11.542
	Hypothesized Value		0
	Difference (Estimate – Hypothesized)		11.542
	Std. Error		3.286
	Sig.		.001
	95% Confidence Interval for Difference	Lower Bound	5.061
	95% Confidence Interval for Difference	Upper Bound	18.022
Level 3 vs. Previous	Contrast Estimate		-4.029
	Hypothesized Value		0
	Difference (Estimate – Hypothesized)		-4.029
	Std. Error		2.602
	Sig.		.123
	95% Confidence Interval for Difference	Lower Bound	-9.161
	95% Confidence Interval for Difference	Upper Bound	1.103
Level 4 vs. Previous	Contrast Estimate		3.169
	Hypothesized Value		0
	Difference (Estimate – Hypothesized)		3.169
	Std. Error		1.488
	Sig.		.034
	95% Confidence Interval for Difference	Lower Bound	.235
	95% Confidence Interval for Difference	Upper Bound	6.104

**Test Results** Dependent Variable: writing score
Source	Sum of Squares	df	Mean Square	F	Sig.
Contrast	1914.158	3	638.053	7.833	.000
Error	15964.717	196	81.453

The contrast estimate for the first comparison shown in this output was calculated by subtracting the mean of the dependent variable for level 1 of the categorical variable from the mean of the dependent variable for level 2: 58 – 46.4583 = 11.542. This result is statistically significant. The contrast estimate for the second comparison (between level 3 and the previous levels) was calculated by subtracting the mean of the dependent variable for levels 1 and 2 from that of level 3: 48.2 – [(46.4583 + 58) / 2] = -4.029. This result is not statistically significant, meaning that there is not a reliable difference between the mean of write for level 3 of race and the mean of write for levels 1 and 2 (Hispanics and Asians). As noted above, this type of coding system does not make much sense for a nominal variable such as race. For the comparison of level 4 and the previous levels, you take the mean of the dependent variable for those levels and subtract it from the mean of the dependent variable for level 4: 54.0552 – [(46.4583 + 58 + 48.2) / 3] = 3.169. This result is statistically significant.
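
The same arithmetic can be written as a short loop. The Python sketch below is a hand check only, and the function name is hypothetical. It compares each level from the second onward with the unweighted mean of all previous levels.

```python
# Mean of write at race levels 1 to 4, as quoted in the text.
means = [46.4583, 58.0, 48.2, 54.0552]

def difference_contrasts(group_means):
    """Each level (from the second on) minus the mean of the levels before it."""
    estimates = []
    for i in range(1, len(group_means)):
        previous = group_means[:i]
        estimates.append(group_means[i] - sum(previous) / len(previous))
    return estimates

difference_estimates = difference_contrasts(means)
print([round(e, 3) for e in difference_estimates])
```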

Note the use of fractions on the “/lmatrix” statement in Method 2. As mentioned above, you need to use numbers that sum to zero, such as 1/3 + 1/3 + 1/3 – 1. You cannot use .333 instead of 1/3: SPSS will give an error message and fail to calculate the contrast coefficient. The problem is that .333 + .333 + .333 – 1 is not sufficiently close to zero.
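
The size of the discrepancy is easy to see outside of SPSS. The Python sketch below uses the standard fractions module to compare the sum of the rounded weights with the sum of the exact thirds. It does not reproduce whatever tolerance SPSS applies internally, which is not stated here.

```python
from fractions import Fraction

# Weights for "group 4 versus groups 1 2 and 3", written two ways.
rounded_weights = [-0.333, -0.333, -0.333, 1.0]
exact_weights = [Fraction(-1, 3), Fraction(-1, 3), Fraction(-1, 3), Fraction(1)]

rounded_sum = sum(rounded_weights)  # off from zero by about .001
exact_sum = sum(exact_weights)      # exactly zero

print(rounded_sum, exact_sum)
```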

HELMERT CODING

Method 1:

if race = 1 x1 = .75.
if any(race,2,3,4) x1 = -.25.

if race = 1 x2 = 0.
if race = 2 x2 = .667.
if any(race,3,4) x2 = -.333.

if any(race,1,2) x3 = 0.
if race = 3 x3 = .5.
if race = 4 x3 = -.5.
execute.

regression
 /dep write
 /method = enter x1 x2 x3.

< output omitted >

Method 2:

glm write by race
 /lmatrix "group 1 versus groups 2 3 and 4" race 1 -1/3 -1/3 -1/3
 /lmatrix "group 2 versus groups 3 and 4" race 0 1 -.5 -.5
 /lmatrix "group 3 versus group 4" race 0 0 1 -1.

< output omitted >

Method 3:

glm write by race
 /contrast (race)=helmert
 /print = test(lmatrix).

< some output omitted >

**Contrast Coefficients (L’ Matrix)**
	RACE Helmert Contrast
Parameter	Level 1 vs. Later	Level 2 vs. Later	Level 3 vs. Level 4
Intercept	.000	.000	.000
[RACE=1.00]	1.000	.000	.000
[RACE=2.00]	-.333	1.000	.000
[RACE=3.00]	-.333	-.500	1.000
[RACE=4.00]	-.333	-.500	-1.000
The default display of this matrix is the transpose of the corresponding L matrix.

**Contrast Results (K Matrix)**
			Dependent Variable
RACE Helmert Contrast			writing score
Level 1 vs. Later	Contrast Estimate		-6.960
	Hypothesized Value		0
	Difference (Estimate – Hypothesized)		-6.960
	Std. Error		2.175
	Sig.		.002
	95% Confidence Interval for Difference	Lower Bound	-11.250
	95% Confidence Interval for Difference	Upper Bound	-2.670
Level 2 vs. Later	Contrast Estimate		6.872
	Hypothesized Value		0
	Difference (Estimate – Hypothesized)		6.872
	Std. Error		2.926
	Sig.		.020
	95% Confidence Interval for Difference	Lower Bound	1.101
	95% Confidence Interval for Difference	Upper Bound	12.644
Level 3 vs. Level 4	Contrast Estimate		-5.855
	Hypothesized Value		0
	Difference (Estimate – Hypothesized)		-5.855
	Std. Error		2.153
	Sig.		.007
	95% Confidence Interval for Difference	Lower Bound	-10.101
	95% Confidence Interval for Difference	Upper Bound	-1.610

**Test Results** Dependent Variable: writing score
Source	Sum of Squares	df	Mean Square	F	Sig.
Contrast	1914.158	3	638.053	7.833	.000
Error	15964.717	196	81.453

The contrast estimate for the comparison between level 1 and the remaining levels (called “later” in the output) is calculated by subtracting the mean of the dependent variable for levels 2, 3 and 4 from the mean of the dependent variable for level 1: 46.4583 – [(58 + 48.2 + 54.0552) / 3] = -6.960, which is statistically significant. This means that the mean of write for level 1 of race is statistically significantly different from the mean of write for levels 2 through 4. As noted above, this comparison probably is not meaningful because the variable race is nominal. This type of comparison would be more meaningful if the categorical variable were ordinal. To calculate the contrast estimate for the comparison between level 2 and the later levels, you subtract the mean of the dependent variable for levels 3 and 4 from the mean of the dependent variable for level 2: 58 – [(48.2 + 54.0552) / 2] = 6.872, which is statistically significant. The contrast estimate for the comparison between level 3 and level 4 is the difference between the mean of the dependent variable for the two levels: 48.2 – 54.0552 = -5.855, which is also statistically significant.
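
Helmert contrasts mirror difference contrasts: each level is compared with the levels that follow it. The Python sketch below is again only a hand check, with a hypothetical function name. It computes the three estimates from the group means.

```python
# Mean of write at race levels 1 to 4, as quoted in the text.
means = [46.4583, 58.0, 48.2, 54.0552]

def helmert_contrasts(group_means):
    """Each level (except the last) minus the mean of the levels after it."""
    estimates = []
    for i in range(len(group_means) - 1):
        later = group_means[i + 1:]
        estimates.append(group_means[i] - sum(later) / len(later))
    return estimates

helmert_estimates = helmert_contrasts(means)
print([round(e, 3) for e in helmert_estimates])
```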

ORTHOGONAL POLYNOMIAL CODING

Method 1:

if race = 1 x1 = -.671.
if race = 2 x1 = -.224.
if race = 3 x1 = .224.
if race = 4 x1 = .671.

if race = 1 x2 = .5.
if race = 2 x2 = -.5.
if race = 3 x2 = -.5.
if race = 4 x2 = .5.

if race = 1 x3 = -.224.
if race = 2 x3 = .671.
if race = 3 x3 = -.671.
if race = 4 x3 = .224.
execute.

regression
 /dep write
 /method = enter x1 x2 x3.

< output omitted >

Method 2:

glm write by race
 /lmatrix "linear" race -.671 -.224 .224 .671
 /lmatrix "quadratic" race .5 -.5 -.5 .5
 /lmatrix "cubic" race -.224 .671 -.671 .224.

< output omitted >

Method 3:

glm write by race
 /contrast (race)=polynomial
 /print = test(lmatrix).

< some output omitted >

**Contrast Coefficients (L’ Matrix)**
	RACE Polynomial Contrast(a)
Parameter	Linear	Quadratic	Cubic
Intercept	.000	.000	.000
[RACE=1.00]	-.671	.500	-.224
[RACE=2.00]	-.224	-.500	.671
[RACE=3.00]	.224	-.500	-.671
[RACE=4.00]	.671	.500	.224
The default display of this matrix is the transpose of the corresponding L matrix.
a Metric = 1.000, 2.000, 3.000, 4.000

**Contrast Results (K Matrix)**
			Dependent Variable
RACE Polynomial Contrast(a)			writing score
Linear	Contrast Estimate		2.905
	Hypothesized Value		0
	Difference (Estimate – Hypothesized)		2.905
	Std. Error		1.534
	Sig.		.060
	95% Confidence Interval for Difference	Lower Bound	-.121
	95% Confidence Interval for Difference	Upper Bound	5.931
Quadratic	Contrast Estimate		-2.843
	Hypothesized Value		0
	Difference (Estimate – Hypothesized)		-2.843
	Std. Error		1.964
	Sig.		.149
	95% Confidence Interval for Difference	Lower Bound	-6.717
	95% Confidence Interval for Difference	Upper Bound	1.031
Cubic	Contrast Estimate		8.273
	Hypothesized Value		0
	Difference (Estimate – Hypothesized)		8.273
	Std. Error		2.316
	Sig.		.000
	95% Confidence Interval for Difference	Lower Bound	3.706
	95% Confidence Interval for Difference	Upper Bound	12.840
a Metric = 1.000, 2.000, 3.000, 4.000

**Test Results** Dependent Variable: writing score
Source	Sum of Squares	df	Mean Square	F	Sig.
Contrast	1914.158	3	638.053	7.833	.000
Error	15964.717	196	81.453

To calculate the contrast estimates for these comparisons, you need to multiply the code used in the new variable by the mean for the dependent variable for each level of the categorical variable, and then sum the values. For example, the code used in x1 for level 1 of race is -.671 and the mean of write for level 1 is 46.4583. Hence, you would multiply -.671 and 46.4583 and add that to the product of the code for level 2 of x1 and its mean, and so on. To obtain the contrast estimate for the linear contrast, you would do the following: -.671*46.4583 + -.224*58 + .224*48.2 + .671*54.0552 = 2.905 (with rounding error). This result is not statistically significant at the .05 alpha level, but it is close. The quadratic component is also not statistically significant, but the cubic one is. This suggests that, if the mean of the dependent variable were plotted against race, the line would tend to have two bends. As noted earlier, this type of coding system does not make much sense with a nominal variable such as race.
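
The “rounding error” mentioned above can be made concrete. The Python sketch below rests on an assumption of ours that is not stated in the output: that .671 and .224 are rounded forms of 3/sqrt(20) and 1/sqrt(20), the usual orthonormal polynomial weights for four equally spaced levels. It computes the linear estimate both ways, and the quadratic and cubic estimates from the unrounded weights.

```python
import math

# Mean of write at race levels 1 to 4, as quoted in the text.
means = [46.4583, 58.0, 48.2, 54.0552]

def weighted_sum(weights, group_means):
    """Sum of code * mean over the levels of the categorical variable."""
    return sum(w * m for w, m in zip(weights, group_means))

# Linear weights as printed in the output (three decimals).
linear_rounded = weighted_sum([-0.671, -0.224, 0.224, 0.671], means)

# Assumed unrounded weights: (-3, -1, 1, 3)/sqrt(20), (1, -1, -1, 1)/2, (-1, 3, -3, 1)/sqrt(20).
root20 = math.sqrt(20)
linear_exact = weighted_sum([w / root20 for w in (-3, -1, 1, 3)], means)
quadratic_exact = weighted_sum([w / 2 for w in (1, -1, -1, 1)], means)
cubic_exact = weighted_sum([w / root20 for w in (-1, 3, -3, 1)], means)

print(round(linear_rounded, 3), round(linear_exact, 3))
print(round(quadratic_exact, 3), round(cubic_exact, 3))
```

Under that assumption, the rounded weights give a linear estimate slightly below the value in the output, while the unrounded weights agree with the “Contrast Results (K Matrix)” table to three decimal places.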

REPEATED EFFECT CODING

Method 1:

if race = 1 x1 = .75.
if any(race,2,3,4) x1 = -.25.

if any(race,1,2) x2 = .5.
if any(race,3,4) x2 = -.5.

if any(race,1,2,3) x3 = .25.
if race = 4 x3 = -.75.
execute.

regression
 /dep write
 /method = enter x1 x2 x3.

< output omitted >

Method 2:

glm write by race
 /lmatrix "group 1 versus group 2" race 1 -1 0 0
 /lmatrix "group 2 versus group 3" race 0 1 -1 0
 /lmatrix "group 3 versus group 4" race 0 0 1 -1.

< output omitted >

Method 3:

glm write by race
 /contrast (race)=repeated
 /print = test(lmatrix).

< some output omitted >

**Contrast Coefficients (L’ Matrix)**
	RACE Repeated Contrast
Parameter	Level 1 vs. Level 2	Level 2 vs. Level 3	Level 3 vs. Level 4
Intercept	0	0	0
[RACE=1.00]	1	0	0
[RACE=2.00]	-1	1	0
[RACE=3.00]	0	-1	1
[RACE=4.00]	0	0	-1
The default display of this matrix is the transpose of the corresponding L matrix.

**Contrast Results (K Matrix)**
			Dependent Variable
RACE Repeated Contrast			writing score
Level 1 vs. Level 2	Contrast Estimate		-11.542
	Hypothesized Value		0
	Difference (Estimate – Hypothesized)		-11.542
	Std. Error		3.286
	Sig.		.001
	95% Confidence Interval for Difference	Lower Bound	-18.022
	95% Confidence Interval for Difference	Upper Bound	-5.061
Level 2 vs. Level 3	Contrast Estimate		9.800
	Hypothesized Value		0
	Difference (Estimate – Hypothesized)		9.800
	Std. Error		3.388
	Sig.		.004
	95% Confidence Interval for Difference	Lower Bound	3.119
	95% Confidence Interval for Difference	Upper Bound	16.481
Level 3 vs. Level 4	Contrast Estimate		-5.855
	Hypothesized Value		0
	Difference (Estimate – Hypothesized)		-5.855
	Std. Error		2.153
	Sig.		.007
	95% Confidence Interval for Difference	Lower Bound	-10.101
	95% Confidence Interval for Difference	Upper Bound	-1.610

**Test Results** Dependent Variable: writing score
Source	Sum of Squares	df	Mean Square	F	Sig.
Contrast	1914.158	3	638.053	7.833	.000
Error	15964.717	196	81.453

With this coding system, adjacent levels of the categorical variable are compared. Hence, the mean of the dependent variable at level 1 is compared to the mean of the dependent variable at level 2: 46.4583 – 58 = -11.542, which is statistically significant. For the comparison between levels 2 and 3, the calculation of the contrast estimate would be 58 – 48.2 = 9.8, which is also statistically significant. Finally, comparing levels 3 and 4, 48.2 – 54.0552 = -5.855, a statistically significant difference. One would conclude from this that each pair of adjacent levels of race is statistically significantly different.
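
Since each repeated contrast is just the difference between neighboring means, the three estimates can be checked in one line. The snippet below is a Python illustration with our own variable names, not SPSS syntax.

```python
# Mean of write at race levels 1 to 4, as quoted in the text.
means = [46.4583, 58.0, 48.2, 54.0552]

# Each level minus the level that follows it: 1 vs. 2, 2 vs. 3, 3 vs. 4.
repeated_estimates = [current - following for current, following in zip(means, means[1:])]

print([round(e, 3) for e in repeated_estimates])
```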

SPECIAL USER-DEFINED CODING SYSTEM

Let’s compare: 1) level 1 to level 3, 2) level 2 to levels 1 and 4, and 3) levels 1 and 2 to levels 3 and 4.

Method 1:

if race = 1 x1 = -.5.
if race = 2 x1 = .5.
if race = 3 x1 = -1.5.
if race = 4 x1 = 1.5.

if any(race,1,3) x2 = -1.
if any(race,2,4) x2 = 1.

if any(race,1,3) x3 = 1.5.
if race = 2 x3 = -.5.
if race = 4 x3 = -2.5.
execute.

regression
 /dep write
 /method = enter x1 x2 x3.

< output omitted >

Method 2:

glm write by race
 /lmatrix "compare group 1 to group 3" race 1 0 -1 0
 /lmatrix "compare group 2 to groups 1 and 4" race -.5 1 0 -.5
 /lmatrix "compare groups 1 and 2 to groups 3 and 4" race .5 .5 -.5 -.5.

< output omitted >

Method 3:

glm write by race
 /contrast (race)=special(1 0 -1 0, -.5 1 0 -.5, .5 .5 -.5 -.5)
 /print = test(lmatrix).

< some output omitted >

**Contrast Coefficients (L’ Matrix)**
	RACE Special Contrast
Parameter	L1	L2	L3
Intercept	.000	.000	.000
[RACE=1.00]	1.000	-.500	.500
[RACE=2.00]	.000	1.000	.500
[RACE=3.00]	-1.000	.000	-.500
[RACE=4.00]	.000	-.500	-.500
The default display of this matrix is the transpose of the corresponding L matrix.

**Contrast Results (K Matrix)**
			Dependent Variable
RACE Special Contrast			writing score
L1	Contrast Estimate		-1.742
	Hypothesized Value		0
	Difference (Estimate – Hypothesized)		-1.742
	Std. Error		2.732
	Sig.		.525
	95% Confidence Interval for Difference	Lower Bound	-7.131
	95% Confidence Interval for Difference	Upper Bound	3.647
L2	Contrast Estimate		7.743
	Hypothesized Value		0
	Difference (Estimate – Hypothesized)		7.743
	Std. Error		2.897
	Sig.		.008
	95% Confidence Interval for Difference	Lower Bound	2.030
	95% Confidence Interval for Difference	Upper Bound	13.457
L3	Contrast Estimate		1.102
	Hypothesized Value		0
	Difference (Estimate – Hypothesized)		1.102
	Std. Error		1.964
	Sig.		.576
	95% Confidence Interval for Difference	Lower Bound	-2.772
	95% Confidence Interval for Difference	Upper Bound	4.975

**Test Results** Dependent Variable: writing score
Source	Sum of Squares	df	Mean Square	F	Sig.
Contrast	1914.158	3	638.053	7.833	.000
Error	15964.717	196	81.453

The first comparison of the mean of the dependent variable for level 1 to level 3 of the categorical variable was not statistically significant, while the comparison of the mean of the dependent variable for level 2 to that of levels 1 and 4 was. The comparison of the mean of the dependent variable for levels 1 and 2 to that of levels 3 and 4 was not statistically significant.
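
The codes assigned to x1, x2 and x3 in Method 1 are tied to the contrast weights used in Methods 2 and 3. One way to check a set of user-defined codes is sketched below in Python, with names of our own choosing and on the assumption that the codes are meant to reproduce the contrasts exactly. It confirms that each column of codes sums to zero and that multiplying the contrast weights by the codes gives an identity matrix. The sketch also recomputes the three contrast estimates from the group means.

```python
# Rows: contrasts L1 to L3; columns: race levels 1 to 4 (weights from Methods 2 and 3).
contrasts = [
    [1.0, 0.0, -1.0, 0.0],
    [-0.5, 1.0, 0.0, -0.5],
    [0.5, 0.5, -0.5, -0.5],
]

# Rows: race levels 1 to 4; columns: x1, x2, x3 (codes from Method 1).
codes = [
    [-0.5, -1.0, 1.5],
    [0.5, 1.0, -0.5],
    [-1.5, -1.0, 1.5],
    [1.5, 1.0, -2.5],
]

# Each column of codes should sum to zero across the four levels.
column_sums = [sum(row[j] for row in codes) for j in range(3)]

# contrasts (3x4) times codes (4x3) should be the 3x3 identity matrix.
product = [[sum(contrasts[i][k] * codes[k][j] for k in range(4)) for j in range(3)]
           for i in range(3)]

# Contrast estimates from the group means quoted in the text.
means = [46.4583, 58.0, 48.2, 54.0552]
special_estimates = [sum(w * m for w, m in zip(row, means)) for row in contrasts]

print(column_sums, product)
print([round(e, 3) for e in special_estimates])
```

If the product were not an identity matrix, the regression coefficients from Method 1 would not correspond to the comparisons requested in Methods 2 and 3.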