Regression with SPSS Chapter 3 – Regression with Categorical Predictors

Chapter Outline
    3.0 Regression with Categorical Predictors
    3.1 Regression with a 0/1 variable
    3.2 Regression with a 1/2 variable
    3.3 Regression with a 1/2/3 variable
    3.4 Regression with multiple categorical predictors
    3.5 Categorical predictor with interactions
    3.6 Continuous and Categorical variables
    3.7 Interactions of Continuous by 0/1 Categorical variables
    3.8 Continuous and Categorical variables, interaction with 1/2/3 variable
    3.9 Summary
    3.10 For more information

3.0 Introduction

In the previous two chapters, we focused on regression analyses using continuous variables. It is also possible to include categorical predictors in a regression analysis, but doing so requires some extra work both in performing the analysis and in properly interpreting the results. This chapter illustrates how to use SPSS to include categorical predictors in your analysis and describes how to interpret the results of such analyses.

This chapter will use the elemapi2 data that you have seen in the prior chapters. We will focus on four variables: api00, some_col, yr_rnd and mealcat.

The variable api00 is a measure of the performance of the students. The variable some_col is a continuous variable that measures the percentage of the parents of the children in the school who have attended college. The variable yr_rnd is a categorical variable that is coded 0 if the school is not year round and 1 if year round. The variable meals is the percentage of students who are receiving state-sponsored free meals and can be used as an indicator of poverty. The variable meals was broken into 3 categories (chosen to make roughly equally sized groups) to create the variable mealcat.

3.1 Regression with a 0/1 variable

The simplest example of a categorical predictor in a regression analysis is a 0/1 variable, also called a dummy variable. Let’s use the variable yr_rnd as an example of a dummy variable. We can include a dummy variable as a predictor in a regression analysis as shown below.

GET FILE='C:\spssreg\elemapi2.sav'.

regression
 /dep api00
 /method = enter yr_rnd.

**Variables Entered/Removed(b)**
Model	Variables Entered	Variables Removed	Method
1	year round school(a)	.	Enter
a All requested variables entered.
b Dependent Variable: api 2000

**Model Summary**
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.475(a)	.226	.224	125.300
a Predictors: (Constant), year round school

**ANOVA(b)**
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	1825000.563	1	1825000.563	116.241	.000(a)
	Residual	6248671.435	398	15700.179
	Total	8073671.997	399
a Predictors: (Constant), year round school
b Dependent Variable: api 2000

**Coefficients(a)**
		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
Model		B	Std. Error	Beta	t	Sig.
1	(Constant)	684.539	7.140		95.878	.000
1	year round school	-160.506	14.887	-.475	-10.782	.000
a Dependent Variable: api 2000

This may seem odd at first, but it is a legitimate analysis. What do the results mean? Let’s go back to basics and write out the regression equation that this model implies.

api00 = constant + Byr_rnd * yr_rnd

where constant is the intercept and we use Byr_rnd to represent the coefficient for variable yr_rnd. Filling in the values from the regression output, we get

api00 = 684.539 + -160.5064 * yr_rnd

If a student is not in a year-round school (i.e., yr_rnd is 0), the regression equation simplifies to

api00 = constant    + 0 * Byr_rnd 
api00 = 684.539     + 0 * -160.5064  
api00 = 684.539

If a student is in a year-round school (i.e., yr_rnd is 1), the regression equation simplifies to

api00 = constant + 1 * Byr_rnd 
api00 = 684.539  + 1 * -160.5064 
api00 = 524.0326
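
As a quick check, the two predicted values can also be generated inside SPSS. This is only a sketch: the coefficients are typed in by hand from the Coefficients table above, and pred_api00 is a hypothetical variable name that is not part of the elemapi2 data.

```spss
* Predicted api00 from the fitted equation, with coefficients copied from the output above.
* pred_api00 is a hypothetical variable name chosen for this illustration.
compute pred_api00 = 684.539 + (-160.5064 * yr_rnd).
execute.

* Each group should show a single predicted value with no spread.
means
  tables=pred_api00 by yr_rnd.
```

The MEANS table should show 684.539 for the schools coded 0 and 524.0326 for the schools coded 1, the same two values worked out by hand above.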

We can graph the observed values and the predicted values using the GGRAPH command as shown below. Although yr_rnd only has 2 values, we can still draw a regression line showing the relationship between yr_rnd and api00. Based on the results above, we see that the predicted value for non-year round schools is 684.539 and the predicted value for the year round schools is 524.033, and the slope of the line is negative, which makes sense since the coefficient for yr_rnd was negative (-160.5064). Note that yr_rnd is stored as an ordinal variable in the dataset, so it may need to be treated as a scale variable for the fit line to be drawn.

GGRAPH
  /GRAPHDATASET NAME="GraphDataset" VARIABLES= api00 yr_rnd 
  /GRAPHSPEC SOURCE=INLINE .
BEGIN GPL
SOURCE: s=userSource( id( "GraphDataset" ) )
DATA: yr_rnd=col(source(s), name("yr_rnd"), unit.category()) 
DATA: api00=col(source(s), name("api00")) 
GUIDE: axis(dim(1), label("year round school")) 
GUIDE: axis(dim(2), label("api 2000")) 
SCALE: cat(dim(1), include("0", "1")) 
SCALE: linear(dim(2), include(0)) 
ELEMENT: point(position(yr_rnd*api00))
ELEMENT: line( position(smooth.linear( yr_rnd * api00 ) ) )
END GPL.

Let's compare these predicted values to the mean api00 scores for the year-round and non-year-round students.

MEANS
  TABLES=api00 BY yr_rnd.

**Case Processing Summary**
	Cases
	Included		Excluded		Total
	N	Percent	N	Percent	N	Percent
api 2000 * year round school	400	100.0%	0	.0%	400	100.0%

**Report**
api 2000
year round school	Mean	N	Std. Deviation
No	684.54	308	132.113
Yes	524.03	92	98.916
Total	647.62	400	142.249

As you can see, the regression equation predicts that api00 will equal the mean of the group in question: the mean for the year-round schools when yr_rnd is 1, and the mean for the non-year-round schools when yr_rnd is 0.

Let's relate these predicted values back to the regression equation. For the non-year-round students, their mean is the same as the intercept (684.539). The coefficient for yr_rnd is the amount we need to add to get the mean for the year-round students, i.e., we need to add -160.5064 to 684.539 to get 524.0326, the mean for the year-round students. In other words, Byr_rnd is the mean api00 score for the year-round students minus the mean api00 score for the non year-round students, i.e., mean(year-round) - mean(non year-round).

It may be surprising to note that this regression analysis with a single dummy variable is the same as doing a t-test comparing the mean api00 for the year-round students with the non year-round students (see below). You can see that the t-value below (in the "Equal variances assumed" row) has the same magnitude as the t-value for yr_rnd in the regression above; only the sign differs, because the t-test output reports the difference as mean(non year-round) - mean(year-round). The two tests agree because Byr_rnd compares the year-rounds and non year-rounds (since the coefficient is mean(year-round) - mean(non year-round)).

T-TEST
  GROUPS=yr_rnd(0 1)
  /VARIABLES=api00.

**Group Statistics**
	year round school	N	Mean	Std. Deviation	Std. Error Mean
api 2000	No	308	684.54	132.113	7.528
api 2000	Yes	92	524.03	98.916	10.313

**Independent Samples Test**
		Levene's Test for Equality of Variances		t-test for Equality of Means
		F	Sig.	t	df	Sig. (2-tailed)	Mean Difference	Std. Error Difference	95% Confidence Interval of the Difference
		F	Sig.	t	df	Sig. (2-tailed)	Mean Difference	Std. Error Difference	Lower	Upper
api 2000	Equal variances assumed	20.539	.000	10.782	398	.000	160.51	14.887	131.239	189.774
api 2000	Equal variances not assumed			12.571	197.215	.000	160.51	12.768	135.327	185.686
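
The t-value in the table above is positive while the regression t-value is negative, because the t-test output reports the first listed group minus the second. A minimal sketch of one way to make the signs agree, assuming T-TEST treats the first value given in GROUPS as the first group, is to list the year-round code first:

```spss
* Listing year-round (1) first should make the reported difference mean(1) - mean(0).
t-test
  groups=yr_rnd(1 0)
  /variables=api00.
```

If that assumption holds, the mean difference should be reported as about -160.51 and the equal-variances t as about -10.782, matching the sign of the yr_rnd row in the regression.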

Since a t-test on two groups (with equal variances assumed) is the same as doing a one-way ANOVA, we can get the same results using the oneway command as well. Note that in SPSS, when you click on "analyze" and "compare means," you can select a one-way ANOVA test. The code for conducting a one-way ANOVA is shown below. After this analysis, however, we will use the glm (for general linear model) command instead of the oneway command.

ONEWAY
  api00 BY yr_rnd.

**ANOVA**
api 2000
	Sum of Squares	df	Mean Square	F	Sig.
Between Groups	1825000.563	1	1825000.563	116.241	.000
Within Groups	6248671.435	398	15700.179
Total	8073671.998	399

Remember that if you square the t-value, you will get the F-value: 10.7815**2 = 116.24074, showing another way in which the t-test is the same as the ANOVA test.

3.2 Regression with a 1/2 variable

A categorical predictor variable does not have to be coded 0/1 to be used in a regression model. It is easier to understand and interpret the results from a model with dummy variables, but a variable coded 1/2 yields essentially the same results.

Let's make a copy of the variable yr_rnd called yr_rnd2 that is coded 1/2, 1=non year-round and 2=year-round.

compute yr_rnd2 = yr_rnd.
recode yr_rnd2 (0=1) (1=2).
execute.

REGRESSION
  /DEPENDENT api00
  /METHOD=ENTER yr_rnd2.
 
<some output omitted to save space>

**Coefficients(a)**
		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
Model		B	Std. Error	Beta	t	Sig.
1	(Constant)	845.045	19.353		43.664	.000
	YR_RND2	-160.506	14.887	-.475	-10.782	.000
a Dependent Variable: api 2000

Note that the coefficient for yr_rnd2 is the same as the coefficient for yr_rnd. So, whether you code yr_rnd as 0/1 or as 1/2, the regression coefficient works out to be the same. However, the intercept is a bit less intuitive. When we used yr_rnd, the intercept was the mean for the non year-rounds. The intercept is the predicted value when the predictor equals 0, and yr_rnd2 never takes the value 0, so when using yr_rnd2 the intercept is the mean for the non year-rounds minus Byr_rnd2, i.e., 684.539 - (-160.506) = 845.045.

Note that you can use 0/1 or 1/2 coding and the results for the coefficient come out the same, but the interpretation of constant in the regression equation is different. It is often easier to interpret the estimates for 0/1 coding.
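
If a variable arrives coded 1/2 and you would rather have the more interpretable intercept, one option is to recode it to 0/1 before fitting the model. The sketch below is illustrative only; yr_rnd01 is a hypothetical variable name, and yr_rnd2 is left unchanged.

```spss
* yr_rnd01 is a hypothetical name for the recoded copy of yr_rnd2.
recode yr_rnd2 (1=0) (2=1) into yr_rnd01.
execute.

regression
  /dependent api00
  /method=enter yr_rnd01.
```

Because yr_rnd01 carries the same 0/1 coding as yr_rnd, this regression should reproduce the constant (684.539) and coefficient (-160.506) from section 3.1.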

In summary, these results indicate that the api00 scores are significantly different for the students depending on the type of school they attend, year round school vs. non-year round school. Those who attend non-year round school have significantly higher scores. Based on the regression results, those who attend non-year round schools have scores that are 160.5 points higher than those who attend year-round schools.

3.3 Regression with a 1/2/3 variable

3.3.1 Manually Creating Dummy Variables

Say that we would like to examine the relationship between the amount of poverty and api scores. We don't have a measure of poverty, but we can use mealcat as a proxy for a measure of poverty. You might be tempted to try including mealcat in a regression like this.

regression
 /dependent api00
 /method=enter mealcat.

**Variables Entered/Removed(b)**
Model	Variables Entered	Variables Removed	Method
1	Percentage free meals in 3 categories(a)	.	Enter
a All requested variables entered.
b Dependent Variable: api 2000

**Model Summary**
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.867(a)	.752	.752	70.908
a Predictors: (Constant), Percentage free meals in 3 categories

**ANOVA(b)**
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	6072527.519	1	6072527.519	1207.742	.000(a)
	Residual	2001144.479	398	5028.001
	Total	8073671.997	399
a Predictors: (Constant), Percentage free meals in 3 categories
b Dependent Variable: api 2000

**Coefficients(a)**
		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
Model		B	Std. Error	Beta	t	Sig.
1	(Constant)	950.987	9.422		100.935	.000
1	Percentage free meals in 3 categories	-150.553	4.332	-.867	-34.753	.000
a Dependent Variable: api 2000

This is looking at the linear effect of mealcat on api00, but mealcat is not an interval variable: a single slope assumes that the difference between categories 1 and 2 is the same as the difference between categories 2 and 3. Instead, you will want to code the variable so that all the information concerning the three levels is accounted for. You can dummy code mealcat like this.

if (not missing(mealcat)) mealcat1 = 0.
if (mealcat = 1) mealcat1 = 1.
if (not missing(mealcat)) mealcat2 = 0.
if (mealcat = 2) mealcat2 = 1.
if (not missing(mealcat)) mealcat3 = 0.
if (mealcat = 3) mealcat3 = 1.
execute.

We now have created mealcat1 that is 1 if mealcat is 1, and 0 otherwise. Likewise, mealcat2 is 1 if mealcat is 2, and 0 otherwise; and likewise mealcat3 was created. We can see this below.

list mealcat mealcat1 mealcat2 mealcat3
 /cases from 1 to 10.

           MEALCAT MEALCAT1 MEALCAT2 MEALCAT3

                 2      .00     1.00      .00
                 3      .00      .00     1.00
                 3      .00      .00     1.00
                 3      .00      .00     1.00
                 3      .00      .00     1.00
                 1     1.00      .00      .00
                 1     1.00      .00      .00
                 1     1.00      .00      .00
                 1     1.00      .00      .00
                 1     1.00      .00      .00

Number of cases read:  10    Number of cases listed:  10

We can now use two of these dummy variables (mealcat2 and mealcat3) in the regression analysis.

regression
 /dependent api00
 /method = enter mealcat2 mealcat3.

**Variables Entered/Removed(b)**
Model	Variables Entered	Variables Removed	Method
1	MEALCAT3, MEALCAT2(a)	.	Enter
a All requested variables entered.
b Dependent Variable: api 2000

**Model Summary**
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.869(a)	.755	.754	70.612
a Predictors: (Constant), MEALCAT3, MEALCAT2

**ANOVA(b)**
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	6094197.670	2	3047098.835	611.121	.000(a)
	Residual	1979474.328	397	4986.081
	Total	8073671.997	399
a Predictors: (Constant), MEALCAT3, MEALCAT2
b Dependent Variable: api 2000

**Coefficients(a)**
		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
Model		B	Std. Error	Beta	t	Sig.
1	(Constant)	805.718	6.169		130.599	.000
	MEALCAT2	-166.324	8.708	-.550	-19.099	.000
	MEALCAT3	-301.338	8.629	-1.007	-34.922	.000
a Dependent Variable: api 2000

We can test the overall differences among the three groups by using the /method = test subcommand as shown below. This shows that the overall differences among the three groups are significant, with an F value of 611.121 and a p value reported as .000 (i.e., p < .001).

regression
 /dependent api00
 /method = test (mealcat2 mealcat3).

**Variables Entered/Removed(a)**
Model	Variables Entered	Variables Removed	Method
1	MEALCAT3, MEALCAT2	.	Test
a Dependent Variable: api 2000

**Model Summary**
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.869(a)	.755	.754	70.612
a Predictors: (Constant), MEALCAT3, MEALCAT2

**ANOVA(c)**
Model			Sum of Squares	df	Mean Square	F	Sig.	R Square Change
1	Subset Tests	MEALCAT2, MEALCAT3	6094197.670	2	3047098.835	611.121	.000(a)	.755
	Regression		6094197.670	2	3047098.835	611.121	.000(b)
	Residual		1979474.328	397	4986.081
	Total		8073671.997	399
a Tested against the full model.
b Predictors in the Full Model: (Constant), MEALCAT3, MEALCAT2.
c Dependent Variable: api 2000

**Coefficients(a)**
		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
Model		B	Std. Error	Beta	t	Sig.
1	(Constant)	805.718	6.169		130.599	.000
	MEALCAT2	-166.324	8.708	-.550	-19.099	.000
	MEALCAT3	-301.338	8.629	-1.007	-34.922	.000
a Dependent Variable: api 2000

The interpretation of the coefficients is much like that for the binary variables. Group 1 is the omitted group, so the constant is the mean for group 1. The coefficient for mealcat2 is the mean for group 2 minus the mean of the omitted group (group 1), and the coefficient for mealcat3 is the mean of group 3 minus the mean of group 1. You can verify this by comparing the coefficients with the means of the groups, shown below.

MEANS
  TABLES=api00 BY mealcat.

**Case Processing Summary**
	Cases
	Included		Excluded		Total
	N	Percent	N	Percent	N	Percent
api 2000 * Percentage free meals in 3 categories	400	100.0%	0	.0%	400	100.0%

**Report**
api 2000
Percentage free meals in 3 categories	Mean	N	Std. Deviation
0-46% free meals	805.72	131	65.669
47-80% free meals	639.39	132	82.135
81-100% free meals	504.38	137	62.727
Total	647.62	400	142.249

Based on these results, we can say that the three groups differ in their api00 scores, and that in particular group 2 is significantly different from group 1 (because mealcat2 was significant) and group 3 is significantly different from group 1 (because mealcat3 was significant).
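
To see the link between the dummy-variable coefficients and the group means directly, the fitted equation can be evaluated for every school. This is a sketch under the assumption that mealcat2 and mealcat3 exist as created above; the coefficients are copied by hand from the Coefficients table, and pred_meal is a hypothetical variable name.

```spss
* pred_meal is a hypothetical variable name used only for this check.
* Coefficients are copied from the regression on mealcat2 and mealcat3.
compute pred_meal = 805.718 + (-166.324 * mealcat2) + (-301.338 * mealcat3).
execute.

means
  tables=pred_meal by mealcat.
```

The three predicted values should be 805.718, 639.394 and 504.380, which agree with the group means in the Report table (805.72, 639.39 and 504.38).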

3.3.2 Using Do Loops

We can use the do repeat command to do the work for us to create the indicator (dummy) variables. This method is particularly useful when you need to create many indicator variables.

DO REPEAT A=mealcat1 mealcat2 mealcat3 
 /B=1 2 3.
COMPUTE A=(mealcat=B).
END REPEAT.

We will then run crosstabs to verify that our indicator variables were created correctly.

crosstabs /tables = mealcat by mealcat1
          /tables = mealcat by mealcat2
          /tables = mealcat by mealcat3.
 
**Case Processing Summary**
	Cases
	Valid		Missing		Total
	N	Percent	N	Percent	N	Percent
Percentage free meals in 3 categories * MEALCAT1	400	100.0%	0	.0%	400	100.0%
Percentage free meals in 3 categories * MEALCAT2	400	100.0%	0	.0%	400	100.0%
Percentage free meals in 3 categories * MEALCAT3	400	100.0%	0	.0%	400	100.0%

**Percentage free meals in 3 categories * MEALCAT1 Crosstabulation**
Count
		MEALCAT1		Total
		.00	1.00
Percentage free meals in 3 categories	0-46% free meals		131	131
	47-80% free meals	132		132
	81-100% free meals	137		137
Total		269	131	400

**Percentage free meals in 3 categories * MEALCAT2 Crosstabulation**
Count
		MEALCAT2		Total
		.00	1.00
Percentage free meals in 3 categories	0-46% free meals	131		131
	47-80% free meals		132	132
	81-100% free meals	137		137
Total		268	132	400

**Percentage free meals in 3 categories * MEALCAT3 Crosstabulation**
Count
		MEALCAT3		Total
		.00	1.00
Percentage free meals in 3 categories	0-46% free meals	131		131
	47-80% free meals	132		132
	81-100% free meals		137	137
Total		263	137	400

What if we wanted a different group to be the reference group? For example, let's omit group 3.

regression
 /dependent api00
 /method = enter mealcat1 mealcat2.

**Variables Entered/Removed(b)**
Model	Variables Entered	Variables Removed	Method
1	MEALCAT2, MEALCAT1(a)	.	Enter
a All requested variables entered.
b Dependent Variable: api 2000

**Model Summary**
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.869(a)	.755	.754	70.612
a Predictors: (Constant), MEALCAT2, MEALCAT1

**ANOVA(b)**
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	6094197.670	2	3047098.835	611.121	.000(a)
	Residual	1979474.328	397	4986.081
	Total	8073671.997	399
a Predictors: (Constant), MEALCAT2, MEALCAT1
b Dependent Variable: api 2000

**Coefficients(a)**
		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
Model		B	Std. Error	Beta	t	Sig.
1	(Constant)	504.380	6.033		83.606	.000
	MEALCAT1	301.338	8.629	.995	34.922	.000
	MEALCAT2	135.014	8.612	.447	15.677	.000
a Dependent Variable: api 2000

With group 3 omitted, the constant is now the mean of group 3 and mealcat1 is group1-group3 and mealcat2 is group2-group3. We see that both of these coefficients are significant, indicating that group 1 is significantly different from group 3 and group 2 is significantly different from group 3.
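
The same logic applies to the remaining choice of reference group. As a sketch that has not been run for this chapter, omitting mealcat2 instead would make group 2 the reference group:

```spss
* Omit mealcat2 so that group 2 becomes the reference group.
regression
 /dependent api00
 /method = enter mealcat1 mealcat3.
```

Judging from the group means reported earlier, the constant should be about 639.39 (the mean of group 2), the coefficient for mealcat1 about 166.32 (group 1 minus group 2) and the coefficient for mealcat3 about -135.01 (group 3 minus group 2).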

3.3.3 Using the glm command

We can also do this analysis using the glm command. The benefit of the glm command is that we don't need to manually create dummy variables, and it gives us the test of the overall effect of mealcat without needing to subsequently use the /method = test statement as we did with the regression command.

glm api00 by mealcat.

**Between-Subjects Factors**
		Value Label	N
Percentage free meals in 3 categories	1	0-46% free meals	131
	2	47-80% free meals	132
	3	81-100% free meals	137

**Tests of Between-Subjects Effects**
Dependent Variable: api 2000
Source	Type III Sum of Squares	df	Mean Square	F	Sig.
Corrected Model	6094197.670(a)	2	3047098.835	611.121	.000
Intercept	168847142.059	1	168847142.059	33863.695	.000
MEALCAT	6094197.670	2	3047098.835	611.121	.000
Error	1979474.328	397	4986.081
Total	175839633.000	400
Corrected Total	8073671.997	399
a R Squared = .755 (Adjusted R Squared = .754)

We can use the /print=parameter statement with the glm command to obtain the parameter estimates. Note that the estimates are based on dummy coding with the last (third) category omitted, and correspond to the results shown above where the third category was omitted.

glm
 api00 by mealcat
 /print=parameter.

**Between-Subjects Factors**
		Value Label	N
Percentage free meals in 3 categories	1	0-46% free meals	131
	2	47-80% free meals	132
	3	81-100% free meals	137

**Tests of Between-Subjects Effects**
Dependent Variable: api 2000
Source	Type III Sum of Squares	df	Mean Square	F	Sig.
Corrected Model	6094197.670(a)	2	3047098.835	611.121	.000
Intercept	168847142.059	1	168847142.059	33863.695	.000
MEALCAT	6094197.670	2	3047098.835	611.121	.000
Error	1979474.328	397	4986.081
Total	175839633.000	400
Corrected Total	8073671.997	399
a R Squared = .755 (Adjusted R Squared = .754)

**Parameter Estimates**
Dependent Variable: api 2000
Parameter	B	Std. Error	t	Sig.	95% Confidence Interval
					Lower Bound	Upper Bound
Intercept	504.380	6.033	83.606	.000	492.519	516.240
[MEALCAT=1]	301.338	8.629	34.922	.000	284.374	318.302
[MEALCAT=2]	135.014	8.612	15.677	.000	118.083	151.945
[MEALCAT=3]	0(a)	.	.	.	.	.
a This parameter is set to zero because it is redundant.

Note that the parameter estimates are the same as in the regression above that omitted mealcat3, because in both cases the last category (category 3) is the one being dropped: we left mealcat3 out of the regression command, and the glm command omits the last category.
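
Because dummy coding compares every group only with the reference group, it can help to look at the estimated group means and all pairwise comparisons as well. The following is a sketch that is not part of the original analysis; the Bonferroni adjustment is an arbitrary choice made for illustration.

```spss
* Estimated group means plus all pairwise comparisons, Bonferroni-adjusted.
glm api00 by mealcat
 /emmeans = tables(mealcat) compare adj(bonferroni)
 /print = parameter.
```

The estimated means should match the MEANS output shown earlier (805.72, 639.39 and 504.38), and the pairwise table adds the group 1 versus group 2 comparison that the parameter estimates do not show directly.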

3.3.4 Other coding schemes

It is generally very convenient to use dummy coding, but that is not the only kind of coding that can be used. As you have seen, when you use dummy coding one of the groups becomes the reference group and all of the other groups are compared to that group. This may not be the most interesting set of comparisons. Below is a list of the types of coding schemes that SPSS will create for you. You can access these through the pull-down menus, or you can request them on the /CONTRAST subcommand when using GLM (described later). First, we show you how to manually create the codes.

Deviation(refcat): The deviations from the grand mean.
Difference: The difference or reverse Helmert contrast - compare levels of a factor with the mean of the previous levels of the factor.
Simple(refcat): Compare each level of a factor to the last level.
Helmert: Compare levels of a factor with the mean of the subsequent levels of the factor.
Polynomial: Orthogonal polynomial contrasts.
Repeated: Adjacent levels of a factor.
Special: A user-defined contrast.
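
As a brief illustration of how one of the keywords above is requested, the sketch below asks for simple contrasts with the first level as the reference category. It assumes the reference category in parentheses is given as the position of the level (for mealcat the positions 1, 2, 3 coincide with the values, so the distinction does not matter here), and it has not been run against the output in this chapter.

```spss
* Simple contrasts with the first level of mealcat as the reference category.
glm api00 by mealcat
 /contrast (mealcat) = simple(1)
 /print = parameter.
```

With this specification the contrast results should compare level 2 with level 1 and level 3 with level 1, the same set of comparisons as the dummy coding that omitted group 1.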

Let's create a variable that compares group 1 with 2 and another variable that compares group 2 with 3, and include those variables in the regression model. In other words, we wish to create coefficients that are comparisons of successive groups, with group 1 as the starting point (i.e., the first comparison compares group 1 vs. 2, and the second comparison compares group 2 vs. 3). Below we show how to manually generate a coding scheme that forms these 2 comparisons.

if mealcat = 1 grp1 = .667.
if mealcat = 2 grp1 = -.333.
if mealcat = 3 grp1 = -.333.

if mealcat = 1 grp2 = .333.
if mealcat = 2 grp2 = .333.
if mealcat = 3 grp2 = -.667.
execute.

regression
 /dep = api00
 /method = enter grp1 grp2.
 
**Variables Entered/Removed(b)**
Model	Variables Entered	Variables Removed	Method
1	GRP2, GRP1(a)	.	Enter
a All requested variables entered.
b Dependent Variable: api 2000

**Model Summary**
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.869(a)	.755	.754	70.612
a Predictors: (Constant), GRP2, GRP1

**ANOVA(b)**
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	6094197.670	2	3047098.835	611.121	.000(a)
	Residual	1979474.328	397	4986.081
	Total	8073671.997	399
a Predictors: (Constant), GRP2, GRP1
b Dependent Variable: api 2000

**Coefficients(a)**
		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
Model		B	Std. Error	Beta	t	Sig.
1	(Constant)	649.820	3.531		184.016	.000
	GRP1	166.324	8.708	.549	19.099	.000
	GRP2	135.014	8.612	.451	15.677	.000
a Dependent Variable: api 2000
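
One way to convince yourself that this coding scheme still reproduces the group means is to evaluate the fitted equation for every school. This is a sketch, assuming grp1 and grp2 were created as above; the coefficients are copied by hand from the Coefficients table, and pred_grp is a hypothetical variable name.

```spss
* pred_grp is a hypothetical variable name used only for this check.
* Coefficients are copied from the regression on grp1 and grp2.
compute pred_grp = 649.820 + (166.324 * grp1) + (135.014 * grp2).
execute.

means
  tables=pred_grp by mealcat.
```

For group 1 this gives 649.820 + 166.324 * .667 + 135.014 * .333, or about 805.72; groups 2 and 3 work out to about 639.39 and 504.38. These match the group means up to the rounding of the codes to three decimals.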

We can perform this same series of comparisons much more easily using the glm command with the /contrast subcommand.

glm api00 by mealcat /contrast (mealcat)=repeated /print = parameter TEST(LMATRIX).

**Between-Subjects Factors**
		Value Label	N
Percentage free meals in 3 categories	1	0-46% free meals	131
	2	47-80% free meals	132
	3	81-100% free meals	137

**Tests of Between-Subjects Effects**
Dependent Variable: api 2000
Source	Type III Sum of Squares	df	Mean Square	F	Sig.
Corrected Model	6094197.670(a)	2	3047098.835	611.121	.000
Intercept	168847142.059	1	168847142.059	33863.695	.000
MEALCAT	6094197.670	2	3047098.835	611.121	.000
Error	1979474.328	397	4986.081
Total	175839633.000	400
Corrected Total	8073671.997	399
a R Squared = .755 (Adjusted R Squared = .754)

**Parameter Estimates**
Dependent Variable: api 2000
Parameter	B	Std. Error	t	Sig.	95% Confidence Interval
					Lower Bound	Upper Bound
Intercept	504.380	6.033	83.606	.000	492.519	516.240
[MEALCAT=1]	301.338	8.629	34.922	.000	284.374	318.302
[MEALCAT=2]	135.014	8.612	15.677	.000	118.083	151.945
[MEALCAT=3]	0(a)	.	.	.	.	.
a This parameter is set to zero because it is redundant.

**Intercept**
	Contrast
Parameter	L1
Intercept	1.000
[MEALCAT=1]	.333
[MEALCAT=2]	.333
[MEALCAT=3]	.333
The default display of this matrix is the transpose of the corresponding L matrix. Based on Type III Sums of Squares.

**MEALCAT**
	Contrast
Parameter	L2	L3
Intercept	0	0
[MEALCAT=1]	1	0
[MEALCAT=2]	0	1
[MEALCAT=3]	-1	-1
The default display of this matrix is the transpose of the corresponding L matrix. Based on Type III Sums of Squares.

**Contrast Coefficients (L' Matrix)**
	Percentage free meals in 3 categories Repeated Contrast
Parameter	Level 1 vs. Level 2	Level 2 vs. Level 3
Intercept	0	0
[MEALCAT=1]	1	0
[MEALCAT=2]	-1	1
[MEALCAT=3]	0	-1
The default display of this matrix is the transpose of the corresponding L matrix.

**Contrast Results (K Matrix)**
Percentage free meals in 3 categories Repeated Contrast			Dependent Variable
			api 2000
Level 1 vs. Level 2	Contrast Estimate		166.324
	Hypothesized Value		0
	Difference (Estimate - Hypothesized)		166.324
	Std. Error		8.708
	Sig.		.000
	95% Confidence Interval for Difference	Lower Bound	149.203
		Upper Bound	183.444
Level 2 vs. Level 3	Contrast Estimate		135.014
	Hypothesized Value		0
	Difference (Estimate - Hypothesized)		135.014
	Std. Error		8.612
	Sig.		.000
	95% Confidence Interval for Difference	Lower Bound	118.083
		Upper Bound	151.945

**Test Results**
Dependent Variable: api 2000
Source	Sum of Squares	df	Mean Square	F	Sig.
Contrast	6094197.670	2	3047098.835	611.121	.000
Error	1979474.328	397	4986.081

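If you compare the contrast estimates with the means, you can verify that the first contrast estimate (Level 1 vs. Level 2, 166.324) is the mean of group 1 (0-46% free meals) minus the mean of group 2 (47-80% free meals), and the second (Level 2 vs. Level 3, 135.014) is the mean of group 2 minus the mean of group 3 (81-100% free meals). These match the coefficients for grp1 and grp2 from the manually coded regression above. Both of these comparisons are significant, indicating that group 1 significantly differs from group 2, and group 2 significantly differs from group 3.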

MEANS
  TABLES=api00 BY mealcat.
 
**Case Processing Summary**
	Cases
Included	Excluded	Total
N	Percent	N	Percent	N	Percent
api 2000 * Percentage free meals in 3 categories	400	100.0%	0	.0%	400	100.0%

**Report**
api 2000
Percentage free meals in 3 categories	Mean	N	Std. Deviation
0-46% free meals	805.72	131	65.669
47-80% free meals	639.39	132	82.135
81-100% free meals	504.38	137	62.727
Total	647.62	400	142.249
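
The comparison of the repeated contrast estimates with the group means can also be checked by hand. The short Python sketch below is not part of the SPSS session; it simply subtracts the means copied from the Report table, and the variable names are our own.

```python
# Group means of api00 by mealcat, copied from the MEANS Report table.
group_means = {1: 805.72, 2: 639.39, 3: 504.38}

# A repeated contrast compares each level with the level that follows it.
level1_vs_level2 = group_means[1] - group_means[2]
level2_vs_level3 = group_means[2] - group_means[3]

# The differences agree with the contrast estimates (166.324 and 135.014)
# up to the rounding of the means to two decimals.
print(f"Level 1 vs. Level 2: {level1_vs_level2:.2f}")
print(f"Level 2 vs. Level 3: {level2_vs_level3:.2f}")
```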

3.4 Regression with two categorical predictors

Previously we looked at using yr_rnd to predict api00.

regression
 /dep api00
 /method = enter yr_rnd.

**Variables Entered/Removed(b)**
Model	Variables Entered	Variables Removed	Method
1	year round school(a)	.	Enter
a All requested variables entered.
b Dependent Variable: api 2000

**Model Summary**
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.475(a)	.226	.224	125.300
a Predictors: (Constant), year round school

**ANOVA(b)**
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	1825000.563	1	1825000.563	116.241	.000(a)
	Residual	6248671.435	398	15700.179
	Total	8073671.997	399
a Predictors: (Constant), year round school
b Dependent Variable: api 2000

**Coefficients(a)**
		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
Model		B	Std. Error	Beta	t	Sig.
1	(Constant)	684.539	7.140		95.878	.000
1	year round school	-160.506	14.887	-.475	-10.782	.000
a Dependent Variable: api 2000

We have also looked at mealcat using the regression command.

regression
 /dep api00
 /method =  enter mealcat1 mealcat2.

**Variables Entered/Removed(b)**
Model	Variables Entered	Variables Removed	Method
1	MEALCAT2, MEALCAT1(a)	.	Enter
a All requested variables entered.
b Dependent Variable: api 2000

**Model Summary**
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.869(a)	.755	.754	70.612
a Predictors: (Constant), MEALCAT2, MEALCAT1

**ANOVA(b)**
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	6094197.670	2	3047098.835	611.121	.000(a)
	Residual	1979474.328	397	4986.081
	Total	8073671.997	399
a Predictors: (Constant), MEALCAT2, MEALCAT1
b Dependent Variable: api 2000

**Coefficients(a)**
		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
Model		B	Std. Error	Beta	t	Sig.
1	(Constant)	504.380	6.033		83.606	.000
	MEALCAT1	301.338	8.629	.995	34.922	.000
	MEALCAT2	135.014	8.612	.447	15.677	.000
a Dependent Variable: api 2000

We can include both yr_rnd and mealcat together in the same model.

regression
 /dep api00
 /method =  enter yr_rnd mealcat1 mealcat2.

**Variables Entered/Removed(b)**
Model	Variables Entered	Variables Removed	Method
1	MEALCAT2, year round school, MEALCAT1(a)	.	Enter
a All requested variables entered.
b Dependent Variable: api 2000

**Model Summary**
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.876(a)	.767	.765	68.893
a Predictors: (Constant), MEALCAT2, year round school, MEALCAT1

**ANOVA(b)**
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	6194144.303	3	2064714.768	435.017	.000(a)
	Residual	1879527.694	396	4746.282
	Total	8073671.997	399
a Predictors: (Constant), MEALCAT2, year round school, MEALCAT1
b Dependent Variable: api 2000

**Coefficients(a)**
		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
Model		B	Std. Error	Beta	t	Sig.
1	(Constant)	526.330	7.585		69.395	.000
	year round school	-42.960	9.362	-.127	-4.589	.000
	MEALCAT1	281.683	9.446	.930	29.821	.000
	MEALCAT2	117.946	9.189	.390	12.836	.000
a Dependent Variable: api 2000

We can test the overall effect of mealcat with the /method = test() subcommand. The overall effect of mealcat is significant, with an F of 460.270 and a p value reported as .000.

regression
 /dep api00
 /method =  enter yr_rnd
 /method = test(mealcat1 mealcat2).

**Variables Entered/Removed(b)**
Model	Variables Entered	Variables Removed	Method
1	year round school(a)	.	Enter
2	MEALCAT2, MEALCAT1	.	Test
a All requested variables entered.
b Dependent Variable: api 2000

**Model Summary**
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.475(a)	.226	.224	125.300
2	.876(b)	.767	.765	68.893
a Predictors: (Constant), year round school
b Predictors: (Constant), year round school, MEALCAT2, MEALCAT1

**ANOVA(d)**
Model			Sum of Squares	df	Mean Square	F	Sig.	R Square Change
1	Regression		1825000.563	1	1825000.563	116.241	.000(a)
	Residual		6248671.435	398	15700.179
	Total		8073671.997	399
2	Subset Tests	MEALCAT1, MEALCAT2	4369143.740	2	2184571.870	460.270	.000(b)	.541
	Regression		6194144.303	3	2064714.768	435.017	.000(c)
	Residual		1879527.694	396	4746.282
	Total		8073671.997	399
a Predictors: (Constant), year round school
b Tested against the full model.
c Predictors in the Full Model: (Constant), year round school, MEALCAT2, MEALCAT1.
d Dependent Variable: api 2000

**Coefficients(a)**
		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
Model		B	Std. Error	Beta	t	Sig.
1	(Constant)	684.539	7.140		95.878	.000
1	year round school	-160.506	14.887	-.475	-10.782	.000
2	(Constant)	526.330	7.585		69.395	.000
	year round school	-42.960	9.362	-.127	-4.589	.000
	MEALCAT1	281.683	9.446	.930	29.821	.000
	MEALCAT2	117.946	9.189	.390	12.836	.000
a Dependent Variable: api 2000

**Excluded Variables(b)**
		Beta In	t	Sig.	Partial Correlation	Collinearity Statistics
Model		Beta In	t	Sig.	Partial Correlation	Tolerance
1	MEALCAT1	.697(a)	23.132	.000	.758	.914
1	MEALCAT2	-.138(a)	-3.106	.002	-.154	.962
a Predictors in the Model: (Constant), year round school
b Dependent Variable: api 2000
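
To see where the Subset Tests row comes from, we can rebuild it from the two ANOVA tables. The Python sketch below is only an illustration (the function name subset_f is our own, not an SPSS feature): the change in the regression sum of squares between the reduced and full models, divided by the number of added terms, is tested against the mean square error of the full model.

```python
def subset_f(ss_reg_full, ss_reg_reduced, df_added, ss_resid_full, df_resid_full):
    """Return (sum of squares added, F) for a block of predictors added to a model."""
    ss_change = ss_reg_full - ss_reg_reduced
    ms_change = ss_change / df_added
    ms_error = ss_resid_full / df_resid_full
    return ss_change, ms_change / ms_error


# Values copied from the ANOVA table above (model 1: yr_rnd only; model 2: full model).
ss_change, f_mealcat = subset_f(
    ss_reg_full=6194144.303,
    ss_reg_reduced=1825000.563,
    df_added=2,
    ss_resid_full=1879527.694,
    df_resid_full=396,
)

# R Square Change is the added sum of squares as a share of the total.
r2_change = ss_change / 8073671.997

print(f"SS for mealcat1, mealcat2: {ss_change:.3f}")
print(f"F(2, 396) = {f_mealcat:.3f}")
print(f"R Square Change = {r2_change:.3f}")
```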

Because this model has only main effects (no interactions), you can interpret Byr_rnd as the difference between the year-round and non-year-round groups. The coefficient for mealcat1 (which we will call Bmealcat1) is the difference between mealcat=1 and mealcat=3, and Bmealcat2 is the difference between mealcat=2 and mealcat=3.

Let's dig below the surface and see how the coefficients relate to the predicted values. Let's view the cells formed by crossing yr_rnd and mealcat and number the cells from cell1 to cell6.

           mealcat=1     mealcat=2      mealcat=3
 yr_rnd=0  cell1         cell2          cell3
 yr_rnd=1  cell4         cell5          cell6

With respect to mealcat, the group mealcat=3 is the reference category, and with respect to yr_rnd the group yr_rnd=0 is the reference category. As a result, cell3 is the reference cell. The constant is the predicted value for this cell.

The coefficient for yr_rnd is the difference between cell3 and cell6. Since this model has only main effects, it is also the difference between cell2 and cell5, or between cell1 and cell4. In other words, Byr_rnd is the amount you add to the predicted value when you go from non-year-round to year-round schools.

The coefficient for mealcat1 is the predicted difference between cell1 and cell3. Since this model has only main effects, it is also the predicted difference between cell4 and cell6. Likewise, Bmealcat2 is the predicted difference between cell2 and cell3, and also the predicted difference between cell5 and cell6.

So, the predicted values, in terms of the coefficients, would be

           mealcat=1         mealcat=2         mealcat=3
          -----------------------------------------------
 yr_rnd=0  intercept         intercept         intercept
           +BMealCat1        +BMealCat2
          -----------------------------------------------
 yr_rnd=1  intercept         intercept         intercept
           +Byr_rnd          +Byr_rnd          +Byr_rnd
           +BMealCat1        +BMealCat2

We should note that if you computed the predicted values for each cell, they would not exactly match the means in the 6 cells. The predicted means would be close to the observed means in the cells, but not exactly the same. This is because our model has only main effects and assumes that the difference between cell1 and cell4 is exactly the same as the difference between cells 2 and 5, which is the same as the difference between cells 3 and 6. Since the observed values don't follow this pattern, there is some discrepancy between the predicted means and observed means.
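
The discrepancy can be made concrete with a little arithmetic. The Python sketch below is an illustration only: it plugs the coefficients from the main-effects regression into the cell formulas and compares them with the cell means reported in the Estimates table of section 3.5.2 (the full factorial model fitted there reproduces the observed cell means). The variable names are our own.

```python
# Coefficients copied from the main-effects regression output.
intercept = 526.330
b_yr_rnd = -42.960
b_mealcat = {1: 281.683, 2: 117.946, 3: 0.0}  # mealcat=3 is the reference group

# Observed cell means, keyed by (yr_rnd, mealcat), from the Estimates table in 3.5.2.
observed = {
    (0, 1): 809.685, (0, 2): 645.274, (0, 3): 521.493,
    (1, 1): 735.429, (1, 2): 593.533, (1, 3): 488.000,
}

# Main-effects prediction: intercept, plus Byr_rnd for year-round schools,
# plus the dummy coefficient for the school's mealcat group.
predicted = {
    (yr_rnd, mealcat): intercept + b_yr_rnd * yr_rnd + b_mealcat[mealcat]
    for yr_rnd in (0, 1)
    for mealcat in (1, 2, 3)
}

for cell in sorted(predicted):
    gap = predicted[cell] - observed[cell]
    print(f"yr_rnd={cell[0]} mealcat={cell[1]}: "
          f"predicted {predicted[cell]:.3f}, observed {observed[cell]:.3f}, gap {gap:+.3f}")
```

In every column the predicted year-round value is exactly Byr_rnd below the non-year-round value, which is the constraint the main-effects model imposes.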

3.4.2 Using the glm command

We can run the same analysis using the glm command with just main effects. Because SPSS's default is to include all main effects and interactions in the model, to get just the main effects, you need to include the /design statement and specify just the main effects, as shown below.

glm
  api00 BY yr_rnd mealcat
  /DESIGN = yr_rnd mealcat
  /print=parameter TEST(LMATRIX).

**Between-Subjects Factors**
		Value Label	N
year round school	0	No	308
year round school	1	Yes	92
Percentage free meals in 3 categories	1	0-46% free meals	131
	2	47-80% free meals	132
	3	81-100% free meals	137

**Tests of Between-Subjects Effects**
Dependent Variable: api 2000
Source	Type III Sum of Squares	df	Mean Square	F	Sig.
Corrected Model	6194144.303(a)	3	2064714.768	435.017	.000
Intercept	104733334.071	1	104733334.071	22066.395	.000
YR_RND	99946.633	1	99946.633	21.058	.000
MEALCAT	4369143.740	2	2184571.870	460.270	.000
Error	1879527.694	396	4746.282
Total	175839633.000	400
Corrected Total	8073671.997	399
a R Squared = .767 (Adjusted R Squared = .765)

**Parameter Estimates**
Dependent Variable: api 2000
					95% Confidence Interval
Parameter	B	Std. Error	t	Sig.	Lower Bound	Upper Bound
Intercept	483.370	7.457	64.821	.000	468.710	498.030
[YR_RND=0]	42.960	9.362	4.589	.000	24.555	61.365
[YR_RND=1]	0(a)	.	.	.	.	.
[MEALCAT=1]	281.683	9.446	29.821	.000	263.113	300.253
[MEALCAT=2]	117.946	9.189	12.836	.000	99.881	136.011
[MEALCAT=3]	0(a)	.	.	.	.	.
a This parameter is set to zero because it is redundant.

**Intercept**
	Contrast
Parameter	L1
Intercept	1.000
[YR_RND=0]	.500
[YR_RND=1]	.500
[MEALCAT=1]	.333
[MEALCAT=2]	.333
[MEALCAT=3]	.333
The default display of this matrix is the transpose of the corresponding L matrix. Based on Type III Sums of Squares.

**YR_RND**
	Contrast
Parameter	L2
Intercept	0
[YR_RND=0]	1
[YR_RND=1]	-1
[MEALCAT=1]	0
[MEALCAT=2]	0
[MEALCAT=3]	0
The default display of this matrix is the transpose of the corresponding L matrix. Based on Type III Sums of Squares.

**MEALCAT**
	Contrast
Parameter	L4	L5
Intercept	0	0
[YR_RND=0]	0	0
[YR_RND=1]	0	0
[MEALCAT=1]	1	0
[MEALCAT=2]	0	1
[MEALCAT=3]	-1	-1
The default display of this matrix is the transpose of the corresponding L matrix. Based on Type III Sums of Squares.

In summary, these results indicate that the difference between year-round and non-year-round schools is significant, and that the differences among the three mealcat groups are significant.
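
Note that the glm parameter estimates look different from the regression coefficients even though the model is the same. In the output above, glm sets the parameter for the last level of each factor to zero, so yr_rnd=1 is the reference group there, whereas the regression used yr_rnd=0. The Python sketch below is just a hand check that the two sets of estimates agree (the variable names are our own).

```python
# Parameter estimates copied from the glm output (reference cell: yr_rnd=1, mealcat=3).
glm_intercept = 483.370
glm_yr_rnd_0 = 42.960  # [YR_RND=0], relative to yr_rnd=1

# Coefficients copied from the regression output (reference cell: yr_rnd=0, mealcat=3).
reg_constant = 526.330
reg_b_yr_rnd = -42.960

# The regression constant is the glm prediction for the yr_rnd=0, mealcat=3 cell,
# and the two yr_rnd effects differ only in sign.
constant_from_glm = glm_intercept + glm_yr_rnd_0
b_yr_rnd_from_glm = -glm_yr_rnd_0

print(f"constant from glm: {constant_from_glm:.3f} (regression: {reg_constant:.3f})")
print(f"Byr_rnd from glm:  {b_yr_rnd_from_glm:.3f} (regression: {reg_b_yr_rnd:.3f})")
```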

3.5 Categorical predictor with interactions

3.5.1 Manually coding an interaction

Let's perform the same analysis that we performed above. This time let's include the interaction of mealcat by yr_rnd.

compute yrmeal1 = mealcat1*yr_rnd.
compute yrmeal2 = mealcat2*yr_rnd.
execute.

regression
 /dep api00
 /method = enter yr_rnd mealcat1 mealcat2 yrmeal1 yrmeal2.

**Variables Entered/Removed(b)**
Model	Variables Entered	Variables Removed	Method
1	YRMEAL2, YRMEAL1, MEALCAT1, year round school, MEALCAT2(a)	.	Enter
a All requested variables entered.
b Dependent Variable: api 2000

**Model Summary**
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.877(a)	.769	.766	68.873
a Predictors: (Constant), YRMEAL2, YRMEAL1, MEALCAT1, year round school, MEALCAT2

**ANOVA(b)**
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	6204727.822	5	1240945.564	261.609	.000(a)
	Residual	1868944.176	394	4743.513
	Total	8073671.997	399
a Predictors: (Constant), YRMEAL2, YRMEAL1, MEALCAT1, year round school, MEALCAT2
b Dependent Variable: api 2000

**Coefficients(a)**
		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
Model		B	Std. Error	Beta	t	Sig.
1	(Constant)	521.493	8.414		61.978	.000
	year round school	-33.493	11.771	-.099	-2.845	.005
	MEALCAT1	288.193	10.443	.952	27.597	.000
	MEALCAT2	123.781	10.552	.410	11.731	.000
	YRMEAL1	-40.764	29.231	-.038	-1.395	.164
	YRMEAL2	-18.248	22.256	-.024	-.820	.413
a Dependent Variable: api 2000

We can test the overall interaction with the /method = test() subcommand. This interaction effect is not significant, with an F of 1.116 and a p value of .329.

regression
 /dep api00
 /method = enter yr_rnd mealcat1 mealcat2
 /method = test(yrmeal1 yrmeal2).

**Variables Entered/Removed(b)**
Model	Variables Entered	Variables Removed	Method
1	MEALCAT2, year round school, MEALCAT1(a)	.	Enter
2	YRMEAL1, YRMEAL2	.	Test
a All requested variables entered.
b Dependent Variable: api 2000

**Model Summary**
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.876(a)	.767	.765	68.893
2	.877(b)	.769	.766	68.873
a Predictors: (Constant), MEALCAT2, year round school, MEALCAT1
b Predictors: (Constant), MEALCAT2, year round school, MEALCAT1, YRMEAL1, YRMEAL2

**ANOVA(d)**
Model			Sum of Squares	df	Mean Square	F	Sig.	R Square Change
1	Regression		6194144.303	3	2064714.768	435.017	.000(a)
	Residual		1879527.694	396	4746.282
	Total		8073671.997	399
2	Subset Tests	YRMEAL1, YRMEAL2	10583.519	2	5291.759	1.116	.329(b)	.001
	Regression		6204727.822	5	1240945.564	261.609	.000(c)
	Residual		1868944.176	394	4743.513
	Total		8073671.997	399
a Predictors: (Constant), MEALCAT2, year round school, MEALCAT1
b Tested against the full model.
c Predictors in the Full Model: (Constant), MEALCAT2, year round school, MEALCAT1, YRMEAL1, YRMEAL2.
d Dependent Variable: api 2000

**Coefficients(a)**
		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
Model		B	Std. Error	Beta	t	Sig.
1	(Constant)	526.330	7.585		69.395	.000
	year round school	-42.960	9.362	-.127	-4.589	.000
	MEALCAT1	281.683	9.446	.930	29.821	.000
	MEALCAT2	117.946	9.189	.390	12.836	.000
2	(Constant)	521.493	8.414		61.978	.000
	year round school	-33.493	11.771	-.099	-2.845	.005
	MEALCAT1	288.193	10.443	.952	27.597	.000
	MEALCAT2	123.781	10.552	.410	11.731	.000
	YRMEAL1	-40.764	29.231	-.038	-1.395	.164
	YRMEAL2	-18.248	22.256	-.024	-.820	.413
a Dependent Variable: api 2000

**Excluded Variables(b)**
		Beta In	t	Sig.	Partial Correlation	Collinearity Statistics
Model		Beta In	t	Sig.	Partial Correlation	Tolerance
1	YRMEAL1	-.033(a)	-1.249	.212	-.063	.846
1	YRMEAL2	-.016(a)	-.535	.593	-.027	.695
a Predictors in the Model: (Constant), MEALCAT2, year round school, MEALCAT1
b Dependent Variable: api 2000

It is important to note how the meaning of the coefficients changes in the presence of these interaction terms. For example, in the prior model, with only main effects, we could interpret Byr_rnd as the difference between the year-round and non-year-round schools. However, now that we have added the interaction terms, Byr_rnd represents the difference between cell6 and cell3 (cell6 - cell3), that is, the difference between year-round and non-year-round schools when mealcat=3 (because mealcat=3 is the omitted group). The presence of an interaction would imply that the difference between year-round and non-year-round schools depends on the level of mealcat. The interaction terms Byrmeal1 and Byrmeal2 represent the extent to which the difference between year-round and non-year-round schools changes when mealcat=1 and when mealcat=2 (as compared to the reference group, mealcat=3). For example, Byrmeal1 represents the year-round vs. non-year-round difference for mealcat=1 minus the same difference for mealcat=3. In other words, Byrmeal1 in this design is (cell4 - cell1) - (cell6 - cell3); it represents how much the effect of yr_rnd differs between mealcat=1 and mealcat=3.

Below we have shown the predicted values for the six cells in terms of the coefficients in the model. If you compare this to the main effects model, you will see that the expressions are the same except for the addition of BYrMeal1 (in cell4) and BYrMeal2 (in cell5).

           mealcat=1           mealcat=2         mealcat=3
           -------------------------------------------------
 yr_rnd=0  intercept           intercept         intercept
           +BMealCat1          +BMealCat2
           -------------------------------------------------
 yr_rnd=1  intercept           intercept         intercept
           +Byr_rnd            +Byr_rnd          +Byr_rnd
           +BMealCat1          +BMealCat2
           +BYrMeal1           +BYrMeal2
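
As a hand check of the table above, the Python sketch below plugs in the coefficients from the regression with interaction terms. Because this model has one parameter per cell, the predicted values should reproduce the cell means (shown in the Estimates table of section 3.5.2) up to rounding. This is an illustration with our own variable names, not SPSS output.

```python
# Coefficients copied from the regression that includes yrmeal1 and yrmeal2.
intercept = 521.493
b_yr_rnd = -33.493
b_mealcat = {1: 288.193, 2: 123.781, 3: 0.0}  # mealcat=3 is the reference group
b_yrmeal = {1: -40.764, 2: -18.248, 3: 0.0}   # interaction terms, zero for mealcat=3

# Cell means, keyed by (yr_rnd, mealcat), from the Estimates table in section 3.5.2.
cell_means = {
    (0, 1): 809.685, (0, 2): 645.274, (0, 3): 521.493,
    (1, 1): 735.429, (1, 2): 593.533, (1, 3): 488.000,
}

# The interaction term only enters for year-round schools (yr_rnd=1).
predicted = {
    (yr_rnd, mealcat): intercept + b_mealcat[mealcat]
    + yr_rnd * (b_yr_rnd + b_yrmeal[mealcat])
    for yr_rnd in (0, 1)
    for mealcat in (1, 2, 3)
}

for cell in sorted(predicted):
    print(f"yr_rnd={cell[0]} mealcat={cell[1]}: "
          f"predicted {predicted[cell]:.3f}, cell mean {cell_means[cell]:.3f}")
```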

It can be very tricky to interpret these interaction terms if you wish to form specific comparisons. For example, if you wanted to perform a test of the simple main effect of yr_rnd when mealcat=1, i.e., to compare cell1 with cell4, you would want to compare intercept + BMealCat1 vs. intercept + Byr_rnd + BMealCat1 + BYrMeal1. As we will see, such tests can be more easily done via glm.
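
Before turning to glm, here is what the comparison works out to numerically. In the difference between the two expressions the intercept and BMealCat1 cancel, leaving Byr_rnd + BYrMeal1. The Python sketch below (an illustration, with names of our own choosing) computes this simple effect of yr_rnd for each mealcat group; it gives the estimates only, not their standard errors or tests.

```python
# Coefficients copied from the regression that includes the interaction terms.
b_yr_rnd = -33.493
b_yrmeal = {1: -40.764, 2: -18.248, 3: 0.0}  # no interaction term for mealcat=3

# Simple effect of yr_rnd (year-round minus non-year-round) within each mealcat group.
simple_effect = {mealcat: b_yr_rnd + b_yrmeal[mealcat] for mealcat in (1, 2, 3)}

for mealcat, effect in simple_effect.items():
    print(f"mealcat={mealcat}: year-round minus non-year-round = {effect:.3f}")
```

These values agree, up to rounding, with the Yes minus No mean differences that glm reports in the Pairwise Comparisons table in the next section (-74.257, -51.740 and -33.493).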

3.5.2 Using glm

Constructing these interactions can be somewhat easier when using the glm command. As you see below, the glm command gives us the tests of the overall main effects and the interaction without the need for a subsequent test subcommand. The /emmeans subcommand with COMPARE(yr_rnd) tells SPSS to compare the levels of yr_rnd with each other within each level of mealcat.

glm
  api00 by yr_rnd mealcat
  /EMMEANS TABLES(yr_rnd*mealcat) COMPARE(yr_rnd).

**Between-Subjects Factors**
		Value Label	N
year round school	0	No	308
year round school	1	Yes	92
Percentage free meals in 3 categories	1	0-46% free meals	131
	2	47-80% free meals	132
	3	81-100% free meals	137

**Tests of Between-Subjects Effects**
Dependent Variable: api 2000
Source	Type III Sum of Squares	df	Mean Square	F	Sig.
Corrected Model	6204727.822(a)	5	1240945.564	261.609	.000
Intercept	56354756.653	1	56354756.653	11880.384	.000
YR_RND	99617.371	1	99617.371	21.001	.000
MEALCAT	1796232.798	2	898116.399	189.336	.000
YR_RND * MEALCAT	10583.519	2	5291.759	1.116	.329
Error	1868944.176	394	4743.513
Total	175839633.000	400
Corrected Total	8073671.997	399
a R Squared = .769 (Adjusted R Squared = .766)

**Estimates**
Dependent Variable: api 2000
		Mean	Std. Error	95% Confidence Interval
year round school	Percentage free meals in 3 categories	Mean	Std. Error	Lower Bound	Upper Bound
No	0-46% free meals	809.685	6.185	797.526	821.845
	47-80% free meals	645.274	6.367	632.755	657.792
	81-100% free meals	521.493	8.414	504.950	538.035
Yes	0-46% free meals	735.429	26.032	684.250	786.607
	47-80% free meals	593.533	17.783	558.572	628.495
	81-100% free meals	488.000	8.232	471.816	504.184

**Pairwise Comparisons**
Dependent Variable: api 2000
			Mean Difference (I-J)	Std. Error	Sig.(a)	95% Confidence Interval for Difference(a)
Percentage free meals in 3 categories	(I) year round school	(J) year round school	Mean Difference (I-J)	Std. Error	Sig.(a)	Lower Bound	Upper Bound
0-46% free meals	No	Yes	74.257(*)	26.756	.006	21.654	126.860
0-46% free meals	Yes	No	-74.257(*)	26.756	.006	-126.860	-21.654
47-80% free meals	No	Yes	51.740(*)	18.889	.006	14.605	88.875
47-80% free meals	Yes	No	-51.740(*)	18.889	.006	-88.875	-14.605
81-100% free meals	No	Yes	33.493(*)	11.771	.005	10.350	56.635
81-100% free meals	Yes	No	-33.493(*)	11.771	.005	-56.635	-10.350
Based on estimated marginal means
* The mean difference is significant at the .050 level.
a Adjustment for multiple comparisons: Least Significant Difference (equivalent to no adjustments).

**Univariate Tests**
Dependent Variable: api 2000
Percentage free meals in 3 categories		Sum of Squares	df	Mean Square	F	Sig.
0-46% free meals	Contrast	36536.101	1	36536.101	7.702	.006
0-46% free meals	Error	1868944.176	394	4743.513
47-80% free meals	Contrast	35592.534	1	35592.534	7.503	.006
47-80% free meals	Error	1868944.176	394	4743.513
81-100% free meals	Contrast	38401.517	1	38401.517	8.096	.005
81-100% free meals	Error	1868944.176	394	4743.513
Each F tests the simple effects of year round school within each level combination of the other effects shown. These tests are based on the linearly independent pairwise comparisons among the estimated marginal means.
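
The Univariate Tests and the Pairwise Comparisons carry the same information: each contrast here has 1 degree of freedom, so its F is the square of the mean difference divided by its standard error. The Python sketch below is a hand check of that relationship using the values printed above (names are our own).

```python
# (mean difference, standard error, reported F) for No vs. Yes within each mealcat group,
# copied from the Pairwise Comparisons and Univariate Tests tables.
simple_effects = {
    "0-46% free meals": (74.257, 26.756, 7.702),
    "47-80% free meals": (51.740, 18.889, 7.503),
    "81-100% free meals": (33.493, 11.771, 8.096),
}

# For a 1-df contrast, F equals t squared, where t = difference / standard error.
f_from_t = {
    group: (diff / se) ** 2
    for group, (diff, se, _reported) in simple_effects.items()
}

for group, (_diff, _se, reported) in simple_effects.items():
    print(f"{group}: t squared = {f_from_t[group]:.3f}, reported F = {reported:.3f}")
```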

Although this section has focused on how to handle analyses involving interactions, these particular results show no indication of an interaction. Having found the interaction to be non-significant, we could decide to omit the interaction terms from future analyses. This would simplify future analyses; however, including the interaction term can be useful to assure readers that the interaction is non-significant.

3.6 Continuous and Categorical variables

3.6.1 Using regress

Say that we wish to analyze both continuous and categorical variables in one analysis. For example, let's include yr_rnd and some_col in the same analysis. We will save the predicted values for use in just a moment.

regress
 /dep = api00
 /method = enter yr_rnd some_col
 /save pre.

**Variables Entered/Removed(b)**
Model	Variables Entered	Variables Removed	Method
1	parent some college, year round school(a)	.	Enter
a All requested variables entered.
b Dependent Variable: api 2000

**Model Summary(b)**
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.507(a)	.257	.253	122.951
a Predictors: (Constant), parent some college, year round school
b Dependent Variable: api 2000

**ANOVA(b)**
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	2072201.839	2	1036100.919	68.539	.000(a)
	Residual	6001470.159	397	15117.053
	Total	8073671.997	399
a Predictors: (Constant), parent some college, year round school
b Dependent Variable: api 2000

**Coefficients(a)**
		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
Model		B	Std. Error	Beta	t	Sig.
1	(Constant)	637.858	13.503		47.237	.000
	year round school	-149.159	14.875	-.442	-10.027	.000
	parent some college	2.236	.553	.178	4.044	.000
a Dependent Variable: api 2000

**Residuals Statistics(a)**
	Minimum	Maximum	Mean	Std. Deviation	N
Predicted Value	488.70	787.65	647.62	72.066	400
Residual	-276.04	293.20	.00	122.643	400
Std. Predicted Value	-2.205	1.943	.000	1.000	400
Std. Residual	-2.245	2.385	.000	.997	400
a Dependent Variable: api 2000

Let's graph the predicted values by some_col.

GRAPH
  /SCATTERPLOT(BIVAR)=some_col WITH pre_1.

The coefficient for some_col indicates that for every unit increase in some_col the api00 score is predicted to increase by 2.24 units. This is the slope of the lines shown in the above graph. The graph has two lines, one for the year-round schools and one for the non-year-round schools. The coefficient for yr_rnd is -149.16, indicating that as yr_rnd increases by 1 unit (going from non-year-round to year-round), the api00 score is expected to decrease by about 149 units. As you can see in the graph, the top line is about 149 units higher than the lower line. The intercept is about 638, which is where the upper line crosses the Y axis when X is 0. The lower line crosses the Y axis about 149 units lower, at about 489.
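
The two parallel lines can be written out directly from the coefficients. The Python sketch below is an illustration only (the function name predict_api00 is our own); it evaluates the fitted equation for both types of schools.

```python
# Coefficients copied from the regression of api00 on yr_rnd and some_col.
B_CONSTANT = 637.858
B_YR_RND = -149.159
B_SOME_COL = 2.236


def predict_api00(some_col, yr_rnd):
    """Predicted api00 for a school with the given some_col (percent) and yr_rnd (0 or 1)."""
    return B_CONSTANT + B_YR_RND * yr_rnd + B_SOME_COL * some_col


# Where each line crosses the Y axis (some_col = 0).
upper_intercept = predict_api00(0, yr_rnd=0)
lower_intercept = predict_api00(0, yr_rnd=1)
print(f"non-year-round line crosses at {upper_intercept:.3f}")
print(f"year-round line crosses at     {lower_intercept:.3f}")

# The vertical gap between the lines is the same at every value of some_col.
for some_col in (0, 20, 40, 60):
    gap = predict_api00(some_col, 1) - predict_api00(some_col, 0)
    print(f"some_col={some_col}: gap = {gap:.3f}")
```

The lower intercept, about 488.70, matches the minimum predicted value in the Residuals Statistics table.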

3.6.2 Using glm

We can run this analysis using the glm command. The glm command treats the variables listed after BY as categorical factors; thus, we list some_col after WITH to specify that it is a continuous covariate.

glm
 api00 by yr_rnd with some_col.

**Between-Subjects Factors**
		Value Label	N
year round school	0	No	308
year round school	1	Yes	92

**Tests of Between-Subjects Effects**
Dependent Variable: api 2000
Source	Type III Sum of Squares	df	Mean Square	F	Sig.
Corrected Model	2072201.839(a)	2	1036100.919	68.539	.000
Intercept	30709901.014	1	30709901.014	2031.474	.000
SOME_COL	247201.276	1	247201.276	16.352	.000
YR_RND	1519992.669	1	1519992.669	100.548	.000
Error	6001470.159	397	15117.053
Total	175839633.000	400
Corrected Total	8073671.997	399
a R Squared = .257 (Adjusted R Squared = .253)

If we square the t-values from the regress command (above), we find that they match the F values from the glm command, apart from small differences due to rounding.
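
This can be checked with a couple of lines of arithmetic. The Python sketch below squares the t-values printed in the Coefficients table and sets them beside the F values printed by glm; the small differences arise because the t-values are displayed to three decimals.

```python
# t-values from the regress Coefficients table and F values from the glm table.
t_values = {"some_col": 4.044, "yr_rnd": -10.027}
f_values = {"some_col": 16.352, "yr_rnd": 100.548}

# Each predictor has 1 degree of freedom, so F should equal t squared.
t_squared = {name: t ** 2 for name, t in t_values.items()}

for name in t_values:
    print(f"{name}: t squared = {t_squared[name]:.3f}, glm F = {f_values[name]:.3f}")
```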

3.7 Interactions of Continuous by 0/1 Categorical variables

Above we showed an analysis that looked at the relationship between some_col and api00 and also included yr_rnd. The resulting graph showed two regression lines, one higher than the other but with equal slopes. Such a model assumes that the slope is the same for the two groups. Perhaps the slope is different for these groups. Let's run the regressions separately for the two groups, beginning with the non-year-round schools.

COMPUTE filt=(yr_rnd=0).
FILTER BY filt.
regress
 /dep = api00
 /method = enter some_col.

**Variables Entered/Removed(b)**
Model	Variables Entered	Variables Removed	Method
1	parent some college(a)	.	Enter
a All requested variables entered.
b Dependent Variable: api 2000

**Model Summary**
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.126(a)	.016	.013	131.278
a Predictors: (Constant), parent some college

**ANOVA(b)**
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	84700.858	1	84700.858	4.915	.027(a)
	Residual	5273591.675	306	17233.960
	Total	5358292.532	307
a Predictors: (Constant), parent some college
b Dependent Variable: api 2000

**Coefficients(a)**
		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
Model		B	Std. Error	Beta	t	Sig.
1	(Constant)	655.110	15.237		42.995	.000
1	parent some college	1.409	.636	.126	2.217	.027
a Dependent Variable: api 2000

GGRAPH
  /GRAPHDATASET NAME="GraphDataset" VARIABLES= api00 some_col 
  /GRAPHSPEC SOURCE=INLINE .
BEGIN GPL
SOURCE: s=userSource( id( "GraphDataset" ) )
DATA: api00=col( source(s), name( "api00" ) )
DATA: some_col=col( source(s), name( "some_col" ) )
GUIDE: axis( dim( 1 ), label( "some_col" ) )
GUIDE: axis( dim( 2 ), label( "api00" ) )
ELEMENT: point( position( some_col * api00 ) )
ELEMENT: line( position(smooth.linear( some_col * api00 ) ) )
END GPL.
COMMENT -- End GGRAPH command.
filter off.

Likewise, let's look at the year-round schools.

COMPUTE filt=(yr_rnd=1).
FILTER BY filt.
regress
 /dep = api00
 /method = enter some_col.

**Variables Entered/Removed(b)**
Model	Variables Entered	Variables Removed	Method
1	parent some college(a)	.	Enter
a All requested variables entered.
b Dependent Variable: api 2000

**Model Summary**
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.648(a)	.420	.413	75.773
a Predictors: (Constant), parent some college

**ANOVA(b)**
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	373644.064	1	373644.064	65.078	.000(a)
	Residual	516734.838	90	5741.498
	Total	890378.902	91
a Predictors: (Constant), parent some college
b Dependent Variable: api 2000

**Coefficients(a)**
		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
Model		B	Std. Error	Beta	t	Sig.
1	(Constant)	407.039	16.515		24.647	.000
1	parent some college	7.403	.918	.648	8.067	.000
a Dependent Variable: api 2000


GGRAPH
  /GRAPHDATASET NAME="GraphDataset" VARIABLES= api00 some_col 
  /GRAPHSPEC SOURCE=INLINE .
BEGIN GPL
SOURCE: s=userSource( id( "GraphDataset" ) )
DATA: api00=col( source(s), name( "api00" ) )
DATA: some_col=col( source(s), name( "some_col" ) )
GUIDE: axis( dim( 1 ), label( "some_col" ) )
GUIDE: axis( dim( 2 ), label( "api00" ) )
ELEMENT: point( position( some_col * api00 ) )
ELEMENT: line( position(smooth.linear( some_col * api00 ) ) )
END GPL.
filter off.

Note that the slope of the regression line looks much steeper for the year-round schools than for the non-year-round schools. This is confirmed by the regression equations, which show the slope for the year-round schools (7.4) to be higher than the slope for the non-year-round schools (1.4). We can test whether these slopes are significantly different from each other by including the interaction of some_col by yr_rnd, an interaction of a continuous variable by a categorical variable.

3.7.1 Computing interactions manually

We will start by manually computing the interaction of some_col by yr_rnd. Let's start fresh and reload the elemapi2 data file to clear out any variables we had previously created.

GET FILE='C:\spssreg\elemapi2.sav'.

Next, let's make a variable that is the interaction of some college (some_col) and year-round schools (yr_rnd) called yrXsome.

compute yrXsome = yr_rnd*some_col.
execute.

We can now run the regression that tests whether the coefficient for some_col is significantly different for year-round schools and non-year-round schools. Indeed, the yrXsome interaction effect is significant. We will save the predicted values so that we can make a graph showing how different the regression lines are for the two types of schools.

regress
 /dep = api00
 /method = enter some_col yr_rnd yrXsome
 /save pre.

**Variables Entered/Removed(b)**
Model	Variables Entered	Variables Removed	Method
1	YRXSOME, parent some college, year round school(a)	.	Enter
a All requested variables entered.
b Dependent Variable: api 2000

**Model Summary(b)**
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.532(a)	.283	.277	120.922
a Predictors: (Constant), YRXSOME, parent some college, year round school
b Dependent Variable: api 2000

**ANOVA(b)**
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	2283345.485	3	761115.162	52.053	.000(a)
	Residual	5790326.513	396	14622.037
	Total	8073671.997	399
a Predictors: (Constant), YRXSOME, parent some college, year round school
b Dependent Variable: api 2000

**Coefficients(a)**
		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
Model		B	Std. Error	Beta	t	Sig.
1	(Constant)	655.110	14.035		46.677	.000
	parent some college	1.409	.586	.112	2.407	.017
	year round school	-248.071	29.859	-.735	-8.308	.000
	YRXSOME	5.993	1.577	.330	3.800	.000
a Dependent Variable: api 2000

**Residuals Statistics(a)**
	Minimum	Maximum	Mean	Std. Deviation	N
Predicted Value	407.04	749.54	647.62	75.648	400
Residual	-275.12	279.25	.00	120.466	400
Std. Predicted Value	-3.180	1.347	.000	1.000	400
Std. Residual	-2.275	2.309	.000	.996	400
a Dependent Variable: api 2000

We can graph the predicted values for the two types of schools by some_col. You can see how the two lines have quite different slopes, consistent with the fact that the yrXsome interaction was significant.

GRAPH
  /SCATTERPLOT(BIVAR)=some_col WITH pre_1 BY yr_rnd.
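
The two lines in this graph can be written out from the coefficients of the model with the interaction. The Python sketch below is an illustration with names of our own choosing; it derives the intercept and slope of each line and the value of some_col at which the two fitted lines meet. Whether that meeting point is substantively interesting is a separate question that the arithmetic does not answer.

```python
# Coefficients copied from the regression of api00 on some_col, yr_rnd and yrXsome.
B_CONSTANT = 655.110
B_SOME_COL = 1.409
B_YR_RND = -248.071
B_YRXSOME = 5.993


def predict_api00(some_col, yr_rnd):
    """Predicted api00 from the model with the some_col by yr_rnd interaction."""
    yrxsome = yr_rnd * some_col  # same product as the yrXsome variable
    return B_CONSTANT + B_SOME_COL * some_col + B_YR_RND * yr_rnd + B_YRXSOME * yrxsome


# Intercept and slope of each line.
intercepts = {0: predict_api00(0, 0), 1: predict_api00(0, 1)}
slopes = {yr: predict_api00(1, yr) - predict_api00(0, yr) for yr in (0, 1)}

# The fitted lines meet where the yr_rnd gap is exactly offset by the slope difference.
crossover = -B_YR_RND / B_YRXSOME

for yr in (0, 1):
    print(f"yr_rnd={yr}: intercept {intercepts[yr]:.3f}, slope {slopes[yr]:.3f}")
print(f"fitted lines meet at some_col = {crossover:.2f}")
```

The year-round intercept, about 407.04, matches the minimum predicted value in the Residuals Statistics table.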

We can replot the same graph including the data points. You will need to double-click on the graph that is produced by the code below to add the regression lines to the graph.

GRAPH
  /SCATTERPLOT(BIVAR)=some_col WITH api00 BY yr_rnd.

Let's quickly rerun the separate regressions for the two groups.

Non-year-round

COMPUTE filt=(yr_rnd=0).
FILTER BY filt.
regress
 /dep = api00
 /method = enter some_col.

**Variables Entered/Removed(b)**
Model	Variables Entered	Variables Removed	Method
1	parent some college(a)	.	Enter
a All requested variables entered.
b Dependent Variable: api 2000

**Model Summary**
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.126(a)	.016	.013	131.278
a Predictors: (Constant), parent some college

**ANOVA(b)**
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	84700.858	1	84700.858	4.915	.027(a)
	Residual	5273591.675	306	17233.960
	Total	5358292.532	307
a Predictors: (Constant), parent some college
b Dependent Variable: api 2000

**Coefficients(a)**
		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
Model		B	Std. Error	Beta	t	Sig.
1	(Constant)	655.110	15.237		42.995	.000
1	parent some college	1.409	.636	.126	2.217	.027
a Dependent Variable: api 2000

Year-round

COMPUTE filt=(yr_rnd=1).
FILTER BY filt.
regress
 /dep = api00
 /method = enter some_col.

**Variables Entered/Removed(b)**
Model	Variables Entered	Variables Removed	Method
1	parent some college(a)	.	Enter
a All requested variables entered.
b Dependent Variable: api 2000

**Model Summary**
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.648(a)	.420	.413	75.773
a Predictors: (Constant), parent some college

**ANOVA(b)**
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	373644.064	1	373644.064	65.078	.000(a)
	Residual	516734.838	90	5741.498
	Total	890378.902	91
a Predictors: (Constant), parent some college
b Dependent Variable: api 2000

**Coefficients(a)**
		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
Model		B	Std. Error	Beta	t	Sig.
1	(Constant)	407.039	16.515		24.647	.000
1	parent some college	7.403	.918	.648	8.067	.000
a Dependent Variable: api 2000

Now, let's show the regression for both types of schools with the interaction term.

filter off.
regress
 /dep = api00
 /method = enter some_col yr_rnd yrXsome
 /save pre.

**Variables Entered/Removed(b)**
Model	Variables Entered	Variables Removed	Method
1	YRXSOME, parent some college, year round school(a)	.	Enter
a All requested variables entered.
b Dependent Variable: api 2000

**Model Summary(b)**
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.532(a)	.283	.277	120.922
a Predictors: (Constant), YRXSOME, parent some college, year round school
b Dependent Variable: api 2000

**ANOVA(b)**
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	2283345.485	3	761115.162	52.053	.000(a)
	Residual	5790326.513	396	14622.037
	Total	8073671.997	399
a Predictors: (Constant), YRXSOME, parent some college, year round school
b Dependent Variable: api 2000

**Coefficients(a)**
		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
Model		B	Std. Error	Beta	t	Sig.
1	(Constant)	655.110	14.035		46.677	.000
	parent some college	1.409	.586	.112	2.407	.017
	year round school	-248.071	29.859	-.735	-8.308	.000
	YRXSOME	5.993	1.577	.330	3.800	.000
a Dependent Variable: api 2000

**Residuals Statistics(a)**
	Minimum	Maximum	Mean	Std. Deviation	N
Predicted Value	407.04	749.54	647.62	75.648	400
Residual	-275.12	279.25	.00	120.466	400
Std. Predicted Value	-3.180	1.347	.000	1.000	400
Std. Residual	-2.275	2.309	.000	.996	400
a Dependent Variable: api 2000

Note that the coefficient for some_col in the combined analysis (1.409) is the same as the coefficient for some_col for the non-year-round schools. This is because non-year-round schools are the reference group. Likewise, the coefficient for yr_rnd (-248.071) is the intercept for the year-round schools (407.039) minus the intercept for the non-year-round schools (655.110). The coefficient for the yrXsome interaction in the combined analysis is the some_col coefficient for the year-round schools (7.40) minus the some_col coefficient for the non-year-round schools (1.41), yielding 5.99. This interaction is the difference in the slopes of some_col for the two types of schools, which is why it is useful for testing whether the two regression lines have equal slopes. If the two types of schools had the same regression coefficient for some_col, the coefficient for the yrXsome interaction would be 0. In this case, the difference is significant (t = 3.800, p < .001), indicating that the slopes of the two regression lines are significantly different.
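As a rough check on this reasoning, the sketch below rebuilds each group's regression line from the combined-model coefficients. The coefficient values are copied (rounded) from the Coefficients table above, and line_nyr and line_yr are hypothetical variable names introduced only for illustration.

```spss
* Sketch: rebuild each group's line from the combined-model coefficients.
* Non-year-round schools (yr_rnd = 0): intercept and slope are used as is.
compute line_nyr = 655.110 + 1.409*some_col.
* Year-round schools (yr_rnd = 1): add the yr_rnd and yrXsome coefficients.
compute line_yr = (655.110 - 248.071) + (1.409 + 5.993)*some_col.
execute.
```

The year-round line works out to an intercept of 407.039 and a slope of 7.402, which agrees with the separate year-round regression (407.039 and 7.403) up to rounding.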

So, if we look at the graph of the two regression lines, we can see the difference in their slopes (see graph below). Indeed, the non-year-round schools (the solid line) have a smaller slope (1.41) than the year-round schools (7.40). The difference between these slopes is 5.99, the coefficient for yrXsome.

GRAPH
  /SCATTERPLOT(BIVAR)=some_col WITH pre_1 BY yr_rnd.

3.7.2 Computing interactions with glm

We can also run a model just like the one shown above using the glm command. We include the terms yr_rnd and some_col and the interaction yr_rnd*some_col.

glm
  api00 BY yr_rnd WITH some_col
  /DESIGN = some_col yr_rnd yr_rnd*some_col.

**Between-Subjects Factors**
		Value Label	N
year round school	0	No	308
year round school	1	Yes	92

**Tests of Between-Subjects Effects**
Dependent Variable: api 2000
Source	Type III Sum of Squares	df	Mean Square	F	Sig.
Corrected Model	2283345.485(a)	3	761115.162	52.053	.000
Intercept	18502483.537	1	18502483.537	1265.383	.000
SOME_COL	456473.187	1	456473.187	31.218	.000
YR_RND	1009279.986	1	1009279.986	69.025	.000
YR_RND * SOME_COL	211143.646	1	211143.646	14.440	.000
Error	5790326.513	396	14622.037
Total	175839633.000	400
Corrected Total	8073671.997	399
a R Squared = .283 (Adjusted R Squared = .277)

As we illustrated above, we can save the predicted values (using a save subcommand, as we did with regress) and graph the separate regression lines. These commands are omitted.
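For readers who want to try it, here is a minimal sketch of what such commands might look like. It assumes the glm /SAVE subcommand with the PRED keyword; glm_pre is a hypothetical variable name, chosen so that it does not collide with the pre_1 variable saved earlier by regress.

```spss
* Sketch: save the glm predicted values under a hypothetical name, glm_pre.
glm
  api00 BY yr_rnd WITH some_col
  /SAVE = PRED(glm_pre)
  /DESIGN = some_col yr_rnd yr_rnd*some_col.

* Plot the predicted values against some_col, marked by school type.
GRAPH
  /SCATTERPLOT(BIVAR)=some_col WITH glm_pre BY yr_rnd.
```

Because this glm model is the same as the regress model with the yrXsome term, the saved predictions should trace the same two lines.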

In this section we found that the relationship between some_col and api00 depended on whether the school was a year-round school or a non-year-round school. For year-round schools, the relationship between some_col and api00 was significantly stronger than for non-year-round schools. In general, this type of analysis allows you to test whether the strength of the relationship between two continuous variables varies based on the categorical variable.

3.8 Continuous and Categorical variables, interaction with 1/2/3 variable

The prior examples showed how to do regressions with a continuous variable and a categorical variable that has 2 levels. These examples will extend this further by using a categorical variable with 3 levels, mealcat.

3.8.1 Using regress

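We can run a model with some_col, mealcat, and the interaction of these two variables. To do so, we first create an indicator variable for each level of mealcat and multiply each indicator by some_col.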

GET FILE='C:spssregelemapi2.sav'.

* Indicator (0/1) variables for each level of mealcat.
* Each is system-missing when mealcat is missing.
compute mealcat1 = (mealcat = 1).
compute mealcat2 = (mealcat = 2).
compute mealcat3 = (mealcat = 3).
* Interaction terms: each indicator multiplied by some_col.
compute smc1 = mealcat1*some_col.
compute smc2 = mealcat2*some_col.
compute smc3 = mealcat3*some_col.
execute.

regress
 /dep = api00
 /method = enter mealcat2 mealcat3 some_col
 /method = test (smc2 smc3)
 /save pre.

**Variables Entered/Removed(b)**
Model	Variables Entered	Variables Removed	Method
1	parent some college, MEALCAT2, MEALCAT3(a)	.	Enter
2	SMC3, SMC2	.	Test
a All requested variables entered.
b Dependent Variable: api 2000

**Model Summary(c)**
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.870(a)	.757	.756	70.332
2	.877(b)	.769	.767	68.733
a Predictors: (Constant), parent some college, MEALCAT2, MEALCAT3
b Predictors: (Constant), parent some college, MEALCAT2, MEALCAT3, SMC3, SMC2
c Dependent Variable: api 2000

**ANOVA(d)**
Model		Sum of Squares	df	Mean Square	F	Sig.	R Square Change
1	Regression	6114838.708	3	2038279.569	412.061	.000(a)
	Residual	1958833.290	396	4946.549
	Total	8073671.997	399
2	Subset Tests SMC2, SMC3	97468.169	2	48734.084	10.316	.000(b)	.012
	Regression	6212306.876	5	1242461.375	262.995	.000(c)
	Residual	1861365.121	394	4724.277
	Total	8073671.997	399
a Predictors: (Constant), parent some college, MEALCAT2, MEALCAT3
b Tested against the full model.
c Predictors in the Full Model: (Constant), parent some college, MEALCAT2, MEALCAT3, SMC3, SMC2.
d Dependent Variable: api 2000

**Coefficients(a)**
		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
Model		B	Std. Error	Beta	t	Sig.
1	(Constant)	791.179	9.403		84.143	.000
	MEALCAT2	-168.132	8.719	-.556	-19.284	.000
	MEALCAT3	-296.436	8.923	-.990	-33.221	.000
	parent some college	.683	.334	.054	2.043	.042
2	(Constant)	825.894	11.992		68.871	.000
	MEALCAT2	-239.030	18.665	-.791	-12.806	.000
	MEALCAT3	-344.948	17.057	-1.152	-20.223	.000
	parent some college	-.947	.487	-.076	-1.944	.053
	SMC2	3.141	.729	.286	4.307	.000
	SMC3	2.607	.896	.149	2.910	.004
a Dependent Variable: api 2000

**Excluded Variables(b)**
Model		Beta In	t	Sig.	Partial Correlation	Collinearity Statistics (Tolerance)
1	SMC2	.215(a)	3.455	.001	.171	.153
	SMC3	.069(a)	1.412	.159	.071	.258
a Predictors in the Model: (Constant), parent some college, MEALCAT2, MEALCAT3
b Dependent Variable: api 2000

**Casewise Diagnostics(a)**
Case Number	Std. Residual	api 2000
226	-3.593	386
a Dependent Variable: api 2000

**Residuals Statistics(a)**
	Minimum	Maximum	Mean	Std. Deviation	N
Predicted Value	480.95	825.89	647.62	124.779	400
Residual	-246.93	201.23	.00	68.301	400
Std. Predicted Value	-1.336	1.429	.000	1.000	400
Std. Residual	-3.593	2.928	.000	.994	400
a Dependent Variable: api 2000

These results indicate that the overall interaction is indeed significant (the subset test of smc2 and smc3 gives F = 10.316, p < .001). This means that the slopes of the regression lines for the three groups differ significantly. As we have done before, let's graph the predicted values so we can see how the regression lines differ.

Because we had three groups, we get three regression lines, one for each category of mealcat.

GRAPH
  /LINE(MULTIPLE)MEAN(pre_1) BY some_col BY mealcat.

Group 1 was the omitted group, so the slope of the line for group 1 is the coefficient for some_col, which is -.947. Indeed, this line has a downward slope. Adding the coefficient for smc2 to the coefficient for some_col gives the slope for group 2: 3.141 + (-.947) = 2.194. Indeed, group 2 shows an upward slope. Likewise, adding the coefficient for smc3 to the coefficient for some_col gives the slope for group 3: 2.607 + (-.947) = 1.660. So, the slopes for the 3 groups are

group 1: -0.95
group 2:  2.19
group 3:  1.66
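One way to double-check these slopes is to fit the simple regression of api00 on some_col separately within each mealcat group. The sketch below assumes SPLIT FILE is an acceptable substitute for the filter approach used earlier in this chapter; it is not a command shown in the original analysis.

```spss
* Sketch: run the same simple regression once per mealcat group.
sort cases by mealcat.
split file layered by mealcat.
regress
 /dep = api00
 /method = enter some_col.
* Turn the split off so later analyses use all 400 schools together.
split file off.
```

Because the interaction model gives each group its own intercept and slope, the three some_col coefficients from these separate regressions should match the slopes listed above, up to rounding.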

The test of the coefficient for smc2 tested whether the slope for group 2 differed from the slope for group 1, and indeed this was significant. Likewise, the test of the coefficient for smc3 tested whether the slope for group 3 differed from the slope for group 1, and indeed this was significant. What did the test of the coefficient for some_col test? This coefficient is the slope for group 1, so this tested whether the slope for group 1 (-0.95) was significantly different from 0 (at p = .053, it was not significant at the .05 level). This is probably not an interesting test.

The comparisons in the above analyses (group 2 vs. 1 and group 3 vs. 1) are not as interesting as the successive comparisons of group 1 vs. 2 and group 2 vs. 3. We can get these by making group 2 the omitted group, so that each of the other groups is compared to group 2.

regress
 /dep = api00
 /method = enter mealcat1 mealcat3 some_col smc1 smc3.

**Variables Entered/Removed(b)**
Model	Variables Entered	Variables Removed	Method
1	SMC3, parent some college, MEALCAT1, MEALCAT3, SMC1(a)	.	Enter
a All requested variables entered.
b Dependent Variable: api 2000

**Model Summary**
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.877(a)	.769	.767	68.733
a Predictors: (Constant), SMC3, parent some college, MEALCAT1, MEALCAT3, SMC1

**ANOVA(b)**
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	6212306.876	5	1242461.375	262.995	.000(a)
	Residual	1861365.121	394	4724.277
	Total	8073671.997	399
a Predictors: (Constant), SMC3, parent some college, MEALCAT1, MEALCAT3, SMC1
b Dependent Variable: api 2000

**Coefficients(a)**
		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
Model		B	Std. Error	Beta	t	Sig.
1	(Constant)	586.864	14.303		41.030	.000
	MEALCAT1	239.030	18.665	.790	12.806	.000
	MEALCAT3	-105.918	18.754	-.354	-5.648	.000
	parent some college	2.194	.543	.175	4.043	.000
	SMC1	-3.141	.729	-.270	-4.307	.000
	SMC3	-.534	.927	-.030	-.576	.565
a Dependent Variable: api 2000

Now, the test of smc1 tests whether the slope for group 1 differs from the slope for group 2, and it does. The test of smc3 tests whether the slope for group 3 differs significantly from the slope for group 2, and it does not. This makes sense given the graph and the estimated slopes: -0.95 is significantly different from 2.19, but 2.19 is not significantly different from 1.66.

3.8.2 Using glm

We can perform the same analysis using the glm command, as shown below. The glm command gives us somewhat less flexibility since we cannot choose which group is the omitted group.

GET FILE='C:spssregelemapi2.sav'.
glm
  api00 by mealcat with some_col
  /design = some_col mealcat some_col*mealcat
  /print = parameter TEST(LMATRIX).

**Between-Subjects Factors**
		Value Label	N
Percentage free meals in 3 categories	1	0-46% free meals	131
	2	47-80% free meals	132
	3	81-100% free meals	137

**Tests of Between-Subjects Effects**
Dependent Variable: api 2000
Source	Type III Sum of Squares	df	Mean Square	F	Sig.
Corrected Model	6212306.876(a)	5	1242461.375	262.995	.000
Intercept	34188885.021	1	34188885.021	7236.850	.000
SOME_COL	36366.366	1	36366.366	7.698	.006
MEALCAT	2012065.492	2	1006032.746	212.950	.000
MEALCAT * SOME_COL	97468.169	2	48734.084	10.316	.000
Error	1861365.121	394	4724.277
Total	175839633.000	400
Corrected Total	8073671.997	399
a R Squared = .769 (Adjusted R Squared = .767)

**Parameter Estimates**
Dependent Variable: api 2000
	B	Std. Error	t	Sig.	95% Confidence Interval
Parameter	B	Std. Error	t	Sig.	Lower Bound	Upper Bound
Intercept	480.946	12.131	39.647	.000	457.097	504.795
SOME_COL	1.660	.752	2.208	.028	.182	3.138
[MEALCAT=1]	344.948	17.057	20.223	.000	311.413	378.483
[MEALCAT=2]	105.918	18.754	5.648	.000	69.046	142.789
[MEALCAT=3]	0(a)	.	.	.	.	.
[MEALCAT=1] * SOME_COL	-2.607	.896	-2.910	.004	-4.369	-.846
[MEALCAT=2] * SOME_COL	.534	.927	.576	.565	-1.289	2.357
[MEALCAT=3] * SOME_COL	0(a)	.	.	.	.	.
a This parameter is set to zero because it is redundant.

Because the glm command omits the third category, and the last regress analysis above omitted the second category, the parameter estimates are not the same, although the overall fit is identical (R Squared = .769 in both). Because group 3 is dropped, it is the reference category and all comparisons are made with group 3. For example, SOME_COL (1.660) is the slope for group 3, and [MEALCAT=2] * SOME_COL (.534) is the slope for group 2 minus the slope for group 3, which is the SMC3 coefficient from the regress output (-.534) with the sign reversed.
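If a different reference group is wanted from glm, one possible workaround (an assumption on our part, not something shown in this chapter) is to recode the factor so that the desired reference group has the highest code. In the sketch below, mealref is a hypothetical variable name, and the value labels are illustrative.

```spss
* Sketch: give original group 2 the highest code so glm treats it as the reference.
recode mealcat (1=1) (3=2) (2=3) into mealref.
value labels mealref 1 'orig group 1' 2 'orig group 3' 3 'orig group 2 (reference)'.
execute.

glm
  api00 by mealref with some_col
  /design = some_col mealref some_col*mealref
  /print = parameter.
```

With this coding, the parameter estimates should correspond to the regress analysis that omitted group 2, with each remaining group compared to original group 2.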

These analyses showed that the relationship between some_col and api00 varied, depending on the level of mealcat. In comparing group 1 with group 2, the coefficient for some_col was significantly different, but there was no difference in the coefficient for some_col in comparing groups 2 and 3.

3.9 Summary

This chapter covered four techniques for analyzing data with categorical variables, 1) manually constructing indicator variables, 2) using a do-loop, 3) using the regress command, and 4) using the glm command. Each method has its advantages and disadvantages, as described below.

Manually constructing indicator variables can be very tedious and even error prone. For very simple models, it is not very difficult to create your own indicator variables, but if you have categorical variables with many levels and/or interactions of categorical variables, it can be laborious to manually create indicator variables. However, the advantage is that you can have quite a bit of control over how the variables are created and the terms that are entered into the model.

A do-loop will allow you to create many indicator variables very quickly. However, there are some restrictions regarding the naming of the variables and all of the variables must have the same two values (i.e., zero and one or one and two).
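As an illustration of the idea, the sketch below uses DO REPEAT to create the three mealcat indicators in one pass. This is one way to write such a loop and may differ from the form used earlier in the chapter; v and k are stand-in names that exist only inside the loop.

```spss
* Sketch: build the three mealcat indicators in one pass.
* v stands in for each new variable and k for the matching category code.
do repeat v = mealcat1 mealcat2 mealcat3 / k = 1 2 3.
  compute v = (mealcat = k).
end repeat.
execute.
```

Each indicator is 1 for schools in the matching category, 0 for schools in the other categories, and system-missing when mealcat is missing.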

The regress command is useful when you want to test one or a group of variables together.

The glm command is useful for those times when you want to use a particular coding scheme to perform certain types of tests, such as comparing each level of a variable to the previous level. The glm command in SPSS will create the appropriate codes for the variables and display the coding scheme in the output.
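A minimal sketch of such a coding scheme follows. It assumes the REPEATED keyword of the glm /CONTRAST subcommand, which compares adjacent levels of a factor (each level with the one that follows), and it reuses the TEST(LMATRIX) print option from section 3.8.2 to display the coding.

```spss
* Sketch: adjacent-level comparisons of mealcat (1 vs. 2, then 2 vs. 3).
glm
  api00 by mealcat
  /contrast(mealcat) = repeated
  /print = test(lmatrix)
  /design = mealcat.
```

The contrast results should then report the group 1 vs. 2 and group 2 vs. 3 differences in mean api00 without any manual recoding of mealcat.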

3.10 For more information

See the following web pages for more information and resources on regression with categorical predictors in SPSS.