Sometimes your research may predict that the size of a
regression coefficient should be bigger for one group than for another. For example, you
might believe that the regression coefficient of **height** predicting
**weight**
would be higher for men than for women. Below, we have a data file with 10 fictional
females and 10 fictional males, along with their **height** in inches and
their **weight** in pounds.

data list free / id * gender (A8) height * weight.
begin data.
1 F 56 117
2 F 60 125
3 F 64 133
4 F 68 141
5 F 72 149
6 F 54 109
7 F 62 128
8 F 65 131
9 F 65 131
10 F 70 145
11 M 64 211
12 M 68 223
13 M 72 235
14 M 76 247
15 M 80 259
16 M 62 201
17 M 69 228
18 M 74 245
19 M 75 241
20 M 82 269
end data.
execute.

We analyzed their data separately using the **regression** commands below. Note that we have to run two regressions, one
with the data for females only and one with the data for males only. We
use a filter to separate the data into these two groups. The parameter estimates (coefficients) for females and
males are shown below, and the results do seem to suggest that for each
additional inch of **height** there is a larger increase in
**weight** for males (3.190) than for females (2.096).

COMPUTE filter_$=(gender="M").
FILTER BY filter_$.

regression /dep weight /method = enter height.

**Variables Entered/Removed(b)**

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | HEIGHT(a) | . | Enter |

a All requested variables entered. b Dependent Variable: WEIGHT

**Model Summary**

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .994(a) | .988 | .987 | 2.40738 |

a Predictors: (Constant), HEIGHT

**ANOVA(b)**

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 3882.536 | 1 | 3882.536 | 669.926 | .000(a) |
| | Residual | 46.364 | 8 | 5.795 | | |
| | Total | 3928.900 | 9 | | | |

a Predictors: (Constant), HEIGHT b Dependent Variable: WEIGHT

**Coefficients(a)**

| Model | | B | Std. Error | Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 5.602 | 8.930 | | .627 | .548 |
| | HEIGHT | 3.190 | .123 | .994 | 25.883 | .000 |

a Dependent Variable: WEIGHT

COMPUTE filter_$=(gender="F").
FILTER BY filter_$.

regression /dep weight /method = enter height.

**Variables Entered/Removed(b)**

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | HEIGHT(a) | . | Enter |

a All requested variables entered. b Dependent Variable: WEIGHT

**Model Summary**

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .989(a) | .978 | .976 | 1.91504 |

a Predictors: (Constant), HEIGHT

**ANOVA(b)**

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 1319.561 | 1 | 1319.561 | 359.812 | .000(a) |
| | Residual | 29.339 | 8 | 3.667 | | |
| | Total | 1348.900 | 9 | | | |

a Predictors: (Constant), HEIGHT b Dependent Variable: WEIGHT

**Coefficients(a)**

| Model | | B | Std. Error | Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | -2.397 | 7.053 | | -.340 | .743 |
| | HEIGHT | 2.096 | .110 | .989 | 18.969 | .000 |

a Dependent Variable: WEIGHT
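If you want to verify the two group-wise fits outside SPSS, the slopes and intercepts can be reproduced with a few lines of plain Python. This is just an illustrative re-computation of the output above, using our own variable names and the same 20 fictional cases:

```python
# Simple least-squares fit for each gender group, using the data listed above.
heights_f = [56, 60, 64, 68, 72, 54, 62, 65, 65, 70]
weights_f = [117, 125, 133, 141, 149, 109, 128, 131, 131, 145]
heights_m = [64, 68, 72, 76, 80, 62, 69, 74, 75, 82]
weights_m = [211, 223, 235, 247, 259, 201, 228, 245, 241, 269]

def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

a_f, b_f = ols(heights_f, weights_f)   # females: about -2.397 and 2.096
a_m, b_m = ols(heights_m, weights_m)   # males:   about  5.602 and 3.190
print(round(a_f, 3), round(b_f, 3), round(a_m, 3), round(b_m, 3))
```

The rounded values match the (Constant) and HEIGHT coefficients in the two SPSS tables above.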

We can compare the regression coefficients of males with
females to test the null hypothesis H_0: **b_f** = **b_m**, where
**b_f** is the regression coefficient for females and
**b_m** is the regression coefficient for males. Another way to write this null hypothesis is
H_0: **b_f** − **b_m** = 0. To do this analysis, we first make a dummy variable called
**female** that is coded 1 for female and 0 for male, and a variable
**femht** that is the product of **female** and **height**
(this means that for males, **femht** is always equal to zero, and for females, it is equal to their height). We then use
**female**, **height**, and **femht** as predictors in the regression equation. In this sort of analysis, male is said to be the omitted category, because we are modeling the effect of being female; however, males still remain in the model.

filter off.
execute.
compute female = 0.
if gender = "F" female = 1.
compute femht = female*height.
execute.
regression /dep weight /method = enter female height femht.

The output is shown below.

**Variables Entered/Removed(b)**

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | FEMHT, HEIGHT, FEMALE(a) | . | Enter |

a All requested variables entered. b Dependent Variable: WEIGHT

**Model Summary**

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .999(a) | .999 | .999 | 2.17518 |

a Predictors: (Constant), FEMHT, HEIGHT, FEMALE

**ANOVA(b)**

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 60327.097 | 3 | 20109.032 | 4250.111 | .000(a) |
| | Residual | 75.703 | 16 | 4.731 | | |
| | Total | 60402.800 | 19 | | | |

a Predictors: (Constant), FEMHT, HEIGHT, FEMALE b Dependent Variable: WEIGHT

**Coefficients(a)**

| Model | | B | Std. Error | Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 5.602 | 8.069 | | .694 | .497 |
| | FEMALE | -7.999 | 11.371 | -.073 | -.703 | .492 |
| | HEIGHT | 3.190 | .111 | .421 | 28.646 | .000 |
| | FEMHT | -1.094 | .168 | -.638 | -6.520 | .000 |

a Dependent Variable: WEIGHT
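The combined interaction model can also be refit outside SPSS as a check. The sketch below (plain Python, our own variable and function names) builds **female** and **femht** exactly as described above and solves the least-squares normal equations directly:

```python
# Refit the single interaction model (female, height, femht) in plain Python,
# solving the normal equations X'X b = X'y by Gaussian elimination.
genders = ["F"] * 10 + ["M"] * 10
heights = [56, 60, 64, 68, 72, 54, 62, 65, 65, 70,
           64, 68, 72, 76, 80, 62, 69, 74, 75, 82]
weights = [117, 125, 133, 141, 149, 109, 128, 131, 131, 145,
           211, 223, 235, 247, 259, 201, 228, 245, 241, 269]

female = [1.0 if g == "F" else 0.0 for g in genders]
femht = [f * h for f, h in zip(female, heights)]
X = [[1.0, f, h, fh] for f, h, fh in zip(female, heights, femht)]
y = weights

def solve_ols(X, y):
    """Least squares via the normal equations (fine for tiny, well-posed problems)."""
    n, k = len(X), len(X[0])
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
         for p in range(k)]
    b = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            m = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= m * A[col][c]
            b[r] -= m * b[col]
    # Back substitution.
    coef = [0.0] * k
    for r in range(k - 1, -1, -1):
        coef[r] = (b[r] - sum(A[r][c] * coef[c] for c in range(r + 1, k))) / A[r][r]
    return coef

b0, b1, b2, b3 = solve_ols(X, y)
# b3 is the female-minus-male difference in slopes, roughly -1.094
print(round(b0, 3), round(b1, 3), round(b2, 3), round(b3, 3))
```

The four rounded coefficients match the B column of the SPSS Coefficients table above.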

The term **femht** tests the null
hypothesis H_0: **b_f** = **b_m**. The t value is −6.52 and is significant, indicating that the regression coefficient
**b_f** is significantly different from **b_m**.

Let's look at the parameter estimates to get a better understanding of what they mean and
how they are interpreted.

First, recall that our dummy variable
**female** is 1 if female and 0 if
male; therefore, males are the omitted group. This is needed for proper interpretation
of the estimates. Even though we have run a single model, it is often useful
to think about what the model means for different types of respondents, in this
case, males and females. One way to do this is by looking at the regression equation.
Below we explore how the equation changes depending on whether the subject is
male or female. The first equation is just the general linear regression
equation: y-hat is the predicted weight, b0, b1, etc. represent the regression
coefficients, and the names of variables stand in for the values of those
variables for each case. We have written the
intercept as b0*1; normally this is written just as b0, because the 1 is
unnecessary, but it is always there implicitly, and it will help us understand
what is going on later.

y-hat = b0*1 + b1*female + b2*height + b3*femht

For males, female = 0, and femht = 0, so the equation is:

y-hat = b0*1 + b1*0 + b2*height + b3*0

Notice that the b1 and b3 terms are equal to zero, so they drop out, leaving:

y-hat = b0 + b2*height

What this means is that for males, the intercept is equal to the constant (b0), which is 5.602. This is equal to the intercept from the model above, where we analyzed just male respondents. Similarly, the relationship between height and weight is described by the coefficient for height (b2), which is 3.19. That is, we can say that for males a one-unit change in height is associated with a 3.19-pound (b2) increase in expected weight. This is equal to the coefficient for height in the model above where we analyzed just males.

For females, female = 1, and femht = height, so the equation is:

y-hat = b0*1 + b1*1 + b2*height + b3*height

We can combine some of the terms, so the equation is reduced to:

y-hat = (b0+b1)*1 + (b2+b3)*height

What we see is that for females, the intercept is equal to b0 + b1, in this case 5.602 − 7.999 =
−2.397. Notice that this is the same as the intercept from the model for just
females. Similarly, for females the expected change in weight for a one-unit
increase in height is b2 + b3, in this case 3.190 − 1.094 = 2.096. By now you
probably expect that this will be the same as the coefficient for height in the
model we ran on females, and it is. What all of this should make clear is that
b3 is the *difference* between the coefficient for females and the
coefficient for males, so if b3 (the coefficient for the variable **femht**)
is significantly different from zero, we can say that the expected change in
weight for a given change in height is different for males and females.
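The bookkeeping above can be checked with a couple of lines of arithmetic. This is not a refit; it simply combines the coefficients reported in the combined model's output:

```python
# Coefficients from the combined model's SPSS output above.
b0, b1, b2, b3 = 5.602, -7.999, 3.190, -1.094

intercept_female = b0 + b1   # matches the female-only intercept, -2.397
slope_female = b2 + b3       # matches the female-only slope, 2.096
intercept_male = b0          # the male-only intercept, 5.602
slope_male = b2              # the male-only slope, 3.190
print(round(intercept_female, 3), round(slope_female, 3))
```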

It is also possible to run such an analysis using **glm**, using syntax like that below. Note
that other statistical packages, such as SAS and Stata, omit the group of the dummy variable
that is coded as zero. However, SPSS omits the group coded as one. Therefore, when you compare
the output from the different packages, the results seem to be different. To make the SPSS results
match those from other packages (or the results from the analysis above), you need to create a new variable that has the opposite coding (i.e.,
switching the zeros and ones). We do this with the **male** variable. We do not know of an option in SPSS
**glm** to change which group is the omitted group. We will also need to
create a new interaction variable (**maleht**).

compute male = not female.
compute maleht = male*height.
execute.

glm weight by male with height
  /design = male height male by height
  /print = parameter.

**Between-Subjects Factors**

| | | N |
|---|---|---|
| MALE | .00 | 10 |
| | 1.00 | 10 |

**Tests of Between-Subjects Effects**

Dependent Variable: WEIGHT

| Source | Type III Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Corrected Model | 60327.097(a) | 3 | 20109.032 | 4250.111 | .000 |
| Intercept | .376 | 1 | .376 | .079 | .782 |
| MALE | 2.342 | 1 | 2.342 | .495 | .492 |
| HEIGHT | 4695.831 | 1 | 4695.831 | 992.480 | .000 |
| MALE * HEIGHT | 201.115 | 1 | 201.115 | 42.506 | .000 |
| Error | 75.703 | 16 | 4.731 | | |
| Total | 733114.000 | 20 | | | |
| Corrected Total | 60402.800 | 19 | | | |

a R Squared = .999 (Adjusted R Squared = .999)

**Parameter Estimates**

Dependent Variable: WEIGHT

| Parameter | B | Std. Error | t | Sig. | 95% CI Lower Bound | Upper Bound |
|---|---|---|---|---|---|---|
| Intercept | 5.602 | 8.069 | .694 | .497 | -11.504 | 22.707 |
| [MALE=.00] | -7.999 | 11.371 | -.703 | .492 | -32.104 | 16.105 |
| [MALE=1.00] | 0(a) | . | . | . | . | . |
| HEIGHT | 3.190 | .111 | 28.646 | .000 | 2.954 | 3.426 |
| [MALE=.00] * HEIGHT | -1.094 | .168 | -6.520 | .000 | -1.450 | -.738 |
| [MALE=1.00] * HEIGHT | 0(a) | . | . | . | . | . |

a This parameter is set to zero because it is redundant.

As you can see, the **glm** output
corresponds to the output obtained by **regression**.
The parameter estimates appear at the end of the **glm** output, and they also correspond to the output from
**regression**.
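The effect of the recoding can also be verified outside SPSS. Because the two-group model with a gender dummy, height, and their interaction fits each group's line exactly, the recoded coefficients (for **male** and **maleht**) can be derived directly from the two group-wise fits. A plain-Python sketch, with our own names:

```python
# After recoding (male = 1 - female), the reference group flips: the model's
# intercept and height slope describe females, and the interaction coefficient
# becomes the male-minus-female slope difference. Since the interaction model
# reproduces each group's own least-squares line, we can derive these directly.
heights_f = [56, 60, 64, 68, 72, 54, 62, 65, 65, 70]
weights_f = [117, 125, 133, 141, 149, 109, 128, 131, 131, 145]
heights_m = [64, 68, 72, 76, 80, 62, 69, 74, 75, 82]
weights_m = [211, 223, 235, 247, 259, 201, 228, 245, 241, 269]

def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

a_f, b_f = ols(heights_f, weights_f)
a_m, b_m = ols(heights_m, weights_m)
coef_male = a_m - a_f      # roughly  7.999 (sign flipped vs. FEMALE's -7.999)
coef_maleht = b_m - b_f    # roughly  1.094 (sign flipped vs. FEMHT's -1.094)
print(round(coef_male, 3), round(coef_maleht, 3))
```

Only the signs of the dummy and interaction coefficients change; the test of the slope difference is the same either way.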