Sometimes your research may predict that the size of a
regression coefficient should be bigger for one group than for another. For example, you
might believe that the regression coefficient of **height** predicting
**weight**
would be higher for men than for women. Below, we have a data file with 10 fictional
females and 10 fictional males, along with their **height** in inches and
their **weight** in pounds.

data list free / id * gender (A8) height * weight.
begin data.
1 F 56 117
2 F 60 125
3 F 64 133
4 F 68 141
5 F 72 149
6 F 54 109
7 F 62 128
8 F 65 131
9 F 65 131
10 F 70 145
11 M 64 211
12 M 68 223
13 M 72 235
14 M 76 247
15 M 80 259
16 M 62 201
17 M 69 228
18 M 74 245
19 M 75 241
20 M 82 269
end data.
execute.

We analyzed their data separately using the **regression** commands below. Note that we have to run two regressions, one
with the data for females only and one with the data for males only. We
use a filter to separate the data into these two groups. The parameter estimates (coefficients) for females and
males are shown below, and the results do seem to suggest that for each
additional inch of **height** there is a larger increase in
**weight** for males (3.190) than for females (2.096).

COMPUTE filter_$=(gender="M").
FILTER BY filter_$.

regression /dep weight /method = enter height.

**Variables Entered/Removed(b)**

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | HEIGHT(a) | . | Enter |

a All requested variables entered. b Dependent Variable: WEIGHT

**Model Summary**

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .994(a) | .988 | .987 | 2.40738 |

a Predictors: (Constant), HEIGHT

**ANOVA(b)**

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 3882.536 | 1 | 3882.536 | 669.926 | .000(a) |
| | Residual | 46.364 | 8 | 5.795 | | |
| | Total | 3928.900 | 9 | | | |

a Predictors: (Constant), HEIGHT b Dependent Variable: WEIGHT

**Coefficients(a)**

| Model | | B | Std. Error | Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 5.602 | 8.930 | | .627 | .548 |
| | HEIGHT | 3.190 | .123 | .994 | 25.883 | .000 |

a Dependent Variable: WEIGHT

COMPUTE filter_$=(gender="F").
FILTER BY filter_$.

regression /dep weight /method = enter height.

**Variables Entered/Removed(b)**

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | HEIGHT(a) | . | Enter |

a All requested variables entered. b Dependent Variable: WEIGHT

**Model Summary**

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .989(a) | .978 | .976 | 1.91504 |

a Predictors: (Constant), HEIGHT

**ANOVA(b)**

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 1319.561 | 1 | 1319.561 | 359.812 | .000(a) |
| | Residual | 29.339 | 8 | 3.667 | | |
| | Total | 1348.900 | 9 | | | |

a Predictors: (Constant), HEIGHT b Dependent Variable: WEIGHT

**Coefficients(a)**

| Model | | B | Std. Error | Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | -2.397 | 7.053 | | -.340 | .743 |
| | HEIGHT | 2.096 | .110 | .989 | 18.969 | .000 |

a Dependent Variable: WEIGHT
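If you want to verify the two group-wise fits outside SPSS, the slopes and intercepts can be reproduced with a few lines of plain Python. This is just an illustrative re-computation of the output above, using our own variable names and the same 20 fictional cases:

```python
# Simple least-squares fit for each gender group, using the data listed above.
heights_f = [56, 60, 64, 68, 72, 54, 62, 65, 65, 70]
weights_f = [117, 125, 133, 141, 149, 109, 128, 131, 131, 145]
heights_m = [64, 68, 72, 76, 80, 62, 69, 74, 75, 82]
weights_m = [211, 223, 235, 247, 259, 201, 228, 245, 241, 269]

def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

a_f, b_f = ols(heights_f, weights_f)   # females: about -2.397 and 2.096
a_m, b_m = ols(heights_m, weights_m)   # males:   about  5.602 and 3.190
print(round(a_f, 3), round(b_f, 3), round(a_m, 3), round(b_m, 3))
```

The rounded values match the (Constant) and HEIGHT coefficients in the two SPSS tables above.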

We can compare the regression coefficients of males with
females to test the null hypothesis H_0: **b_f** = **b_m**, where
**b_f** is the regression coefficient for females and
**b_m** is the regression coefficient for males. Another way to write this null hypothesis is
H_0: **b_f** − **b_m** = 0. To do this analysis, we first make a dummy variable called
**female** that is coded 1 for female and 0 for male, and a variable
**femht** that is the product of **female** and **height**
(this means that for males, **femht** is always equal to zero, and for females, it is equal to their height). We then use
**female**, **height**, and **femht** as predictors in the regression equation. In this sort of analysis, male is said to be the omitted category, because we are modeling the effect of being female; however, males still remain in the model.

filter off.
execute.
compute female = 0.
if gender = "F" female = 1.
compute femht = female*height.
execute.
regression /dep weight /method = enter female height femht.

The output is shown below.

**Variables Entered/Removed(b)**

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | FEMHT, HEIGHT, FEMALE(a) | . | Enter |

a All requested variables entered. b Dependent Variable: WEIGHT

**Model Summary**

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .999(a) | .999 | .999 | 2.17518 |

a Predictors: (Constant), FEMHT, HEIGHT, FEMALE

**ANOVA(b)**

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 60327.097 | 3 | 20109.032 | 4250.111 | .000(a) |
| | Residual | 75.703 | 16 | 4.731 | | |
| | Total | 60402.800 | 19 | | | |

a Predictors: (Constant), FEMHT, HEIGHT, FEMALE b Dependent Variable: WEIGHT

**Coefficients(a)**

| Model | | B | Std. Error | Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 5.602 | 8.069 | | .694 | .497 |
| | FEMALE | -7.999 | 11.371 | -.073 | -.703 | .492 |
| | HEIGHT | 3.190 | .111 | .421 | 28.646 | .000 |
| | FEMHT | -1.094 | .168 | -.638 | -6.520 | .000 |

a Dependent Variable: WEIGHT
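The combined interaction model can also be refit outside SPSS as a check. The sketch below (plain Python, our own variable and function names) builds **female** and **femht** exactly as described above and solves the least-squares normal equations directly:

```python
# Refit the single interaction model (female, height, femht) in plain Python,
# solving the normal equations X'X b = X'y by Gaussian elimination.
genders = ["F"] * 10 + ["M"] * 10
heights = [56, 60, 64, 68, 72, 54, 62, 65, 65, 70,
           64, 68, 72, 76, 80, 62, 69, 74, 75, 82]
weights = [117, 125, 133, 141, 149, 109, 128, 131, 131, 145,
           211, 223, 235, 247, 259, 201, 228, 245, 241, 269]

female = [1.0 if g == "F" else 0.0 for g in genders]
femht = [f * h for f, h in zip(female, heights)]
X = [[1.0, f, h, fh] for f, h, fh in zip(female, heights, femht)]
y = weights

def solve_ols(X, y):
    """Least squares via the normal equations (fine for tiny, well-posed problems)."""
    n, k = len(X), len(X[0])
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
         for p in range(k)]
    b = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            m = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= m * A[col][c]
            b[r] -= m * b[col]
    # Back substitution.
    coef = [0.0] * k
    for r in range(k - 1, -1, -1):
        coef[r] = (b[r] - sum(A[r][c] * coef[c] for c in range(r + 1, k))) / A[r][r]
    return coef

b0, b1, b2, b3 = solve_ols(X, y)
# b3 is the female-minus-male difference in slopes, roughly -1.094
print(round(b0, 3), round(b1, 3), round(b2, 3), round(b3, 3))
```

The four rounded coefficients match the B column of the SPSS Coefficients table above.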

The term **femht** tests the null
hypothesis H_0: **b_f** = **b_m**. The t value is −6.52 and is significant, indicating that the regression coefficient
**b_f** is significantly different from **b_m**.

Let's look at the parameter estimates to get a better understanding of what they mean and
how they are interpreted.

First, recall that our dummy variable
**female** is 1 if female and 0 if
male; therefore, males are the omitted group. This is needed for proper interpretation
of the estimates. Even though we have run a single model, it is often useful
to think about what the model means for different types of respondents, in this
case, males and females. One way to do this is by looking at the regression equation.
Below we explore how the equation changes depending on whether the subject is
male or female. The first equation is just the general linear regression
equation: y-hat is the predicted weight, b0, b1, etc. represent the regression
coefficients, and the names of variables stand in for the values of those
variables for each case. We have written the
intercept as b0*1; normally this is written just as b0, because the 1 is
unnecessary, but it is always there implicitly, and it will help us understand
what is going on later.

y-hat = b0*1 + b1*female + b2*height + b3*femht

For males, female = 0, and femht = 0, so the equation is:

y-hat = b0*1 + b1*0 + b2*height + b3*0

Notice that the b1 and b3 terms are equal to zero, so they drop out, leaving:

y-hat = b0 + b2*height

What this means is that for males, the intercept is equal to the constant (b0), which is 5.602. This is equal to the intercept from the model above, where we analyzed just male respondents. Similarly, the relationship between height and weight is described by the coefficient for height (b2), which is 3.19. That is, we can say that for males a one-unit change in height is associated with a 3.19-pound (b2) increase in expected weight. This is equal to the coefficient for height in the model above where we analyzed just males.

For females, female = 1, and femht = height, so the equation is:

y-hat = b0*1 + b1*1 + b2*height + b3*height

We can combine some of the terms, so the equation is reduced to:

y-hat = (b0+b1)*1 + (b2+b3)*height

What we see is that for females, the intercept is equal to b0 + b1, in this case 5.602 − 7.999 =
−2.397. Notice that this is the same as the intercept from the model for just
females. Similarly, for females the expected change in weight for a one-unit
increase in height is b2 + b3, in this case 3.190 − 1.094 = 2.096. By now you
probably expect that this will be the same as the coefficient for height in the
model we ran on females, and it is. What all of this should make clear is that
b3 is the *difference* between the coefficient for females and the
coefficient for males, so if b3 (the coefficient for the variable **femht**)
is significantly different from zero, we can say that the expected change in
weight for a given change in height is different for males and females.
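The bookkeeping above can be checked with a couple of lines of arithmetic. This is not a refit; it simply combines the coefficients reported in the combined model's output:

```python
# Coefficients from the combined model's SPSS output above.
b0, b1, b2, b3 = 5.602, -7.999, 3.190, -1.094

intercept_female = b0 + b1   # matches the female-only intercept, -2.397
slope_female = b2 + b3       # matches the female-only slope, 2.096
intercept_male = b0          # the male-only intercept, 5.602
slope_male = b2              # the male-only slope, 3.190
print(round(intercept_female, 3), round(slope_female, 3))
```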

It is also possible to run such an analysis using **glm**, using syntax like that below. Note
that other statistical packages, such as SAS and Stata, omit the group of the dummy variable
that is coded as zero. However, SPSS omits the group coded as one. Therefore, when you compare
the output from the different packages, the results seem to be different. To make the SPSS results
match those from other packages (or the results from the analysis above), you need to create a new variable that has the opposite coding (i.e.,
switching the zeros and ones). We do this with the **male** variable. We do not know of an option in SPSS
**glm** to change which group is the omitted group. We will also need to
create a new interaction variable (**maleht**).

compute male = not female.
compute maleht = male*height.
execute.

glm weight by male with height
  /design = male height male by height
  /print = parameter.

**Between-Subjects Factors**

| | | N |
|---|---|---|
| MALE | .00 | 10 |
| | 1.00 | 10 |

**Tests of Between-Subjects Effects**

Dependent Variable: WEIGHT

| Source | Type III Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Corrected Model | 60327.097(a) | 3 | 20109.032 | 4250.111 | .000 |
| Intercept | .376 | 1 | .376 | .079 | .782 |
| MALE | 2.342 | 1 | 2.342 | .495 | .492 |
| HEIGHT | 4695.831 | 1 | 4695.831 | 992.480 | .000 |
| MALE * HEIGHT | 201.115 | 1 | 201.115 | 42.506 | .000 |
| Error | 75.703 | 16 | 4.731 | | |
| Total | 733114.000 | 20 | | | |
| Corrected Total | 60402.800 | 19 | | | |

a R Squared = .999 (Adjusted R Squared = .999)

**Parameter Estimates**

Dependent Variable: WEIGHT

| Parameter | B | Std. Error | t | Sig. | 95% CI Lower Bound | Upper Bound |
|---|---|---|---|---|---|---|
| Intercept | 5.602 | 8.069 | .694 | .497 | -11.504 | 22.707 |
| [MALE=.00] | -7.999 | 11.371 | -.703 | .492 | -32.104 | 16.105 |
| [MALE=1.00] | 0(a) | . | . | . | . | . |
| HEIGHT | 3.190 | .111 | 28.646 | .000 | 2.954 | 3.426 |
| [MALE=.00] * HEIGHT | -1.094 | .168 | -6.520 | .000 | -1.450 | -.738 |
| [MALE=1.00] * HEIGHT | 0(a) | . | . | . | . | . |

a This parameter is set to zero because it is redundant.

As you can see, the **glm** output
corresponds to the output obtained by **regression**.
The parameter estimates appear at the end of the **glm** output, and they also correspond to the output from
**regression**.
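The effect of the recoding can also be verified outside SPSS. Because the two-group model with a gender dummy, height, and their interaction fits each group's line exactly, the recoded coefficients (for **male** and **maleht**) can be derived directly from the two group-wise fits. A plain-Python sketch, with our own names:

```python
# After recoding (male = 1 - female), the reference group flips: the model's
# intercept and height slope describe females, and the interaction coefficient
# becomes the male-minus-female slope difference. Since the interaction model
# reproduces each group's own least-squares line, we can derive these directly.
heights_f = [56, 60, 64, 68, 72, 54, 62, 65, 65, 70]
weights_f = [117, 125, 133, 141, 149, 109, 128, 131, 131, 145]
heights_m = [64, 68, 72, 76, 80, 62, 69, 74, 75, 82]
weights_m = [211, 223, 235, 247, 259, 201, 228, 245, 241, 269]

def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

a_f, b_f = ols(heights_f, weights_f)
a_m, b_m = ols(heights_m, weights_m)
coef_male = a_m - a_f      # roughly  7.999 (sign flipped vs. FEMALE's -7.999)
coef_maleht = b_m - b_f    # roughly  1.094 (sign flipped vs. FEMHT's -1.094)
print(round(coef_male, 3), round(coef_maleht, 3))
```

Only the signs of the dummy and interaction coefficients change; the test of the slope difference is the same either way.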