Sometimes your research may predict that the size of a
regression coefficient should be bigger for one group than for another. For example, you
might believe that the regression coefficient of **height** predicting
**weight**
would be higher for men than for women. Below, we have a data file with 10 fictional
females and 10 fictional males, along with their **height** in inches and
their **weight** in pounds.

data list free / id * gender (A8) height * weight.
begin data.
1 F 56 117
2 F 60 125
3 F 64 133
4 F 68 141
5 F 72 149
6 F 54 109
7 F 62 128
8 F 65 131
9 F 65 131
10 F 70 145
11 M 64 211
12 M 68 223
13 M 72 235
14 M 76 247
15 M 80 259
16 M 62 201
17 M 69 228
18 M 74 245
19 M 75 241
20 M 82 269
end data.
execute.

We analyzed their data separately using the **regression** commands below. Note that we have to run two regressions, one
with the data for females only and one with the data for males only. We
use a filter to separate the data into these two groups. The parameter estimates (coefficients) for females and
males are shown below, and the results do seem to suggest that for each
additional inch of **height** there is a larger increase in
**weight** for males (3.190) than for females (2.096).

COMPUTE filter_$=(gender="M").
FILTER BY filter_$.

regression /dep weight /method = enter height.

**Variables Entered/Removed(b)**

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | HEIGHT(a) | . | Enter |

a All requested variables entered. b Dependent Variable: WEIGHT

**Model Summary**

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .994(a) | .988 | .987 | 2.40738 |

a Predictors: (Constant), HEIGHT

**ANOVA(b)**

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 3882.536 | 1 | 3882.536 | 669.926 | .000(a) |
| | Residual | 46.364 | 8 | 5.795 | | |
| | Total | 3928.900 | 9 | | | |

a Predictors: (Constant), HEIGHT b Dependent Variable: WEIGHT

**Coefficients(a)**

| Model | | B | Std. Error | Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 5.602 | 8.930 | | .627 | .548 |
| | HEIGHT | 3.190 | .123 | .994 | 25.883 | .000 |

a Dependent Variable: WEIGHT

COMPUTE filter_$=(gender="F").
FILTER BY filter_$.

regression /dep weight /method = enter height.

**Variables Entered/Removed(b)**

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | HEIGHT(a) | . | Enter |

a All requested variables entered. b Dependent Variable: WEIGHT

**Model Summary**

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .989(a) | .978 | .976 | 1.91504 |

a Predictors: (Constant), HEIGHT

**ANOVA(b)**

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 1319.561 | 1 | 1319.561 | 359.812 | .000(a) |
| | Residual | 29.339 | 8 | 3.667 | | |
| | Total | 1348.900 | 9 | | | |

a Predictors: (Constant), HEIGHT b Dependent Variable: WEIGHT

**Coefficients(a)**

| Model | | B | Std. Error | Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | -2.397 | 7.053 | | -.340 | .743 |
| | HEIGHT | 2.096 | .110 | .989 | 18.969 | .000 |

a Dependent Variable: WEIGHT
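If you want to verify the two group-wise fits outside SPSS, the slopes and intercepts can be reproduced with a few lines of plain Python. This is just an illustrative re-computation of the output above, using our own variable names and the same 20 fictional cases:

```python
# Simple least-squares fit for each gender group, using the data listed above.
heights_f = [56, 60, 64, 68, 72, 54, 62, 65, 65, 70]
weights_f = [117, 125, 133, 141, 149, 109, 128, 131, 131, 145]
heights_m = [64, 68, 72, 76, 80, 62, 69, 74, 75, 82]
weights_m = [211, 223, 235, 247, 259, 201, 228, 245, 241, 269]

def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

a_f, b_f = ols(heights_f, weights_f)   # females: about -2.397 and 2.096
a_m, b_m = ols(heights_m, weights_m)   # males:   about  5.602 and 3.190
print(round(a_f, 3), round(b_f, 3), round(a_m, 3), round(b_m, 3))
```

The rounded values match the (Constant) and HEIGHT coefficients in the two SPSS tables above.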

We can compare the regression coefficients of males with
females to test the null hypothesis H_0: **b_f** = **b_m**, where
**b_f** is the regression coefficient for females and
**b_m** is the regression coefficient for males. Another way to write this null hypothesis is
H_0: **b_f** − **b_m** = 0. To do this analysis, we first make a dummy variable called
**female** that is coded 1 for female and 0 for male, and a variable
**femht** that is the product of **female** and **height**
(this means that for males, **femht** is always equal to zero, and for females, it is equal to their height). We then use
**female**, **height**, and **femht** as predictors in the regression equation. In this sort of analysis, male is said to be the omitted category, because we are modeling the effect of being female; however, males still remain in the model.

filter off.
execute.
compute female = 0.
if gender = "F" female = 1.
compute femht = female*height.
execute.
regression /dep weight /method = enter female height femht.

The output is shown below.

**Variables Entered/Removed(b)**

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | FEMHT, HEIGHT, FEMALE(a) | . | Enter |

a All requested variables entered. b Dependent Variable: WEIGHT

**Model Summary**

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .999(a) | .999 | .999 | 2.17518 |

a Predictors: (Constant), FEMHT, HEIGHT, FEMALE

**ANOVA(b)**

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 60327.097 | 3 | 20109.032 | 4250.111 | .000(a) |
| | Residual | 75.703 | 16 | 4.731 | | |
| | Total | 60402.800 | 19 | | | |

a Predictors: (Constant), FEMHT, HEIGHT, FEMALE b Dependent Variable: WEIGHT

**Coefficients(a)**

| Model | | B | Std. Error | Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 5.602 | 8.069 | | .694 | .497 |
| | FEMALE | -7.999 | 11.371 | -.073 | -.703 | .492 |
| | HEIGHT | 3.190 | .111 | .421 | 28.646 | .000 |
| | FEMHT | -1.094 | .168 | -.638 | -6.520 | .000 |

a Dependent Variable: WEIGHT
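The combined interaction model can also be refit outside SPSS as a check. The sketch below (plain Python, our own variable and function names) builds **female** and **femht** exactly as described above and solves the least-squares normal equations directly:

```python
# Refit the single interaction model (female, height, femht) in plain Python,
# solving the normal equations X'X b = X'y by Gaussian elimination.
genders = ["F"] * 10 + ["M"] * 10
heights = [56, 60, 64, 68, 72, 54, 62, 65, 65, 70,
           64, 68, 72, 76, 80, 62, 69, 74, 75, 82]
weights = [117, 125, 133, 141, 149, 109, 128, 131, 131, 145,
           211, 223, 235, 247, 259, 201, 228, 245, 241, 269]

female = [1.0 if g == "F" else 0.0 for g in genders]
femht = [f * h for f, h in zip(female, heights)]
X = [[1.0, f, h, fh] for f, h, fh in zip(female, heights, femht)]
y = weights

def solve_ols(X, y):
    """Least squares via the normal equations (fine for tiny, well-posed problems)."""
    n, k = len(X), len(X[0])
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
         for p in range(k)]
    b = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            m = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= m * A[col][c]
            b[r] -= m * b[col]
    # Back substitution.
    coef = [0.0] * k
    for r in range(k - 1, -1, -1):
        coef[r] = (b[r] - sum(A[r][c] * coef[c] for c in range(r + 1, k))) / A[r][r]
    return coef

b0, b1, b2, b3 = solve_ols(X, y)
# b3 is the female-minus-male difference in slopes, roughly -1.094
print(round(b0, 3), round(b1, 3), round(b2, 3), round(b3, 3))
```

The four rounded coefficients match the B column of the SPSS Coefficients table above.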

The term **femht** tests the null
hypothesis H_0: **b_f** = **b_m**. The t value is −6.52 and is significant, indicating that the regression coefficient
**b_f** is significantly different from **b_m**.

Let's look at the parameter estimates to get a better understanding of what they mean and
how they are interpreted.

First, recall that our dummy variable
**female** is 1 if female and 0 if
male; therefore, males are the omitted group. This is needed for proper interpretation
of the estimates. Even though we have run a single model, it is often useful
to think about what the model means for different types of respondents, in this
case, males and females. One way to do this is by looking at the regression equation.
Below we explore how the equation changes depending on whether the subject is
male or female. The first equation is just the general linear regression
equation: y-hat is the predicted weight, b0, b1, etc. represent the regression
coefficients, and the names of variables stand in for the values of those
variables for each case. We have written the
intercept as b0*1; normally this is written just as b0, because the 1 is
unnecessary, but it is always there implicitly, and it will help us understand
what is going on later.

y-hat = b0*1 + b1*female + b2*height + b3*femht

For males, female = 0, and femht = 0, so the equation is:

y-hat = b0*1 + b1*0 + b2*height + b3*0

Notice that the b1 and b3 terms are equal to zero, so they drop out, leaving:

y-hat = b0 + b2*height

What this means is that for males, the intercept is equal to the constant (b0), which is 5.602. This is equal to the intercept from the model above, where we analyzed just male respondents. Similarly, the relationship between height and weight is described by the coefficient for height (b2), which is 3.19. That is, we can say that for males a one-unit change in height is associated with a 3.19-pound (b2) increase in expected weight. This is equal to the coefficient for height in the model above where we analyzed just males.

For females, female = 1, and femht = height, so the equation is:

y-hat = b0*1 + b1*1 + b2*height + b3*height

We can combine some of the terms, so the equation is reduced to:

y-hat = (b0+b1)*1 + (b2+b3)*height

What we see is that for females, the intercept is equal to b0 + b1, in this case 5.602 − 7.999 =
−2.397. Notice that this is the same as the intercept from the model for just
females. Similarly, for females the expected change in weight for a one-unit
increase in height is b2 + b3, in this case 3.190 − 1.094 = 2.096. By now you
probably expect that this will be the same as the coefficient for height in the
model we ran on females, and it is. What all of this should make clear is that
b3 is the *difference* between the coefficient for females and the
coefficient for males, so if b3 (the coefficient for the variable **femht**)
is significantly different from zero, we can say that the expected change in
weight for a given change in height is different for males and females.
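The bookkeeping above can be checked with a couple of lines of arithmetic. This is not a refit; it simply combines the coefficients reported in the combined model's output:

```python
# Coefficients from the combined model's SPSS output above.
b0, b1, b2, b3 = 5.602, -7.999, 3.190, -1.094

intercept_female = b0 + b1   # matches the female-only intercept, -2.397
slope_female = b2 + b3       # matches the female-only slope, 2.096
intercept_male = b0          # the male-only intercept, 5.602
slope_male = b2              # the male-only slope, 3.190
print(round(intercept_female, 3), round(slope_female, 3))
```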

It is also possible to run such an analysis using **glm**, using syntax like that below. Note
that other statistical packages, such as SAS and Stata, omit the group of the dummy variable
that is coded as zero. However, SPSS omits the group coded as one. Therefore, when you compare
the output from the different packages, the results seem to be different. To make the SPSS results
match those from other packages (or the results from the analysis above), you need to create a new variable that has the opposite coding (i.e.,
switching the zeros and ones). We do this with the **male** variable. We do not know of an option in SPSS
**glm** to change which group is the omitted group. We will also need to
create a new interaction variable (**maleht**).

compute male = not female.
compute maleht = male*height.
execute.

glm weight by male with height
  /design = male height male by height
  /print = parameter.

**Between-Subjects Factors**

| | | N |
|---|---|---|
| MALE | .00 | 10 |
| | 1.00 | 10 |

**Tests of Between-Subjects Effects**

Dependent Variable: WEIGHT

| Source | Type III Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Corrected Model | 60327.097(a) | 3 | 20109.032 | 4250.111 | .000 |
| Intercept | .376 | 1 | .376 | .079 | .782 |
| MALE | 2.342 | 1 | 2.342 | .495 | .492 |
| HEIGHT | 4695.831 | 1 | 4695.831 | 992.480 | .000 |
| MALE * HEIGHT | 201.115 | 1 | 201.115 | 42.506 | .000 |
| Error | 75.703 | 16 | 4.731 | | |
| Total | 733114.000 | 20 | | | |
| Corrected Total | 60402.800 | 19 | | | |

a R Squared = .999 (Adjusted R Squared = .999)

**Parameter Estimates**

Dependent Variable: WEIGHT

| Parameter | B | Std. Error | t | Sig. | 95% CI Lower Bound | Upper Bound |
|---|---|---|---|---|---|---|
| Intercept | 5.602 | 8.069 | .694 | .497 | -11.504 | 22.707 |
| [MALE=.00] | -7.999 | 11.371 | -.703 | .492 | -32.104 | 16.105 |
| [MALE=1.00] | 0(a) | . | . | . | . | . |
| HEIGHT | 3.190 | .111 | 28.646 | .000 | 2.954 | 3.426 |
| [MALE=.00] * HEIGHT | -1.094 | .168 | -6.520 | .000 | -1.450 | -.738 |
| [MALE=1.00] * HEIGHT | 0(a) | . | . | . | . | . |

a This parameter is set to zero because it is redundant.

As you can see, the **glm** output
corresponds to the output obtained by **regression**.
The parameter estimates appear at the end of the **glm** output, and they also correspond to the output from
**regression**.
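The effect of the recoding can also be verified outside SPSS. Because the two-group model with a gender dummy, height, and their interaction fits each group's line exactly, the recoded coefficients (for **male** and **maleht**) can be derived directly from the two group-wise fits. A plain-Python sketch, with our own names:

```python
# After recoding (male = 1 - female), the reference group flips: the model's
# intercept and height slope describe females, and the interaction coefficient
# becomes the male-minus-female slope difference. Since the interaction model
# reproduces each group's own least-squares line, we can derive these directly.
heights_f = [56, 60, 64, 68, 72, 54, 62, 65, 65, 70]
weights_f = [117, 125, 133, 141, 149, 109, 128, 131, 131, 145]
heights_m = [64, 68, 72, 76, 80, 62, 69, 74, 75, 82]
weights_m = [211, 223, 235, 247, 259, 201, 228, 245, 241, 269]

def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

a_f, b_f = ols(heights_f, weights_f)
a_m, b_m = ols(heights_m, weights_m)
coef_male = a_m - a_f      # roughly  7.999 (sign flipped vs. FEMALE's -7.999)
coef_maleht = b_m - b_f    # roughly  1.094 (sign flipped vs. FEMHT's -1.094)
print(round(coef_male, 3), round(coef_maleht, 3))
```

Only the signs of the dummy and interaction coefficients change; the test of the slope difference is the same either way.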