Sometimes your research may predict that the size of a regression coefficient should be bigger for one group than for another. For example, you might believe that the regression coefficient of height predicting weight would be higher for men than for women. Below, we have a data file with 10 fictional females and 10 fictional males, along with their height in inches and their weight in pounds.
DATA htwt;
  INPUT id Gender $ height weight;
CARDS;
1 F 56 117
2 F 60 125
3 F 64 133
4 F 68 141
5 F 72 149
6 F 54 109
7 F 62 128
8 F 65 131
9 F 65 131
10 F 70 145
11 M 64 211
12 M 68 223
13 M 72 235
14 M 76 247
15 M 80 259
16 M 62 201
17 M 69 228
18 M 74 245
19 M 75 241
20 M 82 269
;
RUN;
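We analyze the data for each gender separately using the proc reg below. The BY statement expects the data to be sorted by gender, which the data file above already is.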
PROC REG DATA=htwt;
  BY gender;
  MODEL weight = height;
RUN;
The parameter estimates (coefficients) for females and males are shown below. The results do seem to suggest that the coefficient of height predicting weight is larger for males (3.19) than for females (2.10).
GENDER=F

                                T for H0:                Std Error of
Parameter        Estimate     Parameter=0    Pr > |T|      Estimate
INTERCEPT    -2.397470040           -0.34      0.7427    7.05327189
HEIGHT        2.095872170           18.97      0.0001    0.11049098

GENDER=M

                                T for H0:                Std Error of
Parameter        Estimate     Parameter=0    Pr > |T|      Estimate
INTERCEPT     5.601677149            0.63      0.5480    8.93019669
HEIGHT        3.189727463           25.88      0.0001    0.12323669
We can compare the regression coefficient for males with that for females to test the null hypothesis Ho: Bf = Bm, where Bf is the regression coefficient for females and Bm is the regression coefficient for males. To do this analysis, we first make a dummy variable called female that is coded 1 for female and 0 for male, and a variable femht that is the product of female and height. We then use female, height, and femht as predictors in the regression equation.
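Written out as an equation, the model with the dummy variable and the product term can be sketched as follows. The coefficient names b_0 through b_3 and the error term e are labels introduced here for illustration; they are not notation from the analysis itself.

```latex
\begin{align*}
\text{weight} &= b_0 + b_1\,\text{female} + b_2\,\text{height} + b_3\,\text{femht} + e \\[4pt]
\text{female} = 0:\quad \text{weight} &= b_0 + b_2\,\text{height} + e \\
\text{female} = 1:\quad \text{weight} &= (b_0 + b_1) + (b_2 + b_3)\,\text{height} + e \\[4pt]
B_m &= b_2, \qquad B_f = b_2 + b_3, \qquad b_3 = B_f - B_m
\end{align*}
```

Under this parameterization, testing whether the coefficient of femht is zero is the same as testing Ho: Bf = Bm. Note that the combined model estimates a single error variance for both groups, whereas the separate regressions estimate one for each group.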
DATA htwt2;
  SET htwt;
  female = .;
  IF gender = "F" THEN female = 1;
  IF gender = "M" THEN female = 0;
  femht = female*height;
RUN;

PROC REG DATA=htwt2;
  MODEL weight = female height femht;
RUN;
The output is shown below.
Model: MODEL1
Dependent Variable: WEIGHT

Analysis of Variance

                     Sum of           Mean
Source     DF       Squares         Square     F Value    Prob>F
Model       3   60327.09739    20109.03246    4250.111    0.0001
Error      16      75.70261        4.73141
C Total    19   60402.80000

Root MSE      2.17518    R-square    0.9987
Dep Mean    183.40000    Adj R-sq    0.9985
C.V.          1.18603

Parameter Estimates

                Parameter       Standard      T for H0:
Variable   DF    Estimate          Error    Parameter=0    Prob > |T|
INTERCEP    1    5.601677     8.06886167          0.694        0.4975
FEMALE      1   -7.999147    11.37054598         -0.703        0.4919
HEIGHT      1    3.189727     0.11135027         28.646        0.0001
FEMHT       1   -1.093855     0.16777741         -6.520        0.0001
The term femht tests the null hypothesis Ho: Bf = Bm. Its t value is -6.520 with p = 0.0001, so we reject the null hypothesis: the regression coefficient Bf is significantly different from Bm.
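The same hypothesis can also be stated explicitly with a TEST statement in proc reg. This is a minimal sketch that assumes the htwt2 data set created above; the label slopes_equal is a name chosen here for illustration.

```sas
PROC REG DATA=htwt2;
  MODEL weight = female height femht;
  /* Ho: Bf = Bm, i.e. the coefficient of femht is zero */
  slopes_equal: TEST femht = 0;
RUN;
QUIT;
```

Because this is a single restriction, the F statistic reported by TEST should equal the square of the t value for femht, that is, (-6.520)**2, or about 42.51.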
Let’s look at the parameter estimates to get a better understanding of what they mean and how they are interpreted. First, recall that our dummy variable female is 1 if female and 0 if male; therefore, males are the omitted group. This is needed for proper interpretation of the estimates.
            Parameter
Variable     Estimate
INTERCEP     5.601677 : Intercept for males (the omitted group). This corresponds to the intercept for males in the separate groups analysis.
FEMALE      -7.999147 : Intercept for females - intercept for males. This corresponds to the difference of the intercepts from the separate groups analysis, and is indeed -2.397470040 - 5.601677149.
HEIGHT       3.189727 : Slope for males (the omitted group), i.e., Bm.
FEMHT       -1.093855 : Slope for females - slope for males (i.e., Bf - Bm). From the separate groups analysis, this is indeed 2.095872170 - 3.189727463.
It is also possible to run such an analysis in proc glm, using syntax like that below.
PROC GLM DATA=htwt2;
  CLASS gender;
  MODEL weight = gender height gender*height / SOLUTION;
RUN;
As you see, the proc glm output corresponds to the output obtained by proc reg.
General Linear Models Procedure
Class Level Information

Class     Levels    Values
GENDER         2    F M

Number of observations in data set = 20

General Linear Models Procedure

Dependent Variable: WEIGHT

                                Sum of             Mean
Source             DF          Squares           Square    F Value    Pr > F
Model               3     60327.097387     20109.032462    4250.11    0.0001
Error              16        75.702613         4.731413
Corrected Total    19     60402.800000

R-Square        C.V.       Root MSE    WEIGHT Mean
0.998747    1.186031      2.1751812      183.40000

Source             DF        Type I SS      Mean Square    F Value    Pr > F
GENDER              1     55125.000000     55125.000000   11650.85    0.0001
HEIGHT              1      5000.982757      5000.982757    1056.97    0.0001
HEIGHT*GENDER       1       201.114630       201.114630      42.51    0.0001

Source             DF      Type III SS      Mean Square    F Value    Pr > F
GENDER              1        2.3416157        2.3416157       0.49    0.4919
HEIGHT              1     4695.8308766     4695.8308766     992.48    0.0001
HEIGHT*GENDER       1      201.1146303      201.1146303      42.51    0.0001

                                         T for H0:                Std Error of
Parameter                Estimate      Parameter=0    Pr > |T|      Estimate
INTERCEPT             5.601677149 B           0.69      0.4975    8.06886167
GENDER        F      -7.999147189 B          -0.70      0.4919   11.37054598
              M       0.000000000 B            .         .         .
HEIGHT                3.189727463 B          28.65      0.0001    0.11135027
HEIGHT*GENDER F      -1.093855293 B          -6.52      0.0001    0.16777741
              M       0.000000000 B            .         .         .

NOTE: The X'X matrix has been found to be singular and a generalized inverse was used to solve the normal equations. Estimates followed by the letter 'B' are biased, and are not unique estimators of the parameters.
The parameter estimates appear at the end of the proc glm output. They correspond to the output from proc reg and from the separate analyses, that is:
INTERCEPT           5.601677149 : Intercept for males
GENDER F           -7.999147189 : Intercept for females - intercept for males
HEIGHT              3.189727463 : Slope for males
HEIGHT*GENDER F    -1.093855293 : Slope for females - slope for males
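If the slope for each group is wanted directly from the combined model, ESTIMATE statements in proc glm are one way to request it. The sketch below is an illustration rather than part of the original analysis. The quoted labels are arbitrary, and the coefficient order for gender*height assumes the class levels sort as F then M, as shown in the Class Level Information.

```sas
PROC GLM DATA=htwt2;
  CLASS gender;
  MODEL weight = gender height gender*height / SOLUTION;
  /* slope for females = HEIGHT + HEIGHT*GENDER (F) */
  ESTIMATE 'slope, females' height 1 gender*height 1 0;
  /* slope for males = HEIGHT + HEIGHT*GENDER (M) */
  ESTIMATE 'slope, males'   height 1 gender*height 0 1;
  /* difference in slopes, Bf - Bm */
  ESTIMATE 'slope, F - M'   gender*height 1 -1;
RUN;
QUIT;
```

The three estimates should reproduce the slopes from the separate analyses and their difference (about 2.0959, 3.1897, and -1.0939). The standard errors can differ somewhat from those in the separate regressions, because the combined model uses one pooled error variance.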