How can I compare regression coefficients between two groups?

Sometimes your research hypothesis may predict that a regression coefficient should be larger for one group than for another. For example, you might expect the regression coefficient of height predicting weight to be larger for men than for women. Below, we have a data file with 10 fictional females and 10 fictional males, along with their height in inches and their weight in pounds.

DATA htwt;
  INPUT id Gender $ height weight ;
CARDS;
 1   F  56 117   
 2   F  60 125   
 3   F  64 133   
 4   F  68 141   
 5   F  72 149   
 6   F  54 109   
 7   F  62 128   
 8   F  65 131   
 9   F  65 131   
10   F  70 145   
11   M  64 211   
12   M  68 223   
13   M  72 235   
14   M  76 247   
15   M  80 259   
16   M  62 201   
17   M  69 228   
18   M  74 245   
19   M  75 241   
20   M  82 269   
;
RUN;

We analyze the data separately for each gender using the proc reg below.

PROC REG DATA=htwt;
   BY gender;
   MODEL weight = height ;
RUN;
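
The BY statement assumes that the data are already sorted by the BY variable. The htwt data above happen to be entered in gender order, so the proc reg runs as written; for data in arbitrary order, a sort step along the lines of this sketch would be needed before the proc reg.

```sas
/* Sort by the BY variable so that BY-group processing works */
PROC SORT DATA=htwt ;
   BY gender ;
RUN;
```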

The parameter estimates (coefficients) for females and males are shown below. The results do seem to suggest that the slope of weight on height is steeper for males (3.19) than for females (2.10).

GENDER=F
                                        T for H0:    Pr > |T|   Std Error of
Parameter                  Estimate    Parameter=0                Estimate
INTERCEPT              -2.397470040          -0.34     0.7427     7.05327189
HEIGHT                  2.095872170          18.97     0.0001     0.11049098
 
GENDER=M
                                        T for H0:    Pr > |T|   Std Error of
Parameter                  Estimate    Parameter=0                Estimate
INTERCEPT               5.601677149           0.63     0.5480     8.93019669
HEIGHT                  3.189727463          25.88     0.0001     0.12323669

We can compare the regression coefficients of males with those of females to test the null hypothesis H0: B_f = B_m, where B_f is the regression coefficient of height for females and B_m is the regression coefficient of height for males. To do this analysis, we first make a dummy variable called female that is coded 1 for female and 0 for male, and a variable femht that is the product of female and height. We then use female, height, and femht as predictors in the regression equation.

data htwt2;
  set htwt; 
 
  female = . ;
  IF gender = "F" then female = 1;
  IF gender = "M" then female = 0;
 
  femht = female*height ;
 
RUN;
 
PROC REG DATA=htwt2 ;
   MODEL weight = female height femht ;
RUN;

The output is shown below.

Model: MODEL1
Dependent Variable: WEIGHT

 Analysis of Variance

                         Sum of         Mean
Source          DF      Squares       Square      F Value       Prob>F

Model            3  60327.09739  20109.03246     4250.111       0.0001
Error           16     75.70261      4.73141
C Total         19  60402.80000

    Root MSE       2.17518     R-square       0.9987
    Dep Mean     183.40000     Adj R-sq       0.9985
    C.V.           1.18603

    Parameter Estimates

                 Parameter      Standard    T for H0:
Variable  DF      Estimate         Error   Parameter=0    Prob > |T|

INTERCEP   1      5.601677    8.06886167         0.694        0.4975
FEMALE     1     -7.999147   11.37054598        -0.703        0.4919
HEIGHT     1      3.189727    0.11135027        28.646        0.0001
FEMHT      1     -1.093855    0.16777741        -6.520        0.0001

The term femht tests the null hypothesis H0: B_f = B_m. The t value is -6.52 with a p-value of 0.0001, indicating that the regression coefficient B_f is significantly different from B_m.
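
The same hypothesis can also be requested explicitly with the TEST statement of proc reg. The sketch below is not part of the original analysis, and the labels SlopeDiff and BothSame are arbitrary names chosen here. The first test restates the hypothesis about femht as an F test (with one numerator degree of freedom, F is the square of the t value). The second jointly tests whether both the intercepts and the slopes are equal across the two groups.

```sas
PROC REG DATA=htwt2 ;
   MODEL weight = female height femht ;
   /* H0: B_f = B_m, i.e. the coefficient of femht is zero */
   SlopeDiff: TEST femht = 0 ;
   /* H0: same intercept and same slope in both groups */
   BothSame:  TEST female = 0, femht = 0 ;
RUN;
```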

Let’s look at the parameter estimates to get a better understanding of what they mean and how they are interpreted.
First, recall that our dummy variable female is 1 if female and 0 if male; therefore, males are the omitted group. This is needed for proper interpretation of the estimates.

          Parameter 
Variable  Estimate 
INTERCEP  5.601677 : This is the intercept for the males (omitted group) 
                     This corresponds to the intercept for males in 
                     the separate groups analysis. 
FEMALE   -7.999147 : Intercept Females - Intercept males 
                     This corresponds to the difference of the 
                     intercepts from the separate groups analysis, 
                     and is indeed -2.397470040 - 5.601677149. 
HEIGHT    3.189727 : Slope for males (omitted group), i.e., B_m. 
FEMHT    -1.093855 : Slope for females - Slope for males 
                     (i.e. B_f - B_m). 
                     From the separate groups, this is indeed 
                     2.095872170 - 3.189727463 .
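
These interpretations follow from writing out the fitted model separately for each value of female. The symbols b_0 through b_3 are labels introduced here for the four coefficients (INTERCEP, FEMALE, HEIGHT, FEMHT); they do not appear in the SAS output.

```latex
\begin{align*}
\widehat{\text{weight}} &= b_0 + b_1\,\text{female} + b_2\,\text{height} + b_3\,\text{femht},
  \qquad \text{femht} = \text{female}\times\text{height} \\
\text{males } (\text{female}=0):\quad
  \widehat{\text{weight}} &= b_0 + b_2\,\text{height} \\
\text{females } (\text{female}=1):\quad
  \widehat{\text{weight}} &= (b_0 + b_1) + (b_2 + b_3)\,\text{height}
\end{align*}
```

So B_m = b_2 and B_f = b_2 + b_3, which gives b_3 = B_f - B_m. With the estimates above, 3.189727 + (-1.093855) = 2.095872 is the slope for females, and 5.601677 + (-7.999147) = -2.397470 is their intercept.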

It is also possible to run such an analysis in proc glm, using syntax like that below.

PROC GLM DATA=htwt2 ;
  CLASS gender ;
  MODEL weight = gender height gender*height / SOLUTION ;
RUN;

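As you can see, the proc glm output below corresponds to the output obtained by proc reg. In particular, the F value of 42.51 for the HEIGHT*GENDER interaction is the square of the t value for femht (-6.52), so both test the same hypothesis.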

General Linear Models Procedure
Class Level Information

Class    Levels    Values

GENDER        2    F M

Number of observations in data set = 20

    General Linear Models Procedure

Dependent Variable: WEIGHT
                                     Sum of            Mean
Source                  DF          Squares          Square   F Value     Pr > F

Model                    3     60327.097387    20109.032462   4250.11     0.0001

Error                   16        75.702613        4.731413

Corrected Total         19     60402.800000
    
                  R-Square             C.V.        Root MSE          WEIGHT Mean

                  0.998747         1.186031       2.1751812            183.40000

Source                  DF        Type I SS     Mean Square   F Value     Pr > F

GENDER                   1     55125.000000    55125.000000  11650.85     0.0001
HEIGHT                   1      5000.982757     5000.982757   1056.97     0.0001
HEIGHT*GENDER            1       201.114630      201.114630     42.51     0.0001

Source                  DF      Type III SS     Mean Square   F Value     Pr > F

GENDER                   1        2.3416157       2.3416157      0.49     0.4919
HEIGHT                   1     4695.8308766    4695.8308766    992.48     0.0001
HEIGHT*GENDER            1      201.1146303     201.1146303     42.51     0.0001

                                        T for H0:    Pr > |T|   Std Error of
Parameter                  Estimate    Parameter=0                Estimate

INTERCEPT               5.601677149 B         0.69     0.4975     8.06886167
GENDER        F        -7.999147189 B        -0.70     0.4919    11.37054598
              M         0.000000000 B          .        .          .
HEIGHT                  3.189727463 B        28.65     0.0001     0.11135027
HEIGHT*GENDER F        -1.093855293 B        -6.52     0.0001     0.16777741
              M         0.000000000 B          .        .          .

NOTE: The X'X matrix has been found to be singular and a generalized inverse
      was used to solve the normal equations.   Estimates followed by the
      letter 'B' are biased, and are not unique estimators of the parameters.

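The parameter estimates appear at the end of the proc glm output. The 'B' flags and the accompanying note appear because proc glm creates one parameter for each level of gender, which overparameterizes the model; SAS resolves this by setting the parameters for the last level (M) to zero, so males again serve as the reference group. The estimates therefore correspond to the output from proc reg and from the separate analyses, that is: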

INTERCEPT          5.601677149 : This is the intercept for the males 
GENDER        F   -7.999147189 : Intercept Females - Intercept males 
HEIGHT             3.189727463 : Slope for males 
HEIGHT*GENDER F   -1.093855293 : Slope for females - Slope for males
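
If the slope and intercept for females are wanted directly, rather than as differences from males, the ESTIMATE statement of proc glm can combine the parameters. This is a sketch and not part of the original analysis. The quoted labels are arbitrary, and the coefficients listed after gender and gender*height assume the level order F, M shown in the Class Level Information.

```sas
PROC GLM DATA=htwt2 ;
  CLASS gender ;
  MODEL weight = gender height gender*height / SOLUTION ;
  /* slope for females: HEIGHT + HEIGHT*GENDER F */
  ESTIMATE 'slope, females'     height 1    gender*height 1 0 ;
  /* intercept for females: INTERCEPT + GENDER F */
  ESTIMATE 'intercept, females' intercept 1 gender 1 0 ;
  /* slope difference B_f - B_m: HEIGHT*GENDER F minus HEIGHT*GENDER M */
  ESTIMATE 'slope, F minus M'   gender*height 1 -1 ;
RUN;
```

The first two estimates should reproduce the separate-groups values for females (2.0959 and -2.3975). Their standard errors will differ from the separate analysis because this model pools the error variance across the two groups.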