Regression with Graphics by Lawrence Hamilton Chapter 3: Basics of Multiple Regression

Table 3.1, page 68.

use https://stats.idre.ucla.edu/stat/stata/examples/rwg/concord1, clear
(Hamilton (1983))

regress water81 income water80

  Source |       SS       df       MS                  Number of obs =     496
---------+------------------------------               F(  2,   493) =  391.76
   Model |   671025350     2   335512675               Prob > F      =  0.0000
Residual |   422213359   493  856416.551               R-squared     =  0.6138
---------+------------------------------               Adj R-squared =  0.6122
   Total |  1.0932e+09   495  2208563.05               Root MSE      =  925.43

------------------------------------------------------------------------------
 water81 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  income |   20.54504    3.38341      6.072   0.000       13.89736    27.19272
 water80 |   .5931267   .0250482     23.679   0.000       .5439123    .6423411
   _cons |   203.8217   94.36129      2.160   0.031       18.42181    389.2216
------------------------------------------------------------------------------

The means in table 3.1 are obtained separately using summarize.

summarize water81 income water80

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
 water81 |     496    2298.387   1486.123        100      10100  
  income |     496    23.07661   13.05784          2        100  
 water80 |     496    2732.056     1763.8        200      12700

Figure 3.1, page 70. First regress the partial model [3.10] and output the residuals using predict. We use the label command to label the residual variable.

regress water81 water80
predict ey_x2, resid
label variable ey_x2 "e y|x2"

  Source |       SS       df       MS                  Number of obs =     496
---------+------------------------------               F(  1,   494) =  696.11
   Model |   639446987     1   639446987               Prob > F      =  0.0000
Residual |   453791723   494  918606.727               R-squared     =  0.5849
---------+------------------------------               Adj R-squared =  0.5841
   Total |  1.0932e+09   495  2208563.05               Root MSE      =  958.44

------------------------------------------------------------------------------
 water81 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 water80 |   .6443923   .0244238     26.384   0.000        .596405    .6923796
   _cons |    537.871   79.40114      6.774   0.000       381.8654    693.8766
------------------------------------------------------------------------------

Next regress partial model [3.11] then output and label the residual variable.

regress income water80
predict ex1_x2, resid
label variable ex1_x2 "e x1|x2"

  Source |       SS       df       MS                  Number of obs =     496
---------+------------------------------               F(  1,   494) =   63.31
   Model |   9588.3167     1   9588.3167               Prob > F      =  0.0000
Residual |   74812.772   494  151.442858               R-squared     =  0.1136
---------+------------------------------               Adj R-squared =  0.1118
   Total |  84401.0887   495   170.50725               Root MSE      =  12.306

------------------------------------------------------------------------------
  income |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 water80 |   .0024953   .0003136      7.957   0.000       .0018791    .0031114
   _cons |   16.25937   1.019498     15.948   0.000       14.25628    18.26245
------------------------------------------------------------------------------

Figure 3.1, page 70. For the graph, we’ll run the full regression model and use the avplot command afterwards to get the plot.

quietly regress water81 income water80
avplot income, ylabel(-4000(2000)4000) xlabel(-20(20)60)
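
The added-variable plot is a scatter of the two residual variables created above, so its slope can be checked directly. As a minimal sketch, assuming ey_x2 and ex1_x2 are still in memory, regressing one residual on the other should return the income coefficient from the full model (20.54504), with an intercept of essentially zero. The standard error will differ slightly because this regression has 494 rather than 493 residual degrees of freedom.

```stata
* Slope of e(y|x2) on e(x1|x2) equals the coefficient on income
* in: regress water81 income water80
regress ey_x2 ex1_x2
```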

Figure 3.2, page 71.

avplot water80, ylabel(-2000(2000)6000) xlabel(-2000(2000)8000)

Table 3.2, page 74.

regress water81 income water80 educat retire peop81 cpeop

  Source |       SS       df       MS                  Number of obs =     496
---------+------------------------------               F(  6,   489) =  171.08
   Model |   740477522     6   123412920               Prob > F      =  0.0000
Residual |   352761188   489  721393.022               R-squared     =  0.6773
---------+------------------------------               Adj R-squared =  0.6734
   Total |  1.0932e+09   495  2208563.05               Root MSE      =  849.35

------------------------------------------------------------------------------
 water81 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  income |   20.96699   3.463719      6.053   0.000       14.16138     27.7726
 water80 |     .49194   .0263478     18.671   0.000        .440171    .5437089
  educat |  -41.86552   13.22031     -3.167   0.002      -67.84114    -15.8899
  retire |   189.1843   95.02142      1.991   0.047       2.483674     375.885
  peop81 |    248.197    28.7248      8.641   0.000       191.7578    304.6363
   cpeop |    96.4536   80.51903      1.198   0.232      -61.75235    254.6596
   _cons |   242.2204   206.8638      1.171   0.242      -164.2312    648.6721
------------------------------------------------------------------------------

summ water81 income water80 educat retire peop81 cpeop

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
 water81 |     496    2298.387   1486.123        100      10100  
  income |     496    23.07661   13.05784          2        100  
 water80 |     496    2732.056     1763.8        200      12700  
  educat |     496    14.00403    3.09055          6         20  
  retire |     496    .2943548   .4562123          0          1  
  peop81 |     496    3.072581   1.657177          1         10  
   cpeop |     496   -.0383065   .4846579         -3          3

Use the beta option to get the standardized regression coefficients.

regress water81 income water80 educat retire peop81 cpeop, beta

  Source |       SS       df       MS                  Number of obs =     496
---------+------------------------------               F(  6,   489) =  171.08
   Model |   740477522     6   123412920               Prob > F      =  0.0000
Residual |   352761188   489  721393.022               R-squared     =  0.6773
---------+------------------------------               Adj R-squared =  0.6734
   Total |  1.0932e+09   495  2208563.05               Root MSE      =  849.35

------------------------------------------------------------------------------
 water81 |      Coef.   Std. Err.       t     P>|t|                       Beta
---------+--------------------------------------------------------------------
  income |   20.96699   3.463719      6.053   0.000                   .1842267
 water80 |     .49194   .0263478     18.671   0.000                   .5838571
  educat |  -41.86552   13.22031     -3.167   0.002                  -.0870637
  retire |   189.1843   95.02142      1.991   0.047                   .0580761
  peop81 |    248.197    28.7248      8.641   0.000                   .2767647
   cpeop |    96.4536   80.51903      1.198   0.232                   .0314557
   _cons |   242.2204   206.8638      1.171   0.242                          .
------------------------------------------------------------------------------
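
Each standardized coefficient is the raw coefficient multiplied by the ratio of the predictor's standard deviation to that of the dependent variable. A quick check with display, using the coefficients above and the standard deviations from the summarize output (the numbers are typed in by hand here, so expect small rounding differences):

```stata
* beta = b * sd(x) / sd(y)
display 20.96699 * 13.05784 / 1486.123   // income:  about .1842
display .49194 * 1763.8 / 1486.123       // water80: about .5839
```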

Use the level option to change the confidence level of the intervals to 90%. The default is 95%.

regress water81 income water80 educat retire peop81 cpeop, level(90)

  Source |       SS       df       MS                  Number of obs =     496
---------+------------------------------               F(  6,   489) =  171.08
   Model |   740477522     6   123412920               Prob > F      =  0.0000
Residual |   352761188   489  721393.022               R-squared     =  0.6773
---------+------------------------------               Adj R-squared =  0.6734
   Total |  1.0932e+09   495  2208563.05               Root MSE      =  849.35

------------------------------------------------------------------------------
 water81 |      Coef.   Std. Err.       t     P>|t|       [90% Conf. Interval]
---------+--------------------------------------------------------------------
  income |   20.96699   3.463719      6.053   0.000       15.25887    26.67512
 water80 |     .49194   .0263478     18.671   0.000       .4485193    .5353606
  educat |  -41.86552   13.22031     -3.167   0.002      -63.65226   -20.07877
  retire |   189.1843   95.02142      1.991   0.047       32.59134    345.7773
  peop81 |    248.197    28.7248      8.641   0.000       200.8592    295.5348
   cpeop |    96.4536   80.51903      1.198   0.232      -36.23979     229.147
   _cons |   242.2204   206.8638      1.171   0.242      -98.68612     583.127
------------------------------------------------------------------------------
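
The 90% limits are the coefficient plus or minus the critical t value times the standard error, with 489 residual degrees of freedom. A short sketch that reproduces the interval for income by hand, using invttail() for the critical value:

```stata
* Two-sided 90% interval leaves 5% in each tail
display invttail(489, 0.05)
display 20.96699 - invttail(489, 0.05) * 3.463719   // lower limit, about 15.26
display 20.96699 + invttail(489, 0.05) * 3.463719   // upper limit, about 26.68
```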

Table 3.3, page 80.

summ water81 water80 peop81 cpeop retire

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
 water81 |     496    2298.387   1486.123        100      10100  
 water80 |     496    2732.056     1763.8        200      12700  
  peop81 |     496    3.072581   1.657177          1         10  
   cpeop |     496   -.0383065   .4846579         -3          3  
  retire |     496    .2943548   .4562123          0          1

regress water81 water80 peop81 cpeop retire

  Source |       SS       df       MS                  Number of obs =     496
---------+------------------------------               F(  4,   491) =  229.91
   Model |   712718346     4   178179587               Prob > F      =  0.0000
Residual |   380520363   491  774990.557               R-squared     =  0.6519
---------+------------------------------               Adj R-squared =  0.6491
   Total |  1.0932e+09   495  2208563.05               Root MSE      =  880.34

------------------------------------------------------------------------------
 water81 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 water80 |    .519741    .026774     19.412   0.000       .4671352    .5723468
  peop81 |   265.2894   29.63234      8.953   0.000       207.0675    323.5112
   cpeop |   134.4626    83.1959      1.616   0.107      -29.00135    297.9265
  retire |   67.27992   94.28846      0.714   0.476      -117.9787    252.5386
   _cons |   48.64897   107.0549      0.454   0.650      -161.6932    258.9912
------------------------------------------------------------------------------

To obtain the F-statistic and its corresponding p-value at the top of page 61, first fit the full model, which adds income and educat to the predictors of Table 3.3 (quietly suppresses the output). Then use the test command to test the joint hypothesis that the coefficients on income and educat are both zero, which compares the full model against the simpler model without them.

quietly regress water81 water80 peop81 cpeop retire income educat
test income educat

 ( 1)  income = 0.0
 ( 2)  educat = 0.0

       F(  2,   489) =   19.24
            Prob > F =    0.0000
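
The same F-statistic can be computed from the residual sums of squares of the two models: 380520363 for the simpler model (Table 3.3) and 352761188 with 489 degrees of freedom for the full model (Table 3.2). A sketch with the values typed in from the output above:

```stata
* F = [(RSS_reduced - RSS_full) / number of restrictions] / (RSS_full / df_full)
display ((380520363 - 352761188) / 2) / (352761188 / 489)   // about 19.24
* Upper-tail probability of F(2, 489) at that value
display Ftail(2, 489, 19.24)
```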

To obtain the estimates on page 86, first use the wells dataset and generate the natural log of chlor. Then summarize this new ln_chlor variable with and without a condition on deep.

use wells, clear
(Lee, NH well test data)

gen ln_chlor = log(chlor)
(1 missing value generated)

label variable ln_chlor "Natural Log of Chloride Concen"

summarize ln_chlor if deep==0

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
ln_chlor |      10      3.7751   1.734293   2.302585   6.522093  

summ ln_chlor if deep==1

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
ln_chlor |      42    3.069318   1.258424   1.098612   6.633318  

summ ln_chlor

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
ln_chlor |      52    3.205046   1.372147   1.098612   6.633318

Regression model [3.32], page 86.

regress ln_chlor deep

  Source |       SS       df       MS                  Number of obs =      52
---------+------------------------------               F(  1,    50) =    2.19
   Model |  4.02334351     1  4.02334351               Prob > F      =  0.1455
Residual |  91.9988551    50   1.8399771               R-squared     =  0.0419
---------+------------------------------               Adj R-squared =  0.0227
   Total |  96.0221986    51  1.88278821               Root MSE      =  1.3565

------------------------------------------------------------------------------
ln_chlor |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    deep |   -.705782    .477291     -1.479   0.145      -1.664449    .2528851
   _cons |     3.7751   .4289495      8.801   0.000        2.91353    4.636671
------------------------------------------------------------------------------
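
With a single 0/1 predictor, the constant is the mean of ln_chlor for shallow wells (deep==0) and the coefficient on deep is the difference between the two group means shown earlier. A brief check, together with the equivalent two-sample t test (ttest reports the difference as group 0 minus group 1, so its t has the opposite sign of the regression t):

```stata
* Difference in means: deep wells minus shallow wells
display 3.069318 - 3.7751        // about -.7058, the coefficient on deep
* Equal-variance two-sample t test gives the same t in absolute value
ttest ln_chlor, by(deep)
```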

Figure 3.3, page 87.

graph twoway (scatter ln_chlor deep) (lfit ln_chlor deep), ylabel(1(1)6) xlabel(0 1)

Figure 3.4, page 88. First generate and label the natural log of droad.

gen ln_road = log(droad)
label variable ln_road "Natural Log Distance from Road"

Next regress model [3.33] and output the predicted values to yhat.

regress ln_chlor deep ln_road

  Source |       SS       df       MS                  Number of obs =      52
---------+------------------------------               F(  2,    49) =    1.21
   Model |  4.50187596     2  2.25093798               Prob > F      =  0.3084
Residual |  91.5203226    49  1.86776169               R-squared     =  0.0469
---------+------------------------------               Adj R-squared =  0.0080
   Total |  96.0221986    51  1.88278821               Root MSE      =  1.3667

------------------------------------------------------------------------------
ln_chlor |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    deep |  -.6971194   .4811856     -1.449   0.154      -1.664098    .2698592
 ln_road |  -.0909673   .1797176     -0.506   0.615      -.4521233    .2701886
   _cons |    4.20954   .9609568      4.381   0.000       2.278425    6.140655
------------------------------------------------------------------------------

predict yhat
(option xb assumed; fitted values)

graph twoway (scatter ln_chlor ln_road) (line yhat ln_road if deep ==0) ///
	(line yhat ln_road if deep ==1), ylabel(0(1)7) xlabel(0(2)8)

Figure 3.5, page 89. Generate the slope dummy variable deeproad (the interaction of deep and ln_road) for model [3.34], then output the predicted values. The graph is drawn in the same manner as figure 3.4.

gen deeproad = deep*ln_road
quietly regress ln_chlor ln_road deeproad
predict yhat2
(option xb assumed; fitted values)

graph twoway (scatter ln_chlor ln_road) (line yhat2 ln_road if deep ==0) ///
		(line yhat2 ln_road if deep ==1), ylabel(0(1)7) xlabel(0(2)8)

Table 3.4.

regress ln_chlor deep ln_road deeproad

  Source |       SS       df       MS                  Number of obs =      52
---------+------------------------------               F(  3,    48) =    3.81
   Model |  18.4831272     3   6.1610424               Prob > F      =  0.0157
Residual |  77.5390714    48  1.61539732               R-squared     =  0.1925
---------+------------------------------               Adj R-squared =  0.1420
   Total |  96.0221986    51  1.88278821               Root MSE      =   1.271

------------------------------------------------------------------------------
ln_chlor |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    deep |  -6.717366   2.094713     -3.207   0.002      -10.92907   -2.505663
 ln_road |  -1.109424   .3844204     -2.886   0.006      -1.882354   -.3364954
deeproad |   1.255847   .4268777      2.942   0.005       .3975521    2.114143
   _cons |   9.073459   1.879384      4.828   0.000       5.294704    12.85221
------------------------------------------------------------------------------

summ ln_chlor deep ln_road deeproad

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
ln_chlor |      52    3.205046   1.372147   1.098612   6.633318  
    deep |      53    .8113208   .3949977          0          1  
 ln_road |      53    4.838378    1.06035   2.995732   7.878534  
deeproad |      53    3.937289   2.141892          0   7.878534
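
Because the model in Table 3.4 has both an intercept dummy and a slope dummy, it implies a separate line for each well type. For shallow wells the intercept and slope are _cons and the coefficient on ln_road. For deep wells they are the sums computed below. This is a sketch using lincom, which also reports standard errors for the combinations:

```stata
quietly regress ln_chlor deep ln_road deeproad
* Deep-well intercept: 9.073459 - 6.717366, about 2.356
lincom _cons + deep
* Deep-well slope: -1.109424 + 1.255847, about .146
lincom ln_road + deeproad
```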

Figure 3.6 on page 91 can be graphed using the same steps as figures 3.4 and 3.5.
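
A possible sketch of those steps, assuming figure 3.6 shows the fitted lines from the Table 3.4 model. The variable name yhat3 and the axis labels (copied from figure 3.4) are assumptions, not taken from the book.

```stata
quietly regress ln_chlor deep ln_road deeproad
predict yhat3        // hypothetical name for the fitted values
graph twoway (scatter ln_chlor ln_road) (line yhat3 ln_road if deep==0, sort) ///
	(line yhat3 ln_road if deep==1, sort), ylabel(0(1)7) xlabel(0(2)8)
```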

Figure 3.7, page 91.

graph twoway (scatter ln_chlor ln_road) (lfit ln_chlor ln_road, by(deep)), ///
		xlabel(0(2)8) ylabel(0(2)8)

Regression model [3.38], page 92.

regress ln_chlor ln_road

  Source |       SS       df       MS                  Number of obs =      52
---------+------------------------------               F(  1,    50) =    0.30
   Model |  .581654392     1  .581654392               Prob > F      =  0.5834
Residual |  95.4405442    50  1.90881088               R-squared     =  0.0061
---------+------------------------------               Adj R-squared = -0.0138
   Total |  96.0221986    51  1.88278821               Root MSE      =  1.3816

------------------------------------------------------------------------------
ln_chlor |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 ln_road |  -.1002276   .1815668     -0.552   0.583      -.4649152      .26446
   _cons |   3.691419   .9016771      4.094   0.000       1.880347    5.502491
------------------------------------------------------------------------------

F-statistic and p-value on page 92.

quietly regress ln_chlor deep ln_road deeproad
test deep deeproad

 ( 1)  deep = 0.0
 ( 2)  deeproad = 0.0

       F(  2,    48) =    5.54
            Prob > F =    0.0068
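
This F-statistic compares model [3.38] with the Table 3.4 model, so it can also be built from the results that regress stores in e(). A sketch, with the scalar names rss_r, rss_f and df_f chosen here for illustration:

```stata
quietly regress ln_chlor ln_road
scalar rss_r = e(rss)                  // residual SS, reduced model
quietly regress ln_chlor deep ln_road deeproad
scalar rss_f = e(rss)                  // residual SS, full model
scalar df_f  = e(df_r)                 // residual df, full model (48)
display ((rss_r - rss_f) / 2) / (rss_f / df_f)
display Ftail(2, df_f, ((rss_r - rss_f) / 2) / (rss_f / df_f))
```

The two displayed values should agree with the F of 5.54 and the p-value of 0.0068 reported by test.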

Table 3.5, page 93. First use the radon dataset and drop the observations which were not used in the analysis in the book.

use https://stats.idre.ucla.edu/stat/stata/examples/rwg/radon, clear
(Archer (1987) & Cohen (1988))

drop if _n==10 | _n==15 | _n==16 | _n==21
(4 observations deleted)

Next generate the x1 and x2 dummies.

gen x1=reading
gen x2=fringe

Next create area which is a combination of reading, fringe and control. Let area=1 when reading=1. Then recode its missing values if the case is in the fringe or control areas.

gen area=1 if reading==1        /* area=1 if reading */
(20 missing values generated)

recode area .=2 if fringe==1    /* area=2 if fringe  */
(7 changes made)

recode area .=3 if control==1  /* area=3 if control */
(13 changes made)

/* adding values labels to "area" */
label define areaname 1 "Reading Prong" 2 "Fringe" 3 "Control"
label values area areaname
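
An alternative construction can serve as a check. It assumes that reading, fringe and control are mutually exclusive 0/1 indicators with no missing values, which the counts above (6, 7 and 13 of the 26 counties) suggest. The name area2 is hypothetical.

```stata
* Exactly one indicator is 1 per county, so the weighted sum is 1, 2 or 3
gen area2 = reading + 2*fringe + 3*control
assert area2 == area     // stops with an error if the two codings differ
drop area2
```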

Next create a new variable mnradon which is a recode of radon. Values from 0 to 1.5 are recoded to 1 (Low), values from 1.6 to 2.4 to 2 (Mid), and values of 2.5 and above to 3 (High), as in the footnote of Table 3.5.

gen mnradon=radon
recode mnradon 0/1.5=1 1.6/2.4=2 2.5/max=3
(26 changes made)

label define lmh 1 "Low" 2 "Mid" 3 "High"
label values mnradon lmh

Next create the x3 and x4 dummies. Each is one if the condition in parentheses is true and zero otherwise.

gen x3=(mnradon==1)
gen x4=(mnradon==2)
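
The same dummies can be produced in one step with the generate() option of tabulate, which creates one indicator per category. In this sketch the stub r is a hypothetical name; r1 and r2 should match x3 and x4.

```stata
* Creates r1 (Low), r2 (Mid) and r3 (High)
tabulate mnradon, generate(r)
assert x3 == r1
assert x4 == r2
drop r1 r2 r3
```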

View Table 3.5 using list. The Stata line width is too narrow to show all the variables at once as on page 93, so they are listed in two parts.

list county cancer area

          county     cancer           area 
  1.      Orange          6  Reading Prong  
  2.      Putnam       10.5  Reading Prong  
  3.      Sussex        6.7  Reading Prong  
  4.      Warren          6  Reading Prong  
  5.      Morris        6.1  Reading Prong  
  6.   Hunterdon        6.7  Reading Prong  
  7.       Berks        5.2         Fringe  
  8.      Lehigh        5.6         Fringe  
  9. Northampton        5.8         Fringe  
 10.        Pike        4.5         Fringe  
 11.    Dutchess        5.5         Fringe  
 12.    Sullivan        5.4         Fringe  
 13.      Ulster        6.3         Fringe  
 14.    Columbia        6.3        Control  
 15.    Delaware        4.3        Control  
 16.      Greene          4        Control  
 17.      Otsego        5.9        Control  
 18.       Tioga        4.7        Control  
 19.      Carbon        4.8        Control  
 20.     Lebanon        5.8        Control  
 21.  Lackawanna        5.4        Control  
 22.     Luzerne        5.2        Control  
 23.  Schuylkill        3.6        Control  
 24. Susquehanna        4.3        Control  
 25.       Wayne        3.5        Control  
 26.     Wyoming        6.9        Control 

list county x1 x2 mnradon x3 x4

          county         x1         x2    mnradon         x3         x4 
  1.      Orange          1          0        Low          1          0  
  2.      Putnam          1          0        Mid          0          1  
  3.      Sussex          1          0        Mid          0          1  
  4.      Warren          1          0       High          0          0  
  5.      Morris          1          0        Low          1          0  
  6.   Hunterdon          1          0       High          0          0  
  7.       Berks          0          1       High          0          0  
  8.      Lehigh          0          1       High          0          0  
  9. Northampton          0          1       High          0          0  
 10.        Pike          0          1        Low          1          0  
 11.    Dutchess          0          1        Mid          0          1  
 12.    Sullivan          0          1        Low          1          0  
 13.      Ulster          0          1        Low          1          0  
 14.    Columbia          0          0        Mid          0          1  
 15.    Delaware          0          0        Mid          0          1  
 16.      Greene          0          0        Mid          0          1  
 17.      Otsego          0          0        Mid          0          1  
 18.       Tioga          0          0        Mid          0          1  
 19.      Carbon          0          0        Mid          0          1  
 20.     Lebanon          0          0       High          0          0  
 21.  Lackawanna          0          0        Low          1          0  
 22.     Luzerne          0          0        Low          1          0  
 23.  Schuylkill          0          0       High          0          0  
 24. Susquehanna          0          0        Low          1          0  
 25.       Wayne          0          0        Low          1          0  
 26.     Wyoming          0          0        Mid          0          1

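The commands for table 3.6 on page 94 and all the other tables following are given by Hamilton in the book or have procedures similar to previous examples. Below are the statements which generate the variables needed: the interaction terms among the dummies, and the effect-coded variables v1 to v4 (with their interactions), in which the reference category is coded -1 instead of 0.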

gen x1x3=x1*x3
gen x1x4=x1*x4
gen x2x3=x2*x3
gen x2x4=x2*x4

gen v1=x1
recode v1 0=-1 if control==1
(13 changes made)

gen v2=x2
recode v2 0=-1 if control==1
(13 changes made)

gen v3=x3
recode v3 0=-1 if mnradon==3
(7 changes made)

gen v4=x4
recode v4 0=-1 if mnradon==3
(7 changes made)

gen v2v4=v2*v4
gen v2v3=v2*v3
gen v1v4=v1*v4
gen v1v3=v1*v3

Save the new data.

save radon2, replace