Regression with Graphics by Lawrence Hamilton
Chapter 6: Robust Regression

Figure 6.1, page 184.

use https://stats.idre.ucla.edu/stat/stata/examples/rwg/airpol, clear

gen lnhc = ln(hc)
label variable lnhc "natural log of hydrocarbon pollution potential"
regress  mort lnhc

  Source |       SS       df       MS                  Number of obs =      60
---------+------------------------------               F(  1,    58) =    1.35
   Model |  5179.87999     1  5179.87999               Prob > F      =  0.2506
Residual |  223117.041    58  3846.84554               R-squared     =  0.0227
---------+------------------------------               Adj R-squared =  0.0058
   Total |  228296.921    59  3869.43934               Root MSE      =  62.023

------------------------------------------------------------------------------
    mort |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    lnhc |   7.968576   6.867098      1.160   0.251      -5.777414    21.71457
   _cons |   918.4252   20.53273     44.730   0.000       877.3245    959.5259
------------------------------------------------------------------------------

graph twoway (scatter mort lnhc) (lfit mort lnhc), xlabel(0(2)6) ylabel(800(100)1100)

Figure 6.2, page 185.

rreg mort lnhc

   Huber iteration 1:  maximum difference in weights = .58511763
   Huber iteration 2:  maximum difference in weights = .12109939
   Huber iteration 3:  maximum difference in weights = .07054585
   Huber iteration 4:  maximum difference in weights = .02080019
Biweight iteration 5:  maximum difference in weights = .20680335
Biweight iteration 6:  maximum difference in weights = .06324705
Biweight iteration 7:  maximum difference in weights = .05913415
Biweight iteration 8:  maximum difference in weights = .02922746
Biweight iteration 9:  maximum difference in weights = .01978239
Biweight iteration 10:  maximum difference in weights = .01178611
Biweight iteration 11:  maximum difference in weights = .0036652

Robust regression estimates                            Number of obs =      60
                                                       F(  1,    58) =    8.81
                                                       Prob > F      =  0.0043

------------------------------------------------------------------------------
    mort |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    lnhc |   19.45727   6.553716      2.969   0.004       6.338583    32.57596
   _cons |     891.75   19.59571     45.507   0.000        852.525    930.9751
------------------------------------------------------------------------------

predict yhat
graph twoway (scatter mort lnhc) (lfit mort lnhc) (line yhat lnhc) ///
	(scatter mort lnhc if lnhc >=4.5, mlabel(smsa)), ///
	legend(off) xlabel(0(2)6) ylabel(800(100)1100)
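
To see the two fits side by side in numbers as well as in the figure, the estimates can be stored and tabulated. This is a minimal sketch, assuming airpol is in memory and lnhc has been generated as above; sk_ols and sk_rreg are hypothetical names for the stored estimates.

```stata
* Sketch: compare OLS and robust coefficients in one table.
* Assumes airpol is in memory and lnhc = ln(hc) already exists.
* sk_ols and sk_rreg are hypothetical names for the stored estimates.
quietly regress mort lnhc
estimates store sk_ols
quietly rreg mort lnhc
estimates store sk_rreg
estimates table sk_ols sk_rreg, b(%9.3f) se(%9.3f)
```

The lnhc slope should appear as about 7.97 for OLS and about 19.46 for rreg, matching the two outputs above.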

Table 6.1, page 186. The genwt(rweight) option saves the final robust weights in a new variable named rweight. The format rweight %3.2f command displays rweight with two digits after the decimal point.

use https://stats.idre.ucla.edu/stat/stata/examples/rwg/airpol, clear

gen lnhc = ln(hc)
rreg mort lnhc, genwt(rweight)

   Huber iteration 1:  maximum difference in weights = .58511763
   Huber iteration 2:  maximum difference in weights = .12109939
   Huber iteration 3:  maximum difference in weights = .07054585
   Huber iteration 4:  maximum difference in weights = .02080019
Biweight iteration 5:  maximum difference in weights = .20680335
Biweight iteration 6:  maximum difference in weights = .06324705
Biweight iteration 7:  maximum difference in weights = .05913415
Biweight iteration 8:  maximum difference in weights = .02922746
Biweight iteration 9:  maximum difference in weights = .01978239
Biweight iteration 10:  maximum difference in weights = .01178611
Biweight iteration 11:  maximum difference in weights = .0036652

Robust regression estimates                            Number of obs =      60
                                                       F(  1,    58) =    8.81
                                                       Prob > F      =  0.0043

------------------------------------------------------------------------------
    mort |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    lnhc |   19.45727   6.553716      2.969   0.004       6.338583    32.57596
   _cons |     891.75   19.59571     45.507   0.000        852.525    930.9751
------------------------------------------------------------------------------

predict yhat
predict e, r
sort rweight
format mort yhat e lnhc %3.1f
format rweight %3.2f
list smsa hc mort yhat e rweight

             smsa        hc    mort    yhat       e  rweight
  1.      SanJose       105   790.7   982.3  -191.6     0.08
  2.   NewOrleans        20  1113.0   950.0   163.0     0.23
  3.   LosAngeles       648   861.8  1017.7  -155.9     0.27
  4.     SanDiego       144   839.7   988.4  -148.7     0.32
  5.    Baltimore        43  1071.0   964.9   106.1     0.60
  6.      Wichita         4   823.8   918.7   -94.9     0.68
  7.    Lancaster        11   844.1   938.4   -94.3     0.68
  8.  Minneapolis        20   857.6   950.0   -92.4     0.69
  9. SanFrancisco       311   911.7  1003.4   -91.7     0.70
 10.     Richmond        12  1026.0   940.1    85.9     0.73
 11.     Portland        56   894.0   970.1   -76.1     0.79
 12.       Denver        17   871.8   946.9   -75.1     0.79
 13.   Birmingham        30  1030.0   957.9    72.1     0.81
 14.  Chattanooga        18  1018.0   948.0    70.0     0.82
 15.       Albany         8   997.9   932.2    65.7     0.84
 16.      Memphis        15  1006.0   944.4    61.6     0.86
 17.   Wilmington        14  1004.0   943.1    60.9     0.86
 18. Philadelphia        29  1015.0   957.3    57.7     0.87
 19.    Rochester         7   874.3   929.6   -55.3     0.88
 20.      Buffalo        18  1002.0   948.0    54.0     0.89
 21.  GrandRapids         5   871.3   923.1   -51.8     0.90
 22.        Miami         3   861.4   913.1   -51.7     0.90
 23.      Seattle        20   899.3   950.0   -50.7     0.90
 24.      Chicago        88  1025.0   978.9    46.1     0.92
 25.     Hartford         7   887.5   929.6   -42.1     0.93
 26.   Greensboro         8   971.1   932.2    38.9     0.94
 27.    Allentown         6   962.4   926.6    35.8     0.95
 28.      Atlanta        18   982.3   948.0    34.3     0.95
 29.       Toledo        11   972.5   938.4    34.1     0.95
 30.    Worcester         7   895.7   929.6   -33.9     0.96
 31.       Dallas         1   860.1   891.8   -31.7     0.96
 32.      NewYork        41   994.6   964.0    30.6     0.96
 33.    Milwaukee        33   929.2   959.8   -30.6     0.96
 34.        Akron        21   921.9   951.0   -29.1     0.97
 35.       Canton        12   912.3   940.1   -27.8     0.97
 36.    Cleveland        31   986.0   958.6    27.4     0.97
 37.   Bridgeport         6   899.5   926.6   -27.1     0.97
 38. Indianapolis        13   968.7   941.7    27.0     0.97
 39.   Louisville        38   989.3   962.5    26.8     0.97
 40.      Houston         6   952.5   926.6    25.9     0.97
 41.   Pittsburgh        45   991.3   965.8    25.5     0.97
 42.         York         8   911.8   932.2   -20.4     0.98
 43.  Springfield         5   904.2   923.1   -18.9     0.99
 44.     Syracuse         8   950.7   932.2    18.5     0.99
 45.       Boston        21   934.7   951.0   -16.3     0.99
 46.   Cincinnati        26   970.5   955.1    15.4     0.99
 47.    Nashville        17   961.0   946.9    14.1     0.99
 48.   Providence         6   938.5   926.6    11.9     0.99
 49.   Youngstown        14   954.4   943.1    11.3     0.99
 50.        Utica         5   912.2   923.1   -10.9     1.00
 51.   KansasCity         7   919.7   929.6    -9.9     1.00
 52.       Dayton         6   936.2   926.6     9.6     1.00
 53.      Detroit        52   959.2   968.6    -9.4     1.00
 54.      Reading        11   946.2   938.4     7.8     1.00
 55.     Columbus        23   958.8   952.8     6.0     1.00
 56.   Washington        65   967.8   973.0    -5.2     1.00
 57.      StLouis        31   953.6   958.6    -5.0     1.00
 58.     NewHaven         4   923.2   918.7     4.5     1.00
 59.        Flint        11   941.2   938.4     2.8     1.00
 60.    FortWorth         1   891.7   891.8    -0.1     1.00
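
Because the listing is sorted by rweight, the most heavily downweighted cities sit at the top. A short sketch of how to pull them out directly, assuming rweight, yhat, and e exist from the commands above; the 0.5 cutoff is an arbitrary illustration, not a value from the book.

```stata
* Sketch: show only the heavily downweighted cities.
* Assumes rweight, yhat, and e were created by the commands above.
* The 0.5 cutoff is an arbitrary choice for illustration.
list smsa hc mort yhat e rweight if rweight < .5

* Count how many cities received any downweighting at all.
count if rweight < 1
```

Going by the table, the list command should return San Jose, New Orleans, Los Angeles, and San Diego.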

Figure 6.3, page 188.

format rweight %3.2f
graph twoway (scatter mort lnhc, mlabel(rweight) msymbol(i)) ///
	(line yhat lnhc), xlabel(0(2)6) ylabel(800(100)1100)

Figure 6.7, page 197.

NOTE: For the following figure, we produce each graph manually. Alternatively, specifying the graph option of the rreg command shows a slideshow of how the weights change from one IRLS iteration to the next.
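
The slideshow mentioned in the note can be requested in one line. This sketch assumes airpol is still in memory with lnhc defined from the previous section.

```stata
* Sketch: let rreg draw the weight-convergence graphs itself.
* Assumes airpol is in memory and lnhc = ln(hc) already exists.
rreg mort lnhc, graph
```

Each graph plots the weights from the current round against those from the previous round, which is what the manual steps below reproduce one iteration at a time.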

use https://stats.idre.ucla.edu/stat/stata/examples/rwg/airpol, clear

Iteration 1

gen lnhc = ln(hc)
regress mort lnhc

  Source |       SS       df       MS                  Number of obs =      60
---------+------------------------------               F(  1,    58) =    1.35
   Model |  5179.87999     1  5179.87999               Prob > F      =  0.2506
Residual |  223117.041    58  3846.84554               R-squared     =  0.0227
---------+------------------------------               Adj R-squared =  0.0058
   Total |  228296.921    59  3869.43934               Root MSE      =  62.023

------------------------------------------------------------------------------
    mort |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    lnhc |   7.968576   6.867098      1.160   0.251      -5.777414    21.71457
   _cons |   918.4252   20.53273     44.730   0.000       877.3245    959.5259
------------------------------------------------------------------------------

predict e, resid
summarize e, detail

                          Residuals
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -164.8106      -164.8106
 5%    -106.9425      -118.3275
10%    -76.94942      -108.2129       Obs                  60
25%    -40.61413       -105.672       Sum of Wgt.          60

50%     6.803905                      Mean           7.55e-08
                        Largest       Std. Dev.      61.49508
75%     40.52672        84.4721
90%     70.31952       87.77363       Variance       3781.645
95%     86.12286       122.6034       Skewness      -.0801513
99%     170.7031       170.7031       Kurtosis       3.362466

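The median is the 50th percentile, 6.803905. It is typed in by hand below to compute the absolute deviations of the residuals from their median.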
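
Rather than retyping the value, the median could also be read from the results that summarize leaves behind. A minimal sketch, assuming e holds the OLS residuals created by predict above; sk_m0 is a hypothetical scalar name chosen so that it does not clash with the variable m0 generated next.

```stata
* Sketch: read the median from summarize's stored results.
* Assumes e holds the OLS residuals from "predict e, resid" above.
* sk_m0 is a hypothetical scalar name, distinct from the variable m0.
quietly summarize e, detail
scalar sk_m0 = r(p50)
display %9.6f sk_m0
```

The displayed value should agree with the 50% row of the summarize output, and it carries full precision instead of the seven digits copied by hand.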

gen m0=6.803905
gen ee=abs(e-m0)
summarize ee, detail

                             ee
-------------------------------------------------------------
      Percentiles      Smallest
 1%      1.00688        1.00688
 5%     2.174106        1.00688
10%     5.724127       1.863093       Obs                  60
25%     20.17187       2.485118       Sum of Wgt.          60

50%     35.41085                      Mean           47.77389
                        Largest       Std. Dev.      38.82904
75%     65.78219       115.7995
90%     106.3564       125.1314       Variance       1507.694
95%     120.4655       163.8992       Skewness       1.190757
99%     171.6145       171.6145       Kurtosis       4.333584

gen m00=35.41085
gen s0=m00/.6745
gen es0=abs(e)/s0
gen w1=.
(60 missing values generated)

replace w1=1 if es0<=1.345
(48 real changes made)

replace w1=1.345/es0 if es0>1.345
(12 real changes made)

gen z=1
graph twoway scatter w1 z, connect(l) xlabel(0 .5 1) ylabel(0 .5 1)
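
For reference, the scale estimate and Huber weights computed by the commands above can be written as follows, where the e_i are the OLS residuals. The symbols s, u_i, and w_i are notation introduced here for illustration and are not taken from the book.

```latex
\begin{aligned}
s   &= \frac{\operatorname{med}_i \left| e_i - \operatorname{med}_j e_j \right|}{0.6745}
     = \frac{35.41085}{0.6745} \approx 52.50, \\
u_i &= \frac{|e_i|}{s}, \\
w_i &= \begin{cases}
         1           & \text{if } u_i \le 1.345, \\
         1.345 / u_i & \text{if } u_i > 1.345.
       \end{cases}
\end{aligned}
```

With s of about 52.50, the cutoff 1.345 s is roughly 70.6, so only residuals larger than about 70.6 in absolute value are downweighted. In this pass that is 12 of the 60 observations, as the replace output shows.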

Iteration 2

gen mortw1=mort*sqrt(w1)
gen lnhcw0=sqrt(w1)
gen lnhcw1=lnhc*sqrt(w1)
regress mortw1 lnhcw1 lnhcw0, nocons

  Source |       SS       df       MS                  Number of obs =      60
---------+------------------------------               F(  2,    58) = 8902.25
   Model |  50024076.4     2  25012038.2               Prob > F      =  0.0000
Residual |  162958.613    58  2809.63127               R-squared     =  0.9968
---------+------------------------------               Adj R-squared =  0.9966
   Total |  50187035.0    60  836450.584               Root MSE      =  53.006

------------------------------------------------------------------------------
  mortw1 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  lnhcw1 |   12.08538   6.277814      1.925   0.059      -.4810322    24.65179
  lnhcw0 |   908.4398   18.31743     49.594   0.000       871.7735    945.1061
------------------------------------------------------------------------------
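
Multiplying mort, lnhc, and the constant by sqrt(w1) and fitting without a constant is weighted least squares done by hand. The same fit could be sketched with analytic weights, assuming w1 exists from Iteration 1:

```stata
* Sketch: the same weighted fit via analytic weights.
* Assumes w1 exists from Iteration 1 above.
regress mort lnhc [aweight = w1]
```

The coefficients and standard errors should agree with the lnhcw1 and lnhcw0 rows above. Root MSE, F, and R-squared will not, because aweights are rescaled to sum to the number of observations and the nocons regression treats the constant as an ordinary regressor.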

predict e1, resid
gen e11=e1/lnhcw0
summarize e11, detail

                             e11
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -173.9847      -173.9847
 5%    -113.1364      -128.8018
10%    -78.96231      -124.8792       Obs                  60
25%    -40.35685      -101.3937       Sum of Wgt.          60

50%     4.943437                      Mean          -1.349338
                        Largest       Std. Dev.      61.68531
75%     36.87691       80.45545
90%     65.34878       87.52921       Variance       3805.077
95%     83.99233       117.1047       Skewness      -.2116352
99%     168.3556       168.3556       Kurtosis       3.577332

gen m1=4.943437
gen ee1=abs(e11-m1)
summarize ee1, detail

                             ee1
-------------------------------------------------------------
      Percentiles      Smallest
 1%      1.16269        1.16269
 5%     1.609889       1.162691
10%     3.902509       1.284287       Obs                  60
25%     17.60206       1.935491       Sum of Wgt.          60

50%     33.28572                      Mean           47.06011
                        Largest       Std. Dev.      39.91437
75%     63.93033       129.8226
90%     102.2999       133.7453       Variance       1593.157
95%     131.7839       163.4122       Skewness       1.300015
99%     178.9281       178.9281       Kurtosis       4.602989

gen m11=33.28572
gen s1=m11/.6745
gen es1=abs(e11)/s1
gen w2=.
(60 missing values generated)

replace w2=1 if es1<=1.345
(48 real changes made)

replace w2=1.345/es1 if es1>1.345
(12 real changes made)

graph twoway (scatter w2 w1) (line w1 w1, sort), xlabel(0 .5 1) ylabel(0 .5 1)

Iteration 3

gen mortw2=mort*sqrt(w2)
gen lnhcw2=sqrt(w2)
gen lnhcw3=lnhc*sqrt(w2)
regress mortw2 lnhcw2 lnhcw3, nocons

  Source |       SS       df       MS                  Number of obs =      60
---------+------------------------------               F(  2,    58) = 9248.78
   Model |  49588476.9     2  24794238.5               Prob > F      =  0.0000
Residual |  155487.033    58  2680.81092               R-squared     =  0.9969
---------+------------------------------               Adj R-squared =  0.9968
   Total |  49743964.0    60  829066.066               Root MSE      =  51.777

------------------------------------------------------------------------------
  mortw2 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  lnhcw2 |   905.4311   18.09045     50.050   0.000       869.2191    941.6431
  lnhcw3 |    13.4653   6.237472      2.159   0.035       .9796466    25.95096
------------------------------------------------------------------------------

predict e2, resid
gen e22=e2/lnhcw2
summarize e22, detail

                             e22
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -177.3981      -177.3981
 5%     -115.551      -132.6511
10%     -79.9754       -130.804       Obs                  60
25%    -40.03335       -100.298       Sum of Wgt.          60

50%     4.820021                      Mean          -2.139968
                        Largest       Std. Dev.      61.83381
75%     34.83411       78.77076
90%     64.34794       87.10891       Variance       3823.419
95%     82.93983       114.9232       Skewness      -.2607152
99%     167.2305       167.2305       Kurtosis       3.656614

gen m2=4.820021
gen ee2=abs(e22-m2)
summarize ee2, detail

                             ee2
-------------------------------------------------------------
      Percentiles      Smallest
 1%     1.339484       1.339484
 5%     2.356566       1.339485
10%     4.189049       1.822319       Obs                  60
25%     17.43784       2.890813       Sum of Wgt.          60

50%     32.98885                      Mean           46.90808
                        Largest       Std. Dev.      40.43538
75%     62.89881        135.624
90%     101.7788       137.4711       Variance        1635.02
95%     136.5476       162.4104       Skewness       1.342615
99%     182.2181       182.2181       Kurtosis       4.699437

gen m22=32.98885
gen s2=m22/.6745
gen es2=abs(e22)/s2
gen w3=.
(60 missing values generated)

replace w3=1 if es2<=1.345
(47 real changes made)

replace w3=1.345/es2 if es2>1.345
(13 real changes made)

graph twoway (scatter w3 w2) (line w2 w2, sort), xlabel(0 .5 1) ylabel(0 .5 1)

Iteration 4

gen mortw3=mort*sqrt(w3)
gen lnhcw4=sqrt(w3)
gen lnhcw5=lnhc*sqrt(w3)
regress mortw3 lnhcw4 lnhcw5, nocons

  Source |       SS       df       MS                  Number of obs =      60
---------+------------------------------               F(  2,    58) = 9332.34
   Model |  49471461.0     2  24735730.5               Prob > F      =  0.0000
Residual |  153731.199    58  2650.53791               R-squared     =  0.9969
---------+------------------------------               Adj R-squared =  0.9968
   Total |  49625192.2    60  827086.537               Root MSE      =  51.483

------------------------------------------------------------------------------
  mortw3 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  lnhcw4 |    904.103   18.09088     49.976   0.000       867.8902    940.3159
  lnhcw5 |   14.08514   6.258612      2.251   0.028       1.557165    26.61311
------------------------------------------------------------------------------

predict e3, resid
gen e33=e3/lnhcw4
summarize e33, detail

                             e33
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -178.9547      -178.9547
 5%    -116.6589      -134.4035
10%    -80.97359      -133.4887       Obs                  60
25%    -39.90723      -99.82915       Sum of Wgt.          60

50%     4.111221                      Mean          -2.518487
                        Largest       Std. Dev.      61.91423
75%     34.29168       77.99062
90%     64.13075       86.89668       Variance       3833.372
95%     82.44365         113.92       Skewness      -.2834424
99%     166.7017       166.7017       Kurtosis       3.693297

gen m3=4.111221
gen ee3=abs(e33-m3)
summarize ee3, detail

                             ee3
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .7889185       .7889185
 5%     2.865521       .7889187
10%     4.604212       2.748589       Obs                  60
25%     16.45549       2.982452       Sum of Wgt.          60

50%     33.48549                      Mean           46.83978
                        Largest       Std. Dev.      40.58225
75%      61.8055       137.5999
90%     100.9146       138.5147       Variance       1646.919
95%     138.0573       162.5905       Skewness       1.361655
99%     183.0659       183.0659       Kurtosis       4.749048

gen m33=33.48549
gen s3=m33/.6745
gen es3=abs(e33)/s3
gen w4=.
(60 missing values generated)

replace w4=1 if es3<=1.345
(46 real changes made)

replace w4=1.345/es3 if es3>1.345
(14 real changes made)

graph twoway (scatter w4 w3) (line w3 w3, sort), xlabel(0 .5 1) ylabel(0 .5 1)
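
Iterations 1 through 4 repeat the same steps, so they can be sketched as a loop. This is an illustration under stated assumptions, not rreg's own code: it assumes mort and lnhc are in memory, uses analytic weights in place of the sqrt(w) transformation, and every sk_ name is hypothetical.

```stata
* Sketch: the four Huber passes above written as a loop.
* Assumes mort and lnhc are in memory; all sk_ names are hypothetical.
gen double sk_w = 1
forvalues i = 1/4 {
    * Weighted fit using the weights from the previous pass (all 1 at first).
    quietly regress mort lnhc [aweight = sk_w]
    display "pass `i': slope = " %9.6f _b[lnhc]

    * Unweighted residuals and their median absolute deviation.
    capture drop sk_e
    capture drop sk_ad
    quietly predict double sk_e, residuals
    quietly summarize sk_e, detail
    gen double sk_ad = abs(sk_e - r(p50))
    quietly summarize sk_ad, detail
    scalar sk_s = r(p50)/.6745

    * Huber weights with tuning constant 1.345.
    quietly replace sk_w = cond(abs(sk_e)/sk_s <= 1.345, 1, 1.345*sk_s/abs(sk_e))
}
drop sk_w sk_e sk_ad
```

The slopes printed for passes 1 to 4 should be close to 7.968576, 12.08538, 13.4653, and 14.08514 from the regressions above. Small differences are possible because the manual version retypes the medians to seven digits.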

Iteration 5 (change to biweight)

gen mortw4=mort*sqrt(w4)
gen lnhcw6=sqrt(w4)
gen lnhcw7=lnhc*sqrt(w4)
regress mortw4 lnhcw6 lnhcw7, nocons

  Source |       SS       df       MS                  Number of obs =      60
---------+------------------------------               F(  2,    58) = 9291.05
   Model |  49565578.2     2  24782789.1               Prob > F      =  0.0000
Residual |  154708.151    58  2667.38191               R-squared     =  0.9969
---------+------------------------------               Adj R-squared =  0.9968
   Total |  49720286.4    60   828671.44               Root MSE      =  51.647

------------------------------------------------------------------------------
  mortw4 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  lnhcw6 |   903.8434   18.15556     49.783   0.000       867.5011    940.1857
  lnhcw7 |   14.21256   6.283636      2.262   0.027       1.634493    26.79062
------------------------------------------------------------------------------

predict e4, resid
gen e44=e4/lnhcw6
summarize e44, detail

                             e44
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -179.2881      -179.2881
 5%    -116.9001      -134.7772
10%    -81.27055       -134.054       Obs                  60
25%    -39.77162      -99.74622       Sum of Wgt.          60

50%      3.95209                      Mean          -2.609724
                        Largest       Std. Dev.      61.93181
75%     34.16674       77.81688
90%     64.08532       86.83968       Variance       3835.549
95%     82.32828       113.7003       Skewness      -.2881632
99%     166.5795       166.5795       Kurtosis       3.700915

gen m4=3.95209
gen ee4=abs(e44-m4)
summarize ee4, detail

                             ee4
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .6757395       .6757395
 5%     2.970146         .67574
10%     4.538526       2.939001       Obs                  60
25%     16.38228        3.00129       Sum of Wgt.          60

50%     33.58758                      Mean           46.82574
                        Largest       Std. Dev.      40.61432
75%     61.58075       138.0061
90%      100.737       138.7292       Variance       1649.523
95%     138.3677       162.6274       Skewness       1.365385
99%     183.2402       183.2402       Kurtosis       4.758903

gen m44=33.58758
gen s4=m44/.6745
gen es4=e44/s4
gen w5=.
(60 missing values generated)

replace w5=(1-(es4/4.685)^2)^2 if (abs(e44)/s4)<=4.685
(60 real changes made)

replace w5=0 if (abs(e44)/s4)>4.685
(0 real changes made)

graph twoway (scatter w5 w4) (line w4 w4, sort), xlabel(0 .5 1) ylabel(0 .5 1)
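
From this iteration on, the Huber weights are replaced by biweights. In notation introduced here for illustration (s, u_i, and w_i are not the book's symbols), the commands above compute:

```latex
\begin{aligned}
s   &= \frac{33.58758}{0.6745} \approx 49.80, \qquad u_i = \frac{e_i}{s}, \\
w_i &= \begin{cases}
         \left[ 1 - \left( u_i / 4.685 \right)^2 \right]^2 & \text{if } |u_i| \le 4.685, \\
         0                                                 & \text{if } |u_i| > 4.685.
       \end{cases}
\end{aligned}
```

Here 4.685 s is roughly 233.3, which exceeds the largest absolute residual in the summary of e44 (179.2881). That is consistent with the second replace making 0 changes: no observation receives a weight of zero in this pass.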

Iteration 6

gen mortw5=mort*sqrt(w5)
gen lnhcw8=sqrt(w5)
gen lnhcw9=lnhc*sqrt(w5)
regress mortw5 lnhcw8 lnhcw9, nocons

  Source |       SS       df       MS                  Number of obs =      60
---------+------------------------------               F(  2,    58) =10199.02
   Model |  46795992.5     2  23397996.2               Prob > F      =  0.0000
Residual |  133060.189    58   2294.1412               R-squared     =  0.9972
---------+------------------------------               Adj R-squared =  0.9971
   Total |  46929052.7    60  782150.878               Root MSE      =  47.897

------------------------------------------------------------------------------
  mortw5 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  lnhcw8 |   900.2209   17.37487     51.812   0.000       865.4413    935.0005
  lnhcw9 |   15.63724   6.060739      2.580   0.012       3.505358    27.76913
------------------------------------------------------------------------------

predict e5, resid
gen e55=e5/lnhcw8
summarize e55, detail

                             e55
-------------------------------------------------------------
      Percentiles      Smallest
 1%     -182.296       -182.296
 5%    -118.1669      -139.6547
10%    -83.87064      -138.2351       Obs                  60
25%    -37.53524      -98.09869       Sum of Wgt.          60

50%     2.892957                      Mean          -2.909743
                        Largest       Std. Dev.      62.15267
75%     34.47182       76.59371
90%     64.29753       86.92204       Variance       3862.955
95%     81.75788       111.9643       Skewness      -.3419659
99%     165.9341       165.9341       Kurtosis        3.78783

gen m5=2.892957
gen ee5=abs(e55-m5)
summarize ee5, detail

                             ee5
-------------------------------------------------------------
      Percentiles      Smallest
 1%      .589726        .589726
 5%      2.40177       .5897262
10%     5.645023       1.591628       Obs                  60
25%     16.05141       3.211911       Sum of Wgt.          60

50%     32.65288                      Mean           46.70808
                        Largest       Std. Dev.      40.97041
75%     61.40457       141.1281
90%       98.751       142.5477       Variance       1678.574
95%     141.8379       163.0411       Skewness       1.408325
99%     185.1889       185.1889       Kurtosis       4.871201

gen m55=32.65288
gen s5=m55/.6745
gen es5=e55/s5
gen w6=.
(60 missing values generated)

replace w6=(1-(es5/4.685)^2)^2 if (abs(e55)/s5)<=4.685
(60 real changes made)

replace w6=0 if (abs(e55)/s5)>4.685
(0 real changes made)

graph twoway (scatter w6 w5) (line w5 w5, sort), xlabel(0 .5 1) ylabel(0 .5 1)
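
The pair of replace commands used for each biweight step could also be written as a single expression with cond(). A sketch that checks itself against w6, assuming es5 and w6 exist from the commands above; sk_w6 is a hypothetical variable name.

```stata
* Sketch: the two replace steps for w6 as one expression.
* Assumes es5 and w6 exist from the commands above; sk_w6 is hypothetical.
gen double sk_w6 = cond(abs(es5) <= 4.685, (1 - (es5/4.685)^2)^2, 0)

* Verify agreement with the weights built by hand, then clean up.
assert reldif(sk_w6, w6) < 1e-6
drop sk_w6
```

The tolerance in the assert allows for w6 being stored as a float while sk_w6 is a double.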

Iteration 7 (Note that iterations 7 and 8 are not graphed in the book.)

gen mortw6=mort*sqrt(w6)
gen lnhcw10=sqrt(w6)
gen lnhcw11=lnhc*sqrt(w6)
regress mortw6 lnhcw10 lnhcw11, nocons

  Source |       SS       df       MS                  Number of obs =      60
---------+------------------------------               F(  2,    58) =10569.43
   Model |  46495826.3     2  23247913.2               Prob > F      =  0.0000
Residual |  127573.451    58  2199.54225               R-squared     =  0.9973
---------+------------------------------               Adj R-squared =  0.9972
   Total |  46623399.8    60  777056.663               Root MSE      =  46.899

------------------------------------------------------------------------------
  mortw6 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 lnhcw10 |   897.4257   17.15824     52.303   0.000       863.0798    931.7717
 lnhcw11 |   16.85949   6.015829      2.803   0.007       4.817501    28.90148
------------------------------------------------------------------------------

predict e6, resid
gen e66=e6/lnhcw10
summarize e66, detail

                             e66
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -185.1891      -185.1891
 5%    -119.2561      -144.7722
10%      -86.414      -141.5143       Obs                  60
25%    -35.92926      -96.99795       Sum of Wgt.          60

50%     2.874535                      Mean          -3.479797
                        Largest       Std. Dev.       62.3774
75%     34.60612       75.23182
90%     64.16693       86.67999       Variance        3890.94
95%     80.95591       110.1623       Skewness      -.3894314
99%     165.0678       165.0678       Kurtosis       3.864709

gen m6=2.874535
gen ee6=abs(e66-m6)
summarize ee6, detail

                             ee6
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .4724467       .4724467
 5%     3.736962       .4724476
10%     5.664235        2.87833       Obs                  60
25%     15.30565       4.595593       Sum of Wgt.          60

50%       31.832                      Mean           46.64581
                        Largest       Std. Dev.      41.46435
75%      61.2924       144.3889
90%     98.25005       147.6468       Variance       1719.292
95%     146.0178       162.1933       Skewness       1.450591
99%     188.0636       188.0636       Kurtosis       4.977333

gen m66=31.832
gen s6=m66/.6745
gen es6=e66/s6
gen w7=.
(60 missing values generated)

replace w7=(1-(es6/4.685)^2)^2 if (abs(e66)/s6)<=4.685
(60 real changes made)

replace w7=0 if (abs(e66)/s6)>4.685
(0 real changes made)

Iteration 8

gen mortw7=mort*sqrt(w7)
gen lnhcw12=sqrt(w7)
gen lnhcw13=lnhc*sqrt(w7)
regress mortw7 lnhcw12 lnhcw13, nocons

  Source |       SS       df       MS                  Number of obs =      60
---------+------------------------------               F(  2,    58) =10952.16
   Model |  46220949.9     2  23110475.0               Prob > F      =  0.0000
Residual |  122387.492    58  2110.12917               R-squared     =  0.9974
---------+------------------------------               Adj R-squared =  0.9973
   Total |  46343337.4    60  772388.957               Root MSE      =  45.936

------------------------------------------------------------------------------
  mortw7 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 lnhcw12 |   894.7121   16.95266     52.777   0.000       860.7777    928.6466
 lnhcw13 |   18.07344   5.972631      3.026   0.004       6.117918    30.02896
------------------------------------------------------------------------------

predict e7, resid
gen e77=e7/lnhcw12
summarize e77, detail

                             e77
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -188.1251      -188.1251
 5%    -120.4005      -149.9176
10%    -89.00262      -144.8338       Obs                  60
25%    -34.39677      -95.96723       Sum of Wgt.          60

50%     3.291201                      Mean          -4.108537
                        Largest       Std. Dev.      62.63243
75%     33.61028       73.81649
90%     63.97465       86.37708       Variance       3922.822
95%     80.09678       108.3101       Skewness      -.4375264
99%     164.1446       164.1446       Kurtosis       3.942903

gen m7=3.291201
gen ee7=abs(e77-m7)
summarize ee7, detail

                             ee7
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .1415799       .1415799
 5%     4.492992       .1415806
10%     6.058385       4.127497       Obs                  60
25%      15.0027       4.858488       Sum of Wgt.          60

50%     32.00533                      Mean           46.59337
                        Largest       Std. Dev.       42.0805
75%     60.68345        148.125
90%     98.24997       153.2088       Variance       1770.768
95%     150.6669       160.8534       Skewness       1.492862
99%     191.4163       191.4163       Kurtosis       5.089983

gen m77=32.00533
gen s7=m77/.6745
gen es7=e77/s7
gen w8=.
(60 missing values generated)

replace w8=(1-(es7/4.685)^2)^2 if (abs(e77)/s7)<=4.685
(60 real changes made)

replace w8=0 if (abs(e77)/s7)>4.685
(0 real changes made)

Iteration 9

gen mortw8=mort*sqrt(w8)
gen lnhcw14=sqrt(w8)
gen lnhcw15=lnhc*sqrt(w8)
regress mortw8 lnhcw14 lnhcw15, nocons

  Source |       SS       df       MS                  Number of obs =      60
---------+------------------------------               F(  2,    58) =11032.70
   Model |  46288732.5     2  23144366.2               Prob > F      =  0.0000
Residual |  121672.237    58  2097.79719               R-squared     =  0.9974
---------+------------------------------               Adj R-squared =  0.9973
   Total |  46410404.7    60  773506.745               Root MSE      =  45.802

------------------------------------------------------------------------------
  mortw8 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 lnhcw14 |   893.1722   16.95102     52.691   0.000       859.2411    927.1034
 lnhcw15 |   18.78453   5.985631      3.138   0.003       6.802987    30.76607
------------------------------------------------------------------------------

predict e8, resid
gen e88=e8/lnhcw14
summarize e88, detail

                             e88
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -189.8946      -189.8946
 5%    -121.1205      -152.9813
10%    -90.56861      -146.8279       Obs                  60
25%    -33.54876      -95.41312       Sum of Wgt.          60

50%     3.485634                      Mean          -4.526487
                        Largest       Std. Dev.      62.79643
75%     32.97723       72.93785
90%     63.81236       86.14997       Variance       3943.392
95%     79.54391       107.1754       Skewness      -.4660449
99%     163.5544       163.5544       Kurtosis       3.989447

gen m8=3.485634
gen ee8=abs(e88-m8)
summarize ee8, detail

                             ee8
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .5012119       .5012119
 5%     3.871048       .5012128
10%     6.578306       3.243295       Obs                  60
25%     14.83559         4.4988       Sum of Wgt.          60

50%     32.01626                      Mean           46.58111
                        Largest       Std. Dev.      42.45112
75%     60.54584       150.3135
90%     98.25001       156.4669       Variance       1802.098
95%     153.3902       160.0687       Skewness       1.518493
99%     193.3803       193.3803       Kurtosis       5.161077

gen m88=32.01626
gen s8=m88/.6745
gen es8=e88/s8
gen w9=.
(60 missing values generated)

replace w9=(1-(es8/4.685)^2)^2 if (abs(e88)/s8)<=4.685
(60 real changes made)

replace w9=0 if (abs(e88)/s8)>4.685
(0 real changes made)

graph twoway (scatter w9 w8) (line w8 w8, sort), xlabel(0 .5 1) ylabel(0 .5 1)
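
Iteration 9 above obtains its weighted least squares fit by multiplying mort, the constant and lnhc by sqrt(w8) and suppressing the constant. As a cross-check, a minimal sketch of the same fit using analytic weights is shown below. It assumes w8 is still in memory; the variable name e8_check is hypothetical.

```stata
* Weighted least squares with w8 as analytic weights.
* _cons here plays the role of lnhcw14, and lnhc the role of lnhcw15.
regress mort lnhc [aweight = w8]

* Residuals on the original scale of mort, comparable to e88 = e8/lnhcw14.
predict double e8_check, residuals
```

The coefficients and standard errors should agree with the Iteration 9 table up to rounding. Root MSE and R-squared will differ, because Stata rescales analytic weights to sum to the number of observations and this model keeps its constant.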

Iteration 10

gen mortw9=mort*sqrt(w9)
gen lnhcw16=sqrt(w9)
gen lnhcw17=lnhc*sqrt(w9)
regress mortw9 lnhcw16 lnhcw17, nocons

  Source |       SS       df       MS                  Number of obs =      60
---------+------------------------------               F(  2,    58) =11114.84
   Model |  46295327.8     2  23147663.9               Prob > F      =  0.0000
Residual |  120790.281    58  2082.59105               R-squared     =  0.9974
---------+------------------------------               Adj R-squared =  0.9973
   Total |  46416118.0    60  773601.967               Root MSE      =  45.635

------------------------------------------------------------------------------
  mortw9 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 lnhcw16 |   892.1295   16.92993     52.695   0.000       858.2405    926.0184
 lnhcw17 |   19.26631   5.987742      3.218   0.002       7.280545    31.25208
------------------------------------------------------------------------------

predict e9, resid
gen e99=e9/lnhcw16
summarize e99, detail

                             e99
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -191.0941      -191.0941
 5%    -121.6088      -155.0575
10%    -91.63017      -148.1794       Obs                  60
25%    -32.97474      -95.03825       Sum of Wgt.          60

50%      3.61685                      Mean          -4.810208
                        Largest       Std. Dev.      62.91361
75%     32.54779         72.342
90%     63.70186       85.99552       Variance       3958.123
95%     79.16876       106.4061       Skewness      -.4854825
99%     163.1539       163.1539       Kurtosis       4.021253

gen m9=3.61685
gen ee9=abs(e99-m9)
summarize ee9, detail

                             ee9
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .7448692       .7448692
 5%     3.345251       .7448692
10%     6.870948       2.644254       Obs                  60
25%     14.72238       4.046248       Sum of Wgt.          60

50%     32.31808                      Mean            46.5728
                        Largest       Std. Dev.      42.71478
75%     60.51368       151.7963
90%     98.25004       158.6744       Variance       1824.553
95%     155.2353        159.537       Skewness       1.535109
99%      194.711        194.711       Kurtosis       5.209073

gen m99=32.31808
gen s9=m99/.6745
gen es9=e99/s9
gen w10=.
(60 missing values generated)

replace w10=(1-(es9/4.685)^2)^2 if (abs(e99)/s9)<=4.685
(60 real changes made)

replace w10=0 if (abs(e99)/s9)>4.685
(0 real changes made)

graph twoway (scatter w10 w9) (line w9 w9, sort), xlabel(0 .5 1) ylabel(0 .5 1)

Iteration 11

gen mortw10=mort*sqrt(w10)
gen lnhcw18=sqrt(w10)
gen lnhcw19=lnhc*sqrt(w10)
regress mortw10 lnhcw18 lnhcw19, nocons

  Source |       SS       df       MS                  Number of obs =      60
---------+------------------------------               F(  2,    58) =11069.26
   Model |  46399353.4     2  23199676.7               Prob > F      =  0.0000
Residual |  121560.159    58  2095.86482               R-squared     =  0.9974
---------+------------------------------               Adj R-squared =  0.9973
   Total |  46520913.5    60  775348.559               Root MSE      =  45.781

------------------------------------------------------------------------------
 mortw10 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 lnhcw18 |   891.8404   16.97908     52.526   0.000       857.8531    925.8278
 lnhcw19 |   19.41061   6.006624      3.232   0.002       7.387048    31.43418
------------------------------------------------------------------------------

predict e10, resid
gen e1010=e10/lnhcw18
summarize e1010, detail

                            e1010
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -191.4766      -191.4766
 5%    -121.7784      -155.7027
10%    -91.97139      -148.6076       Obs                  60
25%    -32.82613      -94.94926       Sum of Wgt.          60

50%     3.632838                      Mean          -4.918479
                        Largest       Std. Dev.      62.94965
75%      32.3959       72.14027
90%     63.64549         85.926       Variance       3962.659
95%     79.03314       106.1523       Skewness      -.4913203
99%     163.0105       163.0105       Kurtosis       4.030817

gen m10=3.632838
gen ee10=abs(e1010-m10)
summarize ee10, detail

                            ee10
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .8178842       .8178842
 5%     3.119054       .8178847
10%     6.824313       2.464844       Obs                  60
25%     14.68851       3.773263       Sum of Wgt.          60

50%     32.40845                      Mean           46.57032
                        Largest       Std. Dev.      42.79567
75%     60.50405       152.2404
90%     98.25003       159.3355       Variance        1831.47
95%     155.7879       159.3777       Skewness       1.539966
99%     195.1095       195.1095       Kurtosis         5.2234

gen m1010=32.40845
gen s10=m1010/.6745
gen es10=e1010/s10
gen w11=.
(60 missing values generated)

replace w11=(1-(es10/4.685)^2)^2 if (abs(e1010)/s10)<=4.685
(60 real changes made)

replace w11=0 if (abs(e1010)/s10)>4.685
(0 real changes made)

graph twoway (scatter w11 w10) (line w10 w10, sort), xlabel(0 .5 1) ylabel(0 .5 1)
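
Each iteration above copies two medians by hand from the summarize output (for example m10 and m1010). A minimal sketch of the same reweighting step that reads them from the stored result r(p50) is given below. It assumes e1010 and w11 from Iteration 11 are in memory; the names med_e, s_chk, absdev_chk, w11_chk and diff_chk are hypothetical.

```stata
* Median of the residuals, taken from summarize's stored results.
quietly summarize e1010, detail
scalar med_e = r(p50)

* Median absolute deviation from that median, rescaled by .6745.
generate double absdev_chk = abs(e1010 - scalar(med_e))
quietly summarize absdev_chk, detail
scalar s_chk = r(p50)/.6745

* Biweight weights with tuning constant 4.685; zero beyond the cutoff.
generate double w11_chk = cond(abs(e1010)/scalar(s_chk) <= 4.685, ///
    (1 - (e1010/(4.685*scalar(s_chk)))^2)^2, 0)

* Largest discrepancy from the hand-computed w11.
generate double diff_chk = abs(w11_chk - w11)
summarize diff_chk
```

Any discrepancy should be on the order of the rounding in the hand-typed medians.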

Figure 6.8, page 197.

rreg mort lnhc

   Huber iteration 1:  maximum difference in weights = .58511763
   Huber iteration 2:  maximum difference in weights = .12109939
   Huber iteration 3:  maximum difference in weights = .07054585
   Huber iteration 4:  maximum difference in weights = .02080019
Biweight iteration 5:  maximum difference in weights = .20680335
Biweight iteration 6:  maximum difference in weights = .06324705
Biweight iteration 7:  maximum difference in weights = .05913415
Biweight iteration 8:  maximum difference in weights = .02922746
Biweight iteration 9:  maximum difference in weights = .01978239
Biweight iteration 10:  maximum difference in weights = .01178611
Biweight iteration 11:  maximum difference in weights = .0036652

Robust regression estimates                            Number of obs =      60
                                                       F(  1,    58) =    8.81
                                                       Prob > F      =  0.0043

------------------------------------------------------------------------------
    mort |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    lnhc |   19.45727   6.553716      2.969   0.004       6.338583    32.57596
   _cons |     891.75   19.59571     45.507   0.000        852.525    930.9751
------------------------------------------------------------------------------

predict e15, r
gen se=e15/48.09
sort se
graph twoway scatter w11 se, xlabel(-5(1)5) ylabel(0(.2)1) xline(0)

Figure 6.9, page 202.

Note: The robust regression line in this graph does not match the one in the book. The next figure, drawn after the two outlying (high-leverage) points are removed, does match.

use https://stats.idre.ucla.edu/stat/stata/examples/rwg/airpol, clear

regress mort hc

  Source |       SS       df       MS                  Number of obs =      60
---------+------------------------------               F(  1,    58) =    1.88
   Model |   7181.1855     1   7181.1855               Prob > F      =  0.1752
Residual |  221115.736    58  3812.34027               R-squared     =  0.0315
---------+------------------------------               Adj R-squared =  0.0148
   Total |  228296.921    59  3869.43934               Root MSE      =  61.744

------------------------------------------------------------------------------
    mort |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
      hc |  -.1199471   .0873952     -1.372   0.175      -.2948875    .0549934
   _cons |    944.905   8.630252    109.488   0.000       927.6297    962.1803
------------------------------------------------------------------------------

predict h, leverage
rreg mort hc, genwt(tempwt)

   Huber iteration 1:  maximum difference in weights = .47764579
   Huber iteration 2:  maximum difference in weights = .01386864
Biweight iteration 3:  maximum difference in weights = .15809529
Biweight iteration 4:  maximum difference in weights = .00210305

Robust regression estimates                            Number of obs =      60
                                                       F(  1,    58) =    1.63
                                                       Prob > F      =  0.2075

------------------------------------------------------------------------------
    mort |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
      hc |  -.1138386    .089299     -1.275   0.207      -.2925899    .0649127
   _cons |   944.1757   8.818254    107.071   0.000        926.524    961.8273
------------------------------------------------------------------------------

predict yhat2
sort hc
graph twoway (scatter mort hc) (lfit mort hc) (line yhat2 hc)

Figure 6.10, page 202.

drop if h>=.166
(2 observations deleted)

regress mort hc

  Source |       SS       df       MS                  Number of obs =      58
---------+------------------------------               F(  1,    56) =    0.04
   Model |  157.327497     1  157.327497               Prob > F      =  0.8424
Residual |  220947.205    56   3945.4858               R-squared     =  0.0007
---------+------------------------------               Adj R-squared = -0.0171
   Total |  221104.532    57  3879.02688               Root MSE      =  62.813

------------------------------------------------------------------------------
    mort |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
      hc |  -.0636877    .318936     -0.200   0.842      -.7025932    .5752178
   _cons |   943.6545   10.95789     86.116   0.000       921.7032    965.6057
------------------------------------------------------------------------------

rreg mort hc

   Huber iteration 1:  maximum difference in weights = .63453732
   Huber iteration 2:  maximum difference in weights = .23102242
   Huber iteration 3:  maximum difference in weights = .04856842
Biweight iteration 4:  maximum difference in weights = .26244177
Biweight iteration 5:  maximum difference in weights = .10250554
Biweight iteration 6:  maximum difference in weights = .01126607
Biweight iteration 7:  maximum difference in weights = .00181083

Robust regression estimates                            Number of obs =      57
                                                       F(  1,    55) =   15.93
                                                       Prob > F      =  0.0002

------------------------------------------------------------------------------
    mort |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
      hc |   1.408462   .3529216      3.991   0.000       .7011917    2.115733
   _cons |    918.531   10.21261     89.941   0.000       898.0645    938.9975
------------------------------------------------------------------------------

predict yhat6
graph twoway (scatter mort hc) (lfit mort hc) (line yhat6 hc, sort), ///
	xlabel(0(50)150) ylabel(800(100)1100)
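
The rreg output above reports 57 observations even though 58 remain after the drop. Stata's rreg documentation describes an initial screening that sets aside observations with Cook's distance greater than 1 before iterating, which would explain the difference. A minimal sketch to check this, assuming the 58-observation data are still in memory (the variable name cooksd_chk is hypothetical):

```stata
* Refit OLS on the remaining 58 observations and compute Cook's distance.
quietly regress mort hc
predict double cooksd_chk, cooksd

* List any observation that such a screening would set aside.
list smsa hc mort cooksd_chk if cooksd_chk > 1 & !missing(cooksd_chk)
```

If the list is empty, the missing observation has some other cause and the explanation above does not apply.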

Figure 6.11, page 204.

use https://stats.idre.ucla.edu/stat/stata/examples/rwg/airpol, clear

gen lnhc = ln(hc)
graph twoway scatter lnhc hc, connect(l) sort xlabel(0(200)600) ylabel(0(2)6)

gen lnpopd = ln(popden)
graph twoway scatter lnpopd popden, connect(l) sort xlabel(0 5000 10000) ylabel(7 8 9)

gen nrrpoor = -(1/(sqrt(poor)))
graph  twoway scatter nrrpoor poor, connect(l) sort xlabel(10(5)30) ylabel(-.35(.05)-.2)

gen srnonw = sqrt(nonw)
graph twoway scatter  srnonw nonw, connect(l) sort xlabel(0(10)40) ylabel(0(2)6)

Figure 6.12, page 205.

regress mort rain jan educ srnonw

  Source |       SS       df       MS                  Number of obs =      60
---------+------------------------------               F(  4,    55) =   26.00
   Model |  149326.539     4  37331.6348               Prob > F      =  0.0000
Residual |  78970.3819    55  1435.82513               R-squared     =  0.6541
---------+------------------------------               Adj R-squared =  0.6289
   Total |  228296.921    59  3869.43934               Root MSE      =  37.892

------------------------------------------------------------------------------
    mort |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    rain |   1.038763   .5972583      1.739   0.088      -.1581692    2.235696
     jan |    -1.9212   .5579225     -3.443   0.001      -3.039302   -.8030985
    educ |  -21.13074   6.844689     -3.087   0.003       -34.8478   -7.413674
  srnonw |   32.40913   4.662617      6.951   0.000       23.06503    41.75322
   _cons |   1094.805   86.29434     12.687   0.000       921.8676    1267.743
------------------------------------------------------------------------------

avplot lnhc, mlabel(smsa) msymbol(i) xlabel(-4(2)2) ylabel(-100(50)50)

Figure 6.13, page 206.

use https://stats.idre.ucla.edu/stat/stata/examples/rwg/airpol, clear

gen lnhc=log(hc)
gen srnonw=sqrt(nonw)
rreg mort lnhc  rain jan educ srnonw, genwt(rweight)

   Huber iteration 1:  maximum difference in weights = .45915614
   Huber iteration 2:  maximum difference in weights = .04128307
Biweight iteration 3:  maximum difference in weights = .14223798
Biweight iteration 4:  maximum difference in weights = .00447933

Robust regression estimates                            Number of obs =      60
                                                       F(  5,    54) =   28.12
                                                       Prob > F      =  0.0000

------------------------------------------------------------------------------
    mort |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    lnhc |   17.76648   4.625325      3.841   0.000       8.493262     27.0397
    rain |   2.317299   .6382043      3.631   0.001       1.037776    3.596821
     jan |  -2.110483   .5029979     -4.196   0.000      -3.118933   -1.102033
    educ |  -19.10964   6.190165     -3.087   0.003      -31.52017   -6.699102
  srnonw |   26.21364    4.38846      5.973   0.000       17.41531    35.01197
   _cons |   1001.758   82.48887     12.144   0.000       836.3781    1167.139
------------------------------------------------------------------------------

predict e, r
gen e1=e/48.09
sort e1
graph twoway scatter rweight e1, connect(l) xlabel(-5(1)5) ylabel(0(.2)1) xline(0)

Table 6.3, page 206.

use https://stats.idre.ucla.edu/stat/stata/examples/rwg/airpol, clear

gen lnhc = ln(hc)
gen srnonw = sqrt(nonw)
regress mort lnhc rain jan educ srnonw

  Source |       SS       df       MS                  Number of obs =      60
---------+------------------------------               F(  5,    54) =   28.63
   Model |  165769.597     5  33153.9194               Prob > F      =  0.0000
Residual |  62527.3244    54  1157.91341               R-squared     =  0.7261
---------+------------------------------               Adj R-squared =  0.7008
   Total |  228296.921    59  3869.43934               Root MSE      =  34.028

------------------------------------------------------------------------------
    mort |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    lnhc |    17.4691   4.635721      3.768   0.000       8.175039    26.76316
    rain |   2.352107   .6396387      3.677   0.001       1.069709    3.634505
     jan |    -2.1316   .5041284     -4.228   0.000      -3.142316   -1.120883
    educ |  -17.95806   6.204078     -2.895   0.005      -30.39649   -5.519631
  srnonw |    27.3349   4.398323      6.215   0.000        18.5168    36.15301
   _cons |    986.261   82.67427     11.929   0.000        820.509    1152.013
------------------------------------------------------------------------------

predict h, leverage
format h %3.2f
list smsa h if h>.23

             smsa    h 
 16.       Dallas  0.23  
 21.    FortWorth  0.25  
 29.   LosAngeles  0.28  
 32.        Miami  0.36  
 46.     SanDiego  0.25

Table 6.14, page 209.

use https://stats.idre.ucla.edu/stat/stata/examples/rwg/bays, clear

list bay pcb84 pcb85

                         bay      pcb84      pcb85 
  1.               Casco Bay      95.28      77.55  
  2.         Merrimack River      52.97      29.23  
  3.            Salem Harbor     533.58      403.1  
  4.           Boston Harbor   17104.86        736  
  5.           Buzzards' Bay     308.46     192.15  
  6.        Narragansett Bay     159.96      220.6  
  7.  East Long Island Sound         10       8.62  
  8.  West Long Island Sound     234.43     174.31  
  9.             Raritan Bay     443.89     529.28  
 10.            Delaware Bay        2.5     130.67  
 11.    Lower Chesapeake Bay         51      39.74  
 12.          Pamilico Sound          0          0  
 13.       Charleston Harbor        9.1       8.43  
 14.            Sapelo Sound          0          0  
 15.         St. Johns River        140     120.04  
 16.               Tampa Bay          0          0  
 17.        Apalachicola Bay         12      11.93  
 18.              Mobile Bay          0          0  
 19.            Round Island          0          0  
 20. Mississippi River Delta         34      30.14  
 21.           Barataria Bay          0          0  
 22.         San Antonio Bay          0          0  
 23.      Corpus Christi Bay          0          0  
 24.        San Diego Harbor      422.1     531.67  
 25.           San Diego Bay       6.74        9.3  
 26.              Dana Point       7.06       5.74  
 27.              Seal Beach      46.71      46.47  
 28.        San Pedro Canyon     159.56      176.9  
 29.        Santa Monica Bay         14      13.69  
 30.              Bodega Bay       4.18       4.89  
 31.                Coos Bay       3.19        6.6  
 32.    Columbia River Mouth       8.77       6.73  
 33.         Nisqually Beach       4.23       4.28  
 34.        Commencement Bay       20.6       20.5  
 35.             Elliott Bay     329.97      414.5  
 36.             Lutak Inlet        5.5        5.8  
 37.               Nahku Bay        6.6       5.08

Figure 6.15, page 210.

use https://stats.idre.ucla.edu/stat/stata/examples/rwg/bays, clear

qreg pcb85 pcb84

Iteration  1:  WLS sum of weighted deviations =  3864.9265

Iteration  1: sum of abs. weighted deviations =  16672.456
Iteration  2: sum of abs. weighted deviations =  4312.7581
Iteration  3: sum of abs. weighted deviations =  2969.6801

Median regression                                    Number of obs =        37
  Raw sum of deviations  3821.07 (about 11.93)
  Min sum of deviations  2969.68                     Pseudo R2     =    0.2228

------------------------------------------------------------------------------
   pcb85 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   pcb84 |   .0425018   .0005854     72.608   0.000       .0413134    .0436901
   _cons |   9.013539     9.6191      0.937   0.355      -10.51427    28.54135
------------------------------------------------------------------------------

predict h1
graph twoway (scatter pcb85 pcb84) (lfit pcb85 pcb84) (line h1 pcb84, sort), ///
	xlabel(0(4000)16000) ylabel(0(200)800)

Figure 6.16, page 210.

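NOTE: The predicted value for id #4 lies far above the highest point on the y-axis scale, so it is left out of the plotted line so that the line shown in the text can be reproduced. It is not dropped from the estimation: the qreg fit below still uses all 37 observations.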

use https://stats.idre.ucla.edu/stat/stata/examples/rwg/bays, clear

regress pcb85 pcb84

  Source |       SS       df       MS                  Number of obs =      37
---------+------------------------------               F(  1,    35) =   21.77
   Model |  462349.858     1  462349.858               Prob > F      =  0.0000
Residual |  743254.192    35  21235.8341               R-squared     =  0.3835
---------+------------------------------               Adj R-squared =  0.3659
   Total |  1205604.05    36  33489.0014               Root MSE      =  145.73

------------------------------------------------------------------------------
   pcb85 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   pcb84 |   .0404537   .0086698      4.666   0.000       .0228532    .0580543
   _cons |    85.0138   24.42159      3.481   0.001       35.43533    134.5923
------------------------------------------------------------------------------

predict h, leverage
summarize h, detail

                          Leverage
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0270276       .0270276
 5%     .0270645       .0270645
10%     .0271934       .0270821       Obs                  37
25%     .0277486       .0271934       Sum of Wgt.          37

50%     .0280503                      Mean           .0540541
                        Largest       Std. Dev.      .1594038
75%     .0280756       .0280853
90%     .0280853       .0280853       Variance       .0254096
95%     .0280853       .0280853       Skewness       5.833291
99%     .9974615       .9974615       Kurtosis       35.02746

Note that the 90th percentile of h is .0280853. It serves as the cutoff for the leverage weights generated next.

gen wh=.
(37 missing values generated)

replace wh=1 if h<=.0280853
(36 real changes made)

replace wh=(.0280853/h)^2 if h>.0280853
(1 real change made)
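
The two replace statements above amount to the following leverage-based weight, written here with h_{90} as shorthand (introduced for illustration) for the 90th percentile of the leverages:

```latex
w_h(h_i) =
\begin{cases}
1 & \text{if } h_i \le h_{90}, \\
\left( h_{90} / h_i \right)^2 & \text{if } h_i > h_{90},
\end{cases}
\qquad h_{90} = 0.0280853 .
```

For the single observation above the cutoff, with leverage .9974615, this gives (.0280853/.9974615)^2, or about .00079, so that point is almost entirely downweighted. The other 36 weights equal 1, which agrees with the sum of weights of about 36.001 reported in the qreg output that follows.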

qreg pcb85 pcb84 [aw=wh]
(sum of wgt is  3.6001e+001)
Iteration  1:  WLS sum of weighted deviations =  13713.003

(sum of wgt is  3.6001e+001)
Iteration  1: sum of abs. weighted deviations =  1122.2417
Iteration  2: sum of abs. weighted deviations =  942.67567
Iteration  3: sum of abs. weighted deviations =  921.36162
Iteration  4: sum of abs. weighted deviations =  921.36163

Median regression                                    Number of obs =        37
  Raw sum of deviations 3183.548 (about 11.93)
  Min sum of deviations 921.3616                     Pseudo R2     =    0.7106

------------------------------------------------------------------------------
   pcb85 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   pcb84 |    .994862   .0009638   1032.218   0.000       .9929053    .9968186
   _cons |  -7.92e-07   .1708606      0.000   1.000      -.3468662    .3468646
------------------------------------------------------------------------------

predict yhat3 if id~=4
(option xb assumed; fitted values)
(1 missing value generated)

graph twoway (scatter pcb85 pcb84) (line yhat3 pcb84, sort) ///
	(scatter pcb85 pcb84 if pcb84 >= 16000, mlabel(bay) mlabposition(9)), /// 
	xlabel(0(4000)16000) ylabel(0(200)800)

Figure 6.17, page 211.

use https://stats.idre.ucla.edu/stat/stata/examples/rwg/bays, clear

gen log84=log(pcb84+1)
gen log85=log(pcb85+1)
regress log85 log84

  Source |       SS       df       MS                  Number of obs =      37
---------+------------------------------               F(  1,    35) =  251.17
   Model |  145.581687     1  145.581687               Prob > F      =  0.0000
Residual |  20.2863138    35  .579608967               R-squared     =  0.8777
---------+------------------------------               Adj R-squared =  0.8742
   Total |  165.868001    36  4.60744448               Root MSE      =  .76132

------------------------------------------------------------------------------
   log85 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   log84 |   .8508259   .0536852     15.848   0.000        .741839    .9598127
   _cons |   .4251097    .202327      2.101   0.043        .014364    .8358553
------------------------------------------------------------------------------

predict h, leverage
summarize h, detail

                          Leverage
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0270889       .0270889
 5%     .0273455       .0273455
10%     .0286045       .0278075       Obs                  37
25%     .0311876       .0286045       Sum of Wgt.          37

50%     .0415393                      Mean           .0540541
                        Largest       Std. Dev.       .038911
75%     .0706273       .0743968
90%     .0743968       .0759508       Variance       .0015141
95%     .0818475       .0818475       Skewness       3.870526
99%     .2560128       .2560128       Kurtosis        20.8041

Note that the 90th percentile of h is .0743968. The [aw=wh] weight specification tells Stata to use wh as analytic weights (aweights).

gen wh=.
(37 missing values generated)

replace wh=1 if h<=.0743968
(33 real changes made)

replace wh=(.0743968/h)^2 if h>.0743968
(4 real changes made)
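
The cutoff above is typed by hand from the summarize output. A minimal sketch that takes it from the stored result r(p90) instead is shown below. It assumes h and wh are in memory; the names h90, wh_chk and whdiff_chk are hypothetical.

```stata
* 90th percentile of leverage from summarize's stored results.
quietly summarize h, detail
scalar h90 = r(p90)

* Weight 1 up to the cutoff, (h90/h)^2 above it.
generate double wh_chk = cond(h <= scalar(h90), 1, (scalar(h90)/h)^2)

* Compare with the hand-typed version.
generate double whdiff_chk = abs(wh_chk - wh)
summarize whdiff_chk
```

Because the typed cutoff is rounded to seven digits, an observation sitting exactly at the 90th percentile can fall on either side of it. The resulting difference in its weight should be negligible.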

qreg log85 log84 [aw=wh] 
(sum of wgt is  3.5870e+001)
Iteration  1:  WLS sum of weighted deviations =  14.000723

(sum of wgt is  3.5870e+001)
Iteration  1: sum of abs. weighted deviations =  11.648956
Iteration  2: sum of abs. weighted deviations =  9.4317616
Iteration  3: sum of abs. weighted deviations =  9.3947206

Median regression                                    Number of obs =        37
  Raw sum of deviations 63.89834 (about 2.332144)
  Min sum of deviations 9.394721                     Pseudo R2     =    0.8530

------------------------------------------------------------------------------
   log85 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   log84 |   .9922884   .0076806    129.194   0.000       .9766959    1.007881
   _cons |  -1.49e-09   .0211771      0.000   1.000      -.0429919    .0429919
------------------------------------------------------------------------------

predict yhat1
graph twoway (scatter log85 log84) (lfit log85 log84) (line yhat1 log84, sort), ///
	xlabel(0(2)10) ylabel(0(2)10)