Regression analysis, middle of page 269. Make the measwt by female interaction and then run the regression with the interaction.
use https://stats.idre.ucla.edu/stat/stata/examples/ara/davis, clear
generate measwt_f = measwt * female
regress reptwt measwt female measwt_f
Source | SS df MS Number of obs = 183
---------+------------------------------ F( 3, 179) = 470.41
Model | 30654.7294 3 10218.2431 Prob > F = 0.0000
Residual | 3888.25423 179 21.7220907 R-squared = 0.8874
---------+------------------------------ Adj R-squared = 0.8856
Total | 34542.9836 182 189.796613 Root MSE = 4.6607
------------------------------------------------------------------------------
reptwt | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
measwt | .9898221 .0425995 23.236 0.000 .9057602 1.073884
female | 39.96412 3.929322 10.171 0.000 32.21037 47.71787
measwt_f | -.7253627 .0559804 -12.957 0.000 -.8358292 -.6148963
_cons | 1.35864 3.277192 0.415 0.679 -5.108262 7.825541
------------------------------------------------------------------------------
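The interaction can also be requested without creating measwt_f by hand. The following is a minimal sketch, assuming Stata 11 or later (factor-variable notation) and that female is coded 0/1 as in the commands above. It should give the same coefficient estimates, with the row labeled female#c.measwt playing the role of measwt_f.

```stata
* Sketch only: same model with factor-variable notation (assumes Stata 11+).
* c.measwt##i.female expands to measwt, female, and their product.
regress reptwt c.measwt##i.female
```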
Save version with error (for use later) as davis_er.
save davis_er, replace
file davis_er.dta saved
Regression at bottom of page 269. Fix the error in case 12, where measwt and measht were swapped, and run the regression from above again.
generate t = measwt in 12
(199 missing values generated)
replace measwt = measht in 12
(1 real change made)
replace measht = t in 12
(1 real change made)
drop t
Make measwt by female interaction again.
replace measwt_f = measwt * female
Run regression.
regress reptwt measwt female measwt_f
Source | SS df MS Number of obs = 183
---------+------------------------------ F( 3, 179) = 2228.78
Model | 33642.3446 3 11214.1149 Prob > F = 0.0000
Residual | 900.638968 179 5.03150261 R-squared = 0.9739
---------+------------------------------ Adj R-squared = 0.9735
Total | 34542.9836 182 189.796613 Root MSE = 2.2431
------------------------------------------------------------------------------
reptwt | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
measwt | .9898221 .0205023 48.279 0.000 .9493648 1.030279
female | 1.982518 2.450282 0.809 0.420 -2.852638 6.817673
measwt_f | -.0566831 .0384548 -1.474 0.142 -.1325662 .0191999
_cons | 1.35864 1.577248 0.861 0.390 -1.753752 4.471032
------------------------------------------------------------------------------
Save corrected version (for use later) as davis_co.
save davis_co, replace
file davis_co.dta saved
Page 270, figure 11.2. Show graph similar to figure 11.2 showing outlier. Use the data file with the error.
use https://stats.idre.ucla.edu/stat/stata/examples/ara/davis_er, clear
Run the regression.
regress reptwt measwt female measwt_f
Source | SS df MS Number of obs = 183
---------+------------------------------ F( 3, 179) = 470.41
Model | 30654.7294 3 10218.2431 Prob > F = 0.0000
Residual | 3888.25423 179 21.7220907 R-squared = 0.8874
---------+------------------------------ Adj R-squared = 0.8856
Total | 34542.9836 182 189.796613 Root MSE = 4.6607
------------------------------------------------------------------------------
reptwt | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
measwt | .9898221 .0425995 23.236 0.000 .9057602 1.073884
female | 39.96412 3.929322 10.171 0.000 32.21037 47.71787
measwt_f | -.7253627 .0559804 -12.957 0.000 -.8358292 -.6148963
_cons | 1.35864 3.277192 0.415 0.679 -5.108262 7.825541
------------------------------------------------------------------------------
predict yhat
graph twoway (scatter reptwt measwt, mlabel(female)) (line yhat measwt if female == 1, sort) ///
(line yhat measwt if female == 0, sort), xlabel(25(25)175)
Middle of page 270, regression analysis. (Note the erratum: the last term should be reptwt x female.) The Stata results match the results in Fox.
generate reptwt_f = reptwt * female
(17 missing values generated)
regress measwt reptwt female reptwt_f
Source | SS df MS Number of obs = 183
---------+------------------------------ F( 3, 179) = 139.07
Model | 29786.3783 3 9928.79278 Prob > F = 0.0000
Residual | 12779.4359 179 71.3934965 R-squared = 0.6998
---------+------------------------------ Adj R-squared = 0.6947
Total | 42565.8142 182 233.8781 Root MSE = 8.4495
------------------------------------------------------------------------------
measwt | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
reptwt | .9689183 .0764096 12.681 0.000 .8181387 1.119698
female | 2.074211 9.297269 0.223 0.824 -16.27214 20.42056
reptwt_f | -.0095251 .1468546 -0.065 0.948 -.2993141 .2802639
_cons | 1.79428 5.923944 0.303 0.762 -9.89547 13.48403
------------------------------------------------------------------------------
Page 271, make hat value and show largest hat values.
Run the regression.
regress reptwt measwt female measwt_f
Source | SS df MS Number of obs = 183
---------+------------------------------ F( 3, 179) = 470.41
Model | 30654.7294 3 10218.2431 Prob > F = 0.0000
Residual | 3888.25423 179 21.7220907 R-squared = 0.8874
---------+------------------------------ Adj R-squared = 0.8856
Total | 34542.9836 182 189.796613 Root MSE = 4.6607
------------------------------------------------------------------------------
reptwt | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
measwt | .9898221 .0425995 23.236 0.000 .9057602 1.073884
female | 39.96412 3.929322 10.171 0.000 32.21037 47.71787
measwt_f | -.7253627 .0559804 -12.957 0.000 -.8358292 -.6148963
_cons | 1.35864 3.277192 0.415 0.679 -5.108262 7.825541
------------------------------------------------------------------------------
Generate hat values, calling the result myhat.
predict myhat, hat
Middle of page 271, get the largest hat value, .714.
summarize myhat, detail
Leverage
-------------------------------------------------------------
Percentiles Smallest
1% .0099067 .0099067
5% .0099302 .0099067
10% .0101496 .0099067 Obs 200
25% .0110274 .0099067 Sum of Wgt. 200
50% .0131432 Mean .0212232
Largest Std. Dev. .0514178
75% .0185647 .0687759
90% .0285695 .0732077 Variance .0026438
95% .0456122 .1668405 Skewness 12.45924
99% .1200241 .7141857 Kurtosis 166.9527
Lower part of page 271, subject 12 has the hat value over .7.
list subject myhat if myhat > .7
subject myhat
12. 12 .7141857
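A common rule of thumb compares each hat value with the average hat value, (k+1)/n, flagging values above about twice or three times that average. The following is a minimal sketch of that check, assuming it is run while the regress results above are still held in e(). The scalar name hbar is introduced here for illustration only.

```stata
* Sketch only: flag hat values above twice the average hat value.
* Assumes the regress results are still in e(); hbar is a hypothetical name.
scalar hbar = (e(df_m) + 1) / e(N)      // average hat value, (k+1)/n
display "average hat value = " hbar
* e(sample) restricts the listing to the observations used in the regression
list subject myhat if myhat > 2*hbar & e(sample)
```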
Page 274, middle of page. Make studentized residual and show largest value.
Make residual.
predict res, rstud
(17 missing values generated)
Get the largest studentized residual in absolute value, -24.3.
summarize res, detail
Studentized residuals
-------------------------------------------------------------
Percentiles Smallest
1% -2.349376 -24.30446
5% -1.466444 -2.349376
10% -.9816217 -2.189853 Obs 183
25% -.5037653 -1.959426 Sum of Wgt. 183
50% -.0284926 Mean -.0961781
Largest Std. Dev. 2.008318
75% .4462518 2.393195
90% 1.040694 2.90657 Variance 4.033341
95% 1.566641 3.081378 Skewness -9.557869
99% 3.081378 3.496628 Kurtosis 116.8471
Subject 12 had the largest residual.
list subject res if res < -24.3
subject res
12. 12 -24.30446
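Because a studentized residual follows a t distribution with n-k-2 degrees of freedom under the model, the largest one can be tested as an outlier with a Bonferroni adjustment. The following is a rough sketch, assuming the regress results are still in e() and that observation 12 is subject 12, as the listing above suggests. The scalar names are hypothetical.

```stata
* Sketch only: Bonferroni-adjusted two-sided p-value for subject 12.
* Assumes e() still holds the regression and that observation 12 is subject 12.
scalar t12     = res[12]                              // studentized residual
scalar p_unadj = 2 * ttail(e(df_r) - 1, abs(t12))     // df = n - k - 2
scalar p_bonf  = min(1, e(N) * p_unadj)               // multiply by n, cap at 1
display "unadjusted p = " p_unadj "   Bonferroni p = " p_bonf
```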
Middle of page 276, computing DFBETA and making index plot (plot described but not shown). The dfbeta command computes the DFBETA for measwt (called DFmeaswt), for female (called DFfemale), and for measwt_f (called DF1 in the output below; where Stata instead names it DFmeaswt_f, the rename command below changes it to DF1).
dfbeta
(17 missing values generated)
DFmeaswt: DFbeta(measwt)
(17 missing values generated)
DFfemale: DFbeta(female)
(17 missing values generated)
DF1: DFbeta(measwt_f)
Index plot shows an observation influencing female and influencing female*measwt.
rename DFmeaswt_f DF1
graph twoway scatter DFmeaswt DFfemale DF1 subject
Show same plot, but use subject number as symbol to identify which subject has influential data. It is subject 12.
graph twoway scatter DFmeaswt DFfemale DF1 subject, mlabel(subject subject subject)
Scatterplot matrix of the DFBETAs, suggested near the bottom of page 276.
graph matrix DFmeaswt DFfemale DF1 subject, mlabel(subject)
Bottom part of page 277, computing and showing Cook’s D, DFFITS, DFBETAS.
Already computed DFBETA above using dfbeta command.
Compute Cook’s D.
predict d, cooksd
(17 missing values generated)
Compute DFFITS.
predict dfit, dfits
(17 missing values generated)
Use summarize to get the largest values of Cook's D, DFFITS, and DFBETA, as at the bottom of page 277.
summarize d dfit DFmeaswt DFfemale DF1 , detail
Cook's D
-------------------------------------------------------------
Percentiles Smallest
1% 2.11e-06 2.11e-06
5% .0000228 2.11e-06
10% .000026 2.11e-06 Obs 183
25% .0001244 .0000157 Sum of Wgt. 183
50% .0007961 Mean .4738773
Largest Std. Dev. 6.351621
75% .0032174 .0651359
90% .0096058 .0701759 Variance 40.34309
95% .0199879 .0856294 Skewness 13.41655
99% .0856294 85.92735 Kurtosis 181.0043
Dfits
-------------------------------------------------------------
Percentiles Smallest
1% -.4308479 -38.41931
5% -.1958499 -.4308479
10% -.1174221 -.296132 Obs 183
25% -.0592021 -.2823458 Sum of Wgt. 183
50% -.0028959 Mean -.2012365
Largest Std. Dev. 2.843795
75% .0560052 .5108672
90% .1362741 .5115646 Variance 8.087169
95% .2121069 .540725 Skewness -13.37203
99% .540725 .6033236 Kurtosis 180.215
DFmeaswt
-------------------------------------------------------------
Percentiles Smallest
1% -.1318417 -.1449406
5% -.059406 -.1318417
10% -.0427876 -.0978029 Obs 183
25% -.0000766 -.0921938 Sum of Wgt. 183
50% -5.92e-16 Mean .0003997
Largest Std. Dev. .0540191
75% 1.22e-16 .1096412
90% .0141305 .2565254 Variance .0029181
95% .0314289 .2809305 Skewness 5.222216
99% .2809305 .4918421 Kurtosis 45.52623
DFfemale
-------------------------------------------------------------
Percentiles Smallest
1% -.2217265 -.2219922
5% -.1140623 -.2217265
10% -.063526 -.2097966 Obs 183
25% -.0294111 -.1823008 Sum of Wgt. 183
50% -.0050063 Mean .0941992
Largest Std. Dev. 1.482754
75% .0039615 .1956965
90% .0166306 .2036534 Variance 2.198559
95% .0342111 .3870315 Skewness 13.38578
99% .3870315 20.02775 Kurtosis 180.4558
DF1
-------------------------------------------------------------
Percentiles Smallest
1% -.3742779 -24.7525
5% -.0256302 -.3742779
10% -.0133058 -.2137802 Obs 183
25% -.0021874 -.1952085 Sum of Wgt. 183
50% .0054611 Mean -.1163372
Largest Std. Dev. 1.832269
75% .0328629 .2331503
90% .0701569 .2636165 Variance 3.35721
95% .1102957 .2943538 Skewness -13.39209
99% .2943538 .3174025 Kurtosis 180.5693
We can see these values for subject 12.
list d dfit DFmeaswt DFfemale DF1 if subject==12
d dfit DFmeaswt DFfemale DF1
12. 85.92735 -38.41931 8.62e-13 20.02775 -24.7525
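These measures are tied to the hat values and residuals computed earlier: DFFITS is the studentized residual times sqrt(h/(1-h)), and Cook's D is the squared standardized residual divided by k+1, times h/(1-h). The following is a minimal sketch that rebuilds both for comparison, assuming myhat, res, dfit, and d exist from the commands above and that the regress results are still in e(). The names rsta, dfit_chk, and d_chk are hypothetical.

```stata
* Sketch only: rebuild DFFITS and Cook's D from leverage and residuals.
* Assumes myhat, res, dfit, d exist and e() still holds the regression.
predict rsta, rstandard                               // standardized residuals
generate dfit_chk = res * sqrt(myhat / (1 - myhat))   // DFFITS
generate d_chk = (rsta^2 / (e(df_m) + 1)) * myhat / (1 - myhat)   // Cook's D
list subject dfit dfit_chk d d_chk if subject == 12
```

The two check columns should agree with dfit and d up to rounding.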
Top of page 279, computing COVRATIO. We can compute a variable covrat containing the COVRATIO.
predict covrat, covratio
(17 missing values generated)
Look at summary stats for covrat.
summarize covrat
Variable | Obs Mean Std. Dev. Min Max
---------+-----------------------------------------------------
covrat | 183 1.018453 .0833018 .0102869 1.19215
You can see that subject 12 has the smallest covrat.
list subject covrat if covrat < .02
subject covrat
12. 12 .0102869
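COVRATIO can likewise be written in terms of the hat value h and the studentized residual: 1 / [(1-h) * ((n-k-2 + rstudent^2)/(n-k-1))^(k+1)]. The following is a minimal sketch of that calculation, assuming myhat and res exist from the commands above and that e() still holds the regression, so that e(df_r) is n-k-1. The name covrat_chk is hypothetical.

```stata
* Sketch only: COVRATIO from hat values and studentized residuals.
* Assumes e() still holds the regression; e(df_r) = n-k-1, e(df_m) = k.
generate covrat_chk = 1 / ((1 - myhat) * ((e(df_r) - 1 + res^2) / e(df_r))^(e(df_m) + 1))
list subject covrat covrat_chk if subject == 12
```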
Page 283 bottom, and figure 11.5 page 284, partial regression plots.
use https://stats.idre.ucla.edu/stat/stata/examples/ara/duncan, clear
Fit regression at bottom of page 283.
regress prestige income educ
Source | SS df MS Number of obs = 45
---------+------------------------------ F( 2, 42) = 101.22
Model | 36180.9458 2 18090.4729 Prob > F = 0.0000
Residual | 7506.69865 42 178.73092 R-squared = 0.8282
---------+------------------------------ Adj R-squared = 0.8200
Total | 43687.6444 44 992.90101 Root MSE = 13.369
------------------------------------------------------------------------------
prestige | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
income | .5987328 .1196673 5.003 0.000 .3572343 .8402313
educ | .5458339 .0982526 5.555 0.000 .3475521 .7441158
_cons | -6.064663 4.271941 -1.420 0.163 -14.68579 2.556463
------------------------------------------------------------------------------
Figure 11.5 on page 284 can be obtained by using the avplots command.
avplots
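To see what avplots draws, one panel can be built by hand: regress both prestige and income on the other predictor, then plot one set of residuals against the other. The following is a minimal sketch for the income panel, with hypothetical variable names e_prestige and e_income. The slope of the fitted line should equal the income coefficient from the full regression. The last line refits the full model so that later predict commands refer to it.

```stata
* Sketch only: added-variable (partial regression) plot for income, by hand.
quietly regress prestige educ
predict e_prestige, residuals        // prestige with educ partialled out
quietly regress income educ
predict e_income, residuals          // income with educ partialled out
graph twoway (scatter e_prestige e_income) (lfit e_prestige e_income)
* Refit the full model so that subsequent predict commands use it
quietly regress prestige income educ
```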
Middle of page 284 influence statistics, and figure 11.6 on page 285.
Make and show hat, residual, Cook’s D.
predict hat1, hat
predict res, rstud
predict d, cooksd
summarize hat1 res d, detail
Leverage
-------------------------------------------------------------
Percentiles Smallest
1% .0241298 .0241298
5% .0262859 .0246816
10% .0327049 .0262859 Obs 45
25% .0467892 .031326 Sum of Wgt. 45
50% .05732 Mean .0666667
Largest Std. Dev. .0438265
75% .0705812 .0878518
90% .082588 .1730582 Variance .0019208
95% .1730582 .1945416 Skewness 3.050461
99% .2690896 .2690896 Kurtosis 13.16723
Studentized residuals
-------------------------------------------------------------
Percentiles Smallest
1% -2.397022 -2.397022
5% -1.760491 -1.930919
10% -1.433249 -1.760491 Obs 45
25% -.4980818 -1.704032 Sum of Wgt. 45
50% .0505098 Mean .006828
Largest Std. Dev. 1.055861
75% .5083882 1.602429
90% 1.068858 1.887047 Variance 1.114843
95% 1.887047 2.043805 Skewness .296872
99% 3.134519 3.134519 Kurtosis 3.928454
Cook's D
-------------------------------------------------------------
Percentiles Smallest
1% 1.20e-08 1.20e-08
5% .0000537 .0000128
10% .0001334 .0000537 Obs 45
25% .0016885 .0000784 Sum of Wgt. 45
50% .0058424 Mean .0317011
Largest Std. Dev. .0898235
75% .0236292 .0809681
90% .0585235 .0989846 Variance .0080683
95% .0989846 .2236412 Skewness 5.064094
99% .5663797 .5663797 Kurtosis 29.68371
Figure 11.6 on page 285. Scatterplot of rstud and hat weighted by cooksd. Observations with a studentized residual of -2.1 or below, or a hat value of 0.13 or above, are labeled with their occupational title by the mlabel(occtitle) option.
graph twoway (scatter res hat1 [w=d],msymbol(Oh)) ///
(scatter res hat1 if res <= -2.1 | hat1 >= .13, mlabel(occtitle) msymbol(i)), ///
xlabel(0(.05).3) ylabel(-2.5(2.5)5) yline(-2.1 0 2.1) xline(.13 .20)
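The vertical reference lines in xline(.13 .20) appear to sit at roughly twice and three times the average hat value, (k+1)/n, which matches the mean leverage of .0666667 reported in the summarize output. A quick check of the arithmetic, assuming k = 2 predictors and n = 45 observations as in the regression output:

```stata
* Sketch only: twice and three times the average hat value (k+1)/n.
display 2 * (2 + 1) / 45     // about 0.133
display 3 * (2 + 1) / 45     // 0.2
```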






