Section 11.1
Regression analysis from the middle of page 269, using the davis data file. First create a dataset with a dummy variable called female and an interaction regressor measwt_f, the product of measured weight and female. Then regress reported weight on measured weight, female, and the interaction.
data davisIn; /*A dataset with the interaction variable and a subject variable*/
  set davis;
  measwt_f = measwt * (1-male);
  female=1-male;
  drop male;
run;

proc reg data=davisIn;
  model reptwt=measwt female measwt_f;
run;
quit;

The REG Procedure
Model: MODEL1
Dependent Variable: reptwt

Analysis of Variance

                            Sum of        Mean
Source             DF      Squares      Square   F Value   Pr > F
Model               3        30655       10218    470.41   <.0001
Error             179   3888.25423    21.72209
Corrected Total   182        34543

Root MSE          4.66070   R-Square   0.8874
Dependent Mean   65.62295   Adj R-Sq   0.8856
Coeff Var         7.10224

Parameter Estimates

                 Parameter   Standard
Variable    DF    Estimate      Error   t Value   Pr > |t|
Intercept    1     1.35864    3.27719      0.41     0.6789
measwt       1     0.98982    0.04260     23.24     <.0001
female       1    39.96412    3.92932     10.17     <.0001
measwt_f     1    -0.72536    0.05598    -12.96     <.0001
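Written out, the model fitted above amounts to one line per group. The block below is a sketch in generic notation (the symbols $\beta_j$ and $\varepsilon$ are labels assumed here, not taken from the text), followed by the fitted lines implied by the parameter estimates in the output.

```latex
\begin{aligned}
\text{reptwt} &= \beta_0 + \beta_1\,\text{measwt} + \beta_2\,\text{female}
               + \beta_3\,(\text{measwt}\times\text{female}) + \varepsilon \\[4pt]
\text{males:}\quad \widehat{\text{reptwt}} &= 1.35864 + 0.98982\,\text{measwt} \\
\text{females:}\quad \widehat{\text{reptwt}} &= (1.35864 + 39.96412) + (0.98982 - 0.72536)\,\text{measwt} \\
 &= 41.32276 + 0.26446\,\text{measwt}
\end{aligned}
```

The very flat female slope is what one would expect if a single badly recorded female case were pulling the line.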
Now we fix the error in case 12 of the dataset, where measured weight and measured height were swapped. We keep the dataset with the error as davisIn and call the corrected dataset davis_co.
data davis_co;
  set davisIn;
  if _n_=12 then do; /* swap measured height and weight for case 12 */
    temp=measht;
    measht=measwt;
    measwt=temp;
    measwt_f=temp; /* case 12 is female, so the interaction equals measwt */
  end;
  drop temp;
run;

proc reg data=davis_co ;
  model reptwt=measwt female measwt_f;
run;
quit;

The REG Procedure
Model: MODEL1
Dependent Variable: reptwt

Analysis of Variance

                            Sum of        Mean
Source             DF      Squares      Square   F Value   Pr > F
Model               3        33642       11214   2228.78   <.0001
Error             179    900.63897     5.03150
Corrected Total   182        34543

Root MSE          2.24310   R-Square   0.9739
Dependent Mean   65.62295   Adj R-Sq   0.9735
Coeff Var         3.41817

Parameter Estimates

                 Parameter   Standard
Variable    DF    Estimate      Error   t Value   Pr > |t|
Intercept    1     1.35864    1.57725      0.86     0.3902
measwt       1     0.98982    0.02050     48.28     <.0001
female       1     1.98252    2.45028      0.81     0.4195
measwt_f     1    -0.05668    0.03845     -1.47     0.1422
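As a quick check on the corrected fit, the group-specific lines implied by the estimates above can be written out. This is only arithmetic on the printed coefficients; the hat notation is an assumption of this sketch.

```latex
\begin{aligned}
\text{males:}\quad \widehat{\text{reptwt}} &= 1.35864 + 0.98982\,\text{measwt} \\
\text{females:}\quad \widehat{\text{reptwt}} &= (1.35864 + 1.98252) + (0.98982 - 0.05668)\,\text{measwt} \\
 &= 3.34116 + 0.93314\,\text{measwt}
\end{aligned}
```

The male line is unchanged, since case 12 is female and a separate slope and intercept are fitted for each group. The female line is now close to the male line, in keeping with the non-significant female and measwt_f coefficients.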
To get the formula on page 270, we create a dataset containing the dummy variable female and a new interaction regressor reptwt_f, the product of reported weight and female. Then we regress measured weight on reported weight, female, and the interaction. We run this regression on both the corrected and the uncorrected dataset.
Corrected dataset
data davis_co1;
  set davis;
  reptwt_f=reptwt*(1-male);
  female=1-male;
  if subject=12 then do; /* swap measured weight and height for subject 12 */
    temp=measwt;
    measwt=measht;
    measht=temp;
  end;
  drop male temp;
run;

proc reg data=davis_co1;
  model measwt=reptwt female reptwt_f;
run;
quit;

The REG Procedure
Model: MODEL1
Dependent Variable: measwt

Analysis of Variance

                            Sum of        Mean
Source             DF      Squares      Square   F Value   Pr > F
Model               3        31722       10574   2082.30   <.0001
Error             179    908.96214     5.07800
Corrected Total   182        32631

Root MSE          2.25344   R-Square   0.9721
Dependent Mean   65.62842   Adj R-Sq   0.9717
Coeff Var         3.43364

Parameter Estimates

                 Parameter   Standard
Variable    DF    Estimate      Error   t Value   Pr > |t|
Intercept    1     1.79428    1.57989      1.14     0.2576
reptwt       1     0.96892    0.02038     47.55     <.0001
female       1    -0.01678    2.47955     -0.01     0.9946
reptwt_f     1     0.00831    0.03917      0.21     0.8323
Now on the uncorrected dataset.
data davisIn1;
  set davis;
  reptwt_f=reptwt*(1-male);
  female=1-male;
  drop male;
run;

proc reg data=davisIn1;
  model measwt=reptwt female reptwt_f;
run;
quit;

The REG Procedure
Model: MODEL1
Dependent Variable: measwt

Analysis of Variance

                            Sum of         Mean
Source             DF      Squares       Square   F Value   Pr > F
Model               3        29786   9928.79278    139.07   <.0001
Error             179        12779     71.39350
Corrected Total   182        42566

Root MSE          8.44947   R-Square   0.6998
Dependent Mean   66.22404   Adj R-Sq   0.6947
Coeff Var        12.75891

Parameter Estimates

                 Parameter   Standard
Variable    DF    Estimate      Error   t Value   Pr > |t|
Intercept    1     1.79428    5.92394      0.30     0.7623
reptwt       1     0.96892    0.07641     12.68     <.0001
female       1     2.07421    9.29727      0.22     0.8237
reptwt_f     1    -0.00953    0.14685     -0.06     0.9484
To produce Figure 11.2 on page 270, we first create a dataset that contains two new variables, freptwt and mreptwt, holding reported weight for females and males respectively. Then we run proc glm on the dataset and output the predicted values for both the female and the male group. Finally we use SAS proc gplot to render the plot.
data davisPr;
  set davis;
  if male=1 then mreptwt=reptwt;
  if male=0 then freptwt=reptwt;
  output;
run; /*dataset created*/

proc glm data=davisPr;
  model mreptwt freptwt =measwt;
  output out=dvsOut p=pm pf;
run;
quit;

symbol1 c=black i=none v='M' height=0.5;
symbol2 c=black i=join v=none height=1.5;
symbol3 c=blue i=none v='F' height=0.5;
symbol4 c=blue i=join v=none height=1.5;
axis1 label=(r=0 a=90);
filename outfiles 'https://stats.idre.ucla.edu/wp-content/uploads/2016/02/chp11Fig1.gif';
goptions gsfname=outfiles dev=gif373;

proc sort data=dvsOut;
  by measwt;
run;

proc gplot data=dvsOut;
  plot mreptwt*measwt=1 pm*measwt=2 freptwt*measwt=3 pf*measwt=4 /overlay vaxis=axis1;
  label measwt='Measured Weight, Kg.';
  label mreptwt='Reported Weight, Kg.';
run;
quit;
Section 11.2
Page 271: compute the hat-matrix diagonal (leverage) values and show the largest ones.
proc reg data=davisIn ;
  model reptwt=measwt female measwt_f;
  output out=dvsLev p=pr h=lev; /* p for predicted h for leverage*/
run;
quit;

proc univariate data=dvsLev;
  var lev;
run;

proc print data=dvsLev;
  where lev ge 0.7;
run;

The REG Procedure
Model: MODEL1
Dependent Variable: reptwt

Analysis of Variance

                            Sum of        Mean
Source             DF      Squares      Square   F Value   Pr > F
Model               3        30655       10218    470.41   <.0001
Error             179   3888.25423    21.72209
Corrected Total   182        34543

Root MSE          4.66070   R-Square   0.8874
Dependent Mean   65.62295   Adj R-Sq   0.8856
Coeff Var         7.10224

Parameter Estimates

                 Parameter   Standard
Variable    DF    Estimate      Error   t Value   Pr > |t|
Intercept    1     1.35864    3.27719      0.41     0.6789
measwt       1     0.98982    0.04260     23.24     <.0001
female       1    39.96412    3.92932     10.17     <.0001
measwt_f     1    -0.72536    0.05598    -12.96     <.0001

The UNIVARIATE Procedure
Variable: lev (Leverage)

Moments

N                        200   Sum Weights               200
Mean               0.0212232   Sum Observations    4.2446399
Std Deviation      0.0514178   Variance           0.00264379
Skewness          12.5535875   Kurtosis            168.15714
Uncorrected SS    0.61619913   Corrected SS       0.52611429
Coeff Variation   242.271682   Std Error Mean     0.00363579

Basic Statistical Measures

    Location                Variability
Mean     0.021223   Std Deviation          0.05142
Median   0.013143   Variance               0.00264
Mode     0.010224   Range                  0.70428
                    Interquartile Range    0.00754

NOTE: The mode displayed is the smallest of 3 modes with a count of 8.

Tests for Location: Mu0=0

Test           -Statistic-     -----p Value------
Student's t    t   5.837304    Pr > |t|    <.0001
Sign           M        100    Pr >= |M|   <.0001
Signed Rank    S      10050    Pr >= |S|   <.0001

Quantiles (Definition 5)

Quantile        Estimate
100% Max      0.71418565
99%           0.12002413
95%           0.04561218
90%           0.02856951
75% Q3        0.01856466
50% Median    0.01314321
25% Q1        0.01102743
10%           0.01014961
5%            0.00993016
1%            0.00990671
0% Min        0.00990671

Extreme Observations

-------Lowest-------     ------Highest------
     Value      Obs          Value      Obs
0.00990671      188      0.0645111       30
0.00990671      159      0.0687759       54
0.00990671       28      0.0732077       97
0.00990671        2      0.1668405       21
0.00993016      193      0.7141856       12

Obs  subject  sex  measwt  measht  reptwt  reptht  measwt_f  female       pr      lev
 12       12   F      166      57      56     163       166       1  85.2230  0.71419
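For reference, the leverage values summarized above are the diagonal entries of the hat matrix. The sketch below uses generic notation ($\mathbf{X}$ for the model matrix, $k$ for the number of regressors, $n$ for the number of observations used in the fit); these symbols are assumptions of this note rather than quotations from the text. The cutoff of twice the average is a common rule of thumb, not a formal test.

```latex
\begin{aligned}
\mathbf{H} &= \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}', \qquad h_i = H_{ii}, \qquad
\sum_{i=1}^{n} h_i = k + 1 \\[4pt]
\bar{h} &= \frac{k+1}{n} = \frac{4}{183} \approx 0.0219, \qquad 2\bar{h} \approx 0.0437 \\[4pt]
h_{12} &= 0.71419 \approx 32.7\,\bar{h}
\end{aligned}
```

The univariate summary reports N = 200 rather than 183 because hat values depend only on the regressors, so they are also computed for the 17 cases whose reported weight is missing.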
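With the error corrected. Note that the model below, as written, regresses measwt on reptwt, female and measwt_f, so its hat values are based on a different set of regressors from the run above: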
proc reg data=davis_co;
  model measwt=reptwt female measwt_f;
  output out=dvs_coH p=pm h=lev;
run;
quit;

proc univariate data=dvs_coH;
  var lev;
run;

The REG Procedure
Model: MODEL1
Dependent Variable: measwt

Analysis of Variance

                            Sum of        Mean
Source             DF      Squares      Square   F Value   Pr > F
Model               3        31777       10592   2220.97   <.0001
Error             179    853.69481     4.76924
Corrected Total   182        32631

Root MSE          2.18386   R-Square   0.9738
Dependent Mean   65.62842   Adj R-Sq   0.9734
Coeff Var         3.32761

Parameter Estimates

                 Parameter   Standard
Variable    DF    Estimate      Error   t Value   Pr > |t|
Intercept    1     4.14385    1.50728      2.75     0.0066
reptwt       1     0.93823    0.01943     48.28     <.0001
female       1    -7.27862    2.32718     -3.13     0.0021
measwt_f     1     0.12450    0.03650      3.41     0.0008

The UNIVARIATE Procedure
Variable: lev (Leverage)

Moments

N                        183   Sum Weights               183
Mean              0.02185792   Sum Observations            4
Std Deviation      0.0193944   Variance           0.00037614
Skewness          4.83073039   Kurtosis           33.3623499
Uncorrected SS    0.15588965   Corrected SS       0.06845796
Coeff Variation    88.729365   Std Error Mean     0.00143368

Basic Statistical Measures

    Location                Variability
Mean     0.021858   Std Deviation          0.01939
Median   0.015604   Variance             0.0003761
Mode     0.015604   Range                  0.18047
                    Interquartile Range    0.00864

Tests for Location: Mu0=0

Test           -Statistic-     -----p Value------
Student's t    t   15.24608    Pr > |t|    <.0001
Sign           M       91.5    Pr >= |M|   <.0001
Signed Rank    S       8418    Pr >= |S|   <.0001

Quantiles (Definition 5)

Quantile        Estimate
100% Max      0.19040469
99%           0.10076897
95%           0.04995763
90%           0.03947617
75% Q3        0.02102730
50% Median    0.01560388
25% Q1        0.01238808
10%           0.01075212
5%            0.01037395
1%            0.00993415
0% Min        0.00993415

Extreme Observations

-------Lowest-------     ------Highest------
     Value      Obs          Value      Obs
0.00993415      160      0.0799200       29
0.00993415       90      0.0846260      115
0.00993415       12      0.0855655       54
0.01008300      151      0.1007690       64
0.01009729      108      0.1904047       21

Missing Values

                      -----Percent Of-----
Missing                            Missing
  Value     Count     All Obs          Obs
      .        17        8.50       100.00
Section 11.3
Page 274, middle of page: compute the studentized residuals and show the largest one in absolute value.
proc reg data=davisIn;
  model reptwt = measwt female measwt_f;
  output out=dvsRs rstudent=rs; /*studentized residuals*/
run;
quit;

proc univariate data=dvsRs;
  var rs;
run;

proc print data=dvsRs;
  where rs < -24.3 AND rs ne .;
run;

The REG Procedure
Model: MODEL1
Dependent Variable: reptwt

Analysis of Variance

                            Sum of        Mean
Source             DF      Squares      Square   F Value   Pr > F
Model               3        30655       10218    470.41   <.0001
Error             179   3888.25423    21.72209
Corrected Total   182        34543

Root MSE          4.66070   R-Square   0.8874
Dependent Mean   65.62295   Adj R-Sq   0.8856
Coeff Var         7.10224

Parameter Estimates

                 Parameter   Standard
Variable    DF    Estimate      Error   t Value   Pr > |t|
Intercept    1     1.35864    3.27719      0.41     0.6789
measwt       1     0.98982    0.04260     23.24     <.0001
female       1    39.96412    3.92932     10.17     <.0001
measwt_f     1    -0.72536    0.05598    -12.96     <.0001

The UNIVARIATE Procedure
Variable: rs (Studentized Residual without Current Obs)

Moments

N                        183   Sum Weights               183
Mean              -0.0961781   Sum Observations    -17.60059
Std Deviation     2.00831794   Variance           4.03334093
Skewness           -9.637042   Kurtosis           117.053571
Uncorrected SS     735.76084   Corrected SS       734.068049
Coeff Variation   -2088.1242   Std Error Mean     0.14845913

Basic Statistical Measures

    Location                Variability
Mean     -0.09618   Std Deviation          2.00832
Median   -0.02849   Variance               4.03334
Mode     -0.18673   Range                 27.80109
                    Interquartile Range    0.95002

Tests for Location: Mu0=0

Test           -Statistic-     -----p Value------
Student's t    t   -0.64784    Pr > |t|    0.5179
Sign           M       -2.5    Pr >= |M|   0.7676
Signed Rank    S       -188    Pr >= |S|   0.7941

Quantiles (Definition 5)

Quantile         Estimate
100% Max        3.4966276
99%             3.0813780
95%             1.5666415
90%             1.0406940
75% Q3          0.4462518
50% Median     -0.0284926
25% Q1         -0.5037653
10%            -0.9816218
5%             -1.4664439
1%             -2.3493765
0% Min        -24.3044630

Extreme Observations

------Lowest------     -----Highest-----
    Value      Obs        Value      Obs
-24.30446       12      1.89365      129
 -2.34938       29      2.39320       31
 -2.18985      155      2.90657       64
 -1.95943      130      3.08138       50
 -1.90865      153      3.49663      115

Missing Values

                      -----Percent Of-----
Missing                            Missing
  Value     Count     All Obs          Obs
      .        17        8.50       100.00

Obs  subject  sex  measwt  measht  reptwt  reptht  measwt_f  female        rs
 12       12   F      166      57      56     163       166       1  -24.3045
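The rstudent= keyword requests the residual studentized with the current observation deleted, as the variable label in the output indicates. The sketch below writes the definition in generic notation ($E_i$ for the ordinary residual, $h_i$ for the hat value, $S_{E(-i)}$ for the residual standard error with case $i$ removed, $k$ for the number of regressors; these symbols are assumptions of this note). It then checks observation 12 by hand using the residual $56 - 85.2230 = -29.2230$, the leverage 0.71419 and the error sum of squares 3888.25 reported above. Intermediate values are rounded.

```latex
\begin{aligned}
E_i^{*} &= \frac{E_i}{S_{E(-i)}\sqrt{1-h_i}}, \qquad
S_{E(-i)}^{2} = \frac{\sum_j E_j^{2} - E_i^{2}/(1-h_i)}{n-k-2} \\[4pt]
S_{E(-12)}^{2} &\approx \frac{3888.25 - (29.2230)^{2}/0.28581}{178} \approx \frac{900.3}{178} \approx 5.058 \\[4pt]
E_{12}^{*} &\approx \frac{-29.2230}{\sqrt{5.058}\,\sqrt{0.28581}} \approx \frac{-29.2230}{1.2023} \approx -24.3
\end{aligned}
```

The hand calculation agrees with the printed value of -24.3045 to the precision carried here.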
Section 11.4
Middle of page 276, computing DFBETAS for measwt, female and measwt_f. In the regression procedure we use the influence option and the ODS facility to output a dataset that contains all the DFBETAS. The index plot shows one observation strongly influencing the coefficient of female and the coefficient of the interaction measwt_f.
proc reg data=davisIn;
  model reptwt=measwt female measwt_f/influence ;
  ods output OutputStatistics=dvsOut;
run;
quit;

filename outfiles 'https://stats.idre.ucla.edu/wp-content/uploads/2016/02/chp11dfbeta.gif';
goptions gsfname=outfiles dev=gif373;
symbol1 c=black i=none v=star h=0.5;
symbol2 c=blue i=none v=dot h=0.5;
symbol3 c=green i=none v=circle h=0.5;

proc gplot data=dvsOut;
  plot (DFB_measwt)*observation=1 (DFB_female)*observation=2
       (DFB_measwt_f)*observation=3/overlay;
run;
quit;

The REG Procedure
Model: MODEL1
Dependent Variable: reptwt

Analysis of Variance

                            Sum of        Mean
Source             DF      Squares      Square   F Value   Pr > F
Model               3        30655       10218    470.41   <.0001
Error             179   3888.25423    21.72209
Corrected Total   182        34543

Root MSE          4.66070   R-Square   0.8874
Dependent Mean   65.62295   Adj R-Sq   0.8856
Coeff Var         7.10224

Parameter Estimates

                 Parameter   Standard
Variable    DF    Estimate      Error   t Value   Pr > |t|
Intercept    1     1.35864    3.27719      0.41     0.6789
measwt       1     0.98982    0.04260     23.24     <.0001
female       1    39.96412    3.92932     10.17     <.0001
measwt_f     1    -0.72536    0.05598    -12.96     <.0001

The REG Procedure
Model: MODEL1
Dependent Variable: reptwt

Output Statistics

                            Hat Diag      Cov             ---------------DFBETAS--------------
Obs   Residual   RStudent          H    Ratio    DFFITS   Intercept    measwt    female   measwt_f
  1    -0.5749    -0.1238     0.0123   1.0350   -0.0138     -0.0010   -0.0012    0.0008     0.0009
  2    -5.6614    -1.2225     0.0099   0.9989   -0.1223     -0.0000    0.0000   -0.0160     0.0019
  3    -1.3391    -0.2883     0.0116   1.0327   -0.0312     -0.0000    0.0000   -0.0099     0.0078
  :          :          :          :        :         :           :         :         :          :
196    -3.6055    -0.7776     0.0125   1.0217   -0.0876     -0.0275    0.0141    0.0230    -0.0108
197    -3.5139    -0.7593     0.0163   1.0263   -0.0978      0.0353   -0.0492   -0.0294     0.0374
198          .          .     0.0143        .         .           .         .         .          .
199     0.5574     0.1210     0.0286   1.0525    0.0208     -0.0134    0.0157    0.0112    -0.0120
200     1.4454     0.3114     0.0130   1.0338    0.0357     -0.0031    0.0087    0.0026    -0.0066

Sum of Residuals                          0
Sum of Squared Residuals         3888.25423
Predicted Residual SS (PRESS)         13623
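As a reminder of what the DFBETAS columns measure, the sketch below gives the usual scaled definition in generic notation: $B_j$ is the estimate of coefficient $j$ from all cases, $B_{j(-i)}$ the estimate with case $i$ deleted, and $S_{E(-i)}$ the deleted residual standard error. These symbols are assumptions of this note. The size-adjusted cutoff $2/\sqrt{n}$ is the rule of thumb proposed by Belsley, Kuh and Welsch, offered here only as a rough guide.

```latex
\begin{aligned}
\text{DFBETAS}_{ij} &= \frac{B_j - B_{j(-i)}}{S_{E(-i)}\sqrt{\left[(\mathbf{X}'\mathbf{X})^{-1}\right]_{jj}}} \\[4pt]
\frac{2}{\sqrt{n}} &= \frac{2}{\sqrt{183}} \approx 0.148
\end{aligned}
```

Because each difference is divided by a standard error, values for different coefficients can be compared on one index plot.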
The following segment uses SAS INSIGHT to produce a scatterplot matrix from the command line. Some people may find it easier to do this from the SAS pulldown menus (e.g., click on Solutions, then Analysis, then Interactive Data Analysis).
proc insight data=dvsOut;
  scatter DFB_measwt DFB_female DFB_measwt_f observation*
          DFB_measwt DFB_female DFB_measwt_f observation;
run;
quit;
Bottom part of page 277, computing and showing Cook's D, DFFITS, DFBETAS.
Compute Cook's D and DFFITS:
proc reg data=davisIn;
  model reptwt=measwt female measwt_f;
  output out=dvsSum cookd=ck;
run;
quit;

proc univariate data=dvsSum;
  var ck;
run;

proc univariate data=dvsOut;
  var dffits DFB_measwt DFB_female DFB_measwt_f;
run;

proc print data=dvsSum;
  where subject=12;
  var ck;
run;

proc print data=dvsOut;
  where observation=12;
  var dffits DFB_measwt DFB_female DFB_measwt_f;
run;
quit;

The REG Procedure
Model: MODEL1
Dependent Variable: reptwt

Analysis of Variance

                            Sum of        Mean
Source             DF      Squares      Square   F Value   Pr > F
Model               3        30655       10218    470.41   <.0001
Error             179   3888.25423    21.72209
Corrected Total   182        34543

Root MSE          4.66070   R-Square   0.8874
Dependent Mean   65.62295   Adj R-Sq   0.8856
Coeff Var         7.10224

Parameter Estimates

                 Parameter   Standard
Variable    DF    Estimate      Error   t Value   Pr > |t|
Intercept    1     1.35864    3.27719      0.41     0.6789
measwt       1     0.98982    0.04260     23.24     <.0001
female       1    39.96412    3.92932     10.17     <.0001
measwt_f     1    -0.72536    0.05598    -12.96     <.0001

The UNIVARIATE Procedure
Variable: ck (Cook's D Influence Statistic)

Moments

N                        183   Sum Weights               183
Mean              0.47387729   Sum Observations   86.7195445
Std Deviation     6.35162098   Variance            40.343089
Skewness          13.5276811   Kurtosis           182.998763
Uncorrected SS    7383.53663   Corrected SS       7342.44221
Coeff Variation   1340.35141   Std Error Mean     0.46952533

Basic Statistical Measures

    Location                Variability
Mean     0.473877   Std Deviation          6.35162
Median   0.000796   Variance              40.34309
Mode     0.000094   Range                 85.92734
                    Interquartile Range    0.00309

Tests for Location: Mu0=0

Test           -Statistic-     -----p Value------
Student's t    t   1.009269    Pr > |t|    0.3142
Sign           M       91.5    Pr >= |M|   <.0001
Signed Rank    S       8418    Pr >= |S|   <.0001

Quantiles (Definition 5)

Quantile         Estimate
100% Max      8.59273E+01
99%           8.56294E-02
95%           1.99879E-02
90%           9.60576E-03
75% Q3        3.21741E-03
50% Median    7.96056E-04
25% Q1        1.24424E-04
10%           2.60263E-05
5%            2.28352E-05
1%            2.10827E-06
0% Min        2.10827E-06

Extreme Observations

--------Lowest-------     -------Highest------
      Value      Obs           Value      Obs
2.10827E-06      186       0.0624604       50
2.10827E-06       92       0.0651360       21
2.10827E-06       85       0.0701759       64
1.57302E-05      143       0.0856294      115
1.85113E-05      160      85.9273459       12

Missing Values

                      -----Percent Of-----
Missing                            Missing
  Value     Count     All Obs          Obs
      .        17        8.50       100.00

The UNIVARIATE Procedure
Variable: DFFITS

Moments

N                        183   Sum Weights               183
Mean              -0.2012365   Sum Observations   -36.826272
Std Deviation     2.84379473   Variance           8.08716846
Skewness          -13.482793   Kurtosis           182.187495
Uncorrected SS    1479.27545   Corrected SS       1471.86466
Coeff Variation   -1413.1608   Std Error Mean     0.21021936

Basic Statistical Measures

    Location                Variability
Mean     -0.20124   Std Deviation          2.84379
Median   -0.00290   Variance               8.08717
Mode     -0.01930   Range                 39.02263
                    Interquartile Range    0.11521

Tests for Location: Mu0=0

Test           -Statistic-     -----p Value------
Student's t    t   -0.95727    Pr > |t|    0.3397
Sign           M       -2.5    Pr >= |M|   0.7676
Signed Rank    S       -149    Pr >= |S|   0.8362

Quantiles (Definition 5)

Quantile          Estimate
100% Max        0.60332360
99%             0.54072498
95%             0.21210692
90%             0.13627407
75% Q3          0.05600517
50% Median     -0.00289586
25% Q1         -0.05920213
10%            -0.11742211
5%             -0.19584990
1%             -0.43084786
0% Min        -38.41931120

The UNIVARIATE Procedure
Variable: DFFITS

Extreme Observations

-------Lowest------     ------Highest-----
     Value      Obs        Value      Obs
-38.419311       12     0.351835       17
 -0.430848       29     0.510867       21
 -0.296132      130     0.511565       50
 -0.282346      155     0.540725       64
 -0.260521      128     0.603324      115

Missing Values

                      -----Percent Of-----
Missing                            Missing
  Value     Count     All Obs          Obs
      .        17        8.50       100.00

The UNIVARIATE Procedure
Variable: DFB_measwt (measwt DFBETAS)

Moments

N                        183   Sum Weights               183
Mean              0.00039965   Sum Observations   0.07313633
Std Deviation     0.05401908   Variance           0.00291806
Skewness          5.26547487   Kurtosis           43.7449484
Uncorrected SS    0.53111634   Corrected SS       0.53108711
Coeff Variation   13516.5265   Std Error Mean     0.00399321

Basic Statistical Measures

    Location                Variability
Mean     0.000400   Std Deviation          0.05402
Median   0.000000   Variance               0.00292
Mode     0.000000   Range                  0.63678
                    Interquartile Range  0.0000766

Tests for Location: Mu0=0

Test           -Statistic-     -----p Value------
Student's t    t   0.100083    Pr > |t|    0.9204
Sign           M       19.5    Pr >= |M|   0.0048
Signed Rank    S        715    Pr >= |S|   0.3204

Quantiles (Definition 5)

Quantile          Estimate
100% Max       4.91842E-01
99%            2.80931E-01
95%            3.14289E-02
90%            1.41305E-02
75% Q3         6.86667E-16
50% Median     3.14532E-17
25% Q1        -7.65658E-05
10%           -4.27876E-02
5%            -5.94060E-02
1%            -1.31842E-01
0% Min        -1.44941E-01

The UNIVARIATE Procedure
Variable: DFB_measwt (measwt DFBETAS)

Extreme Observations

-------Lowest------     ------Highest------
     Value      Obs         Value      Obs
-0.1449406      156     0.0881442      111
-0.1318417       97     0.1096412      191
-0.0978029      118     0.2565254       54
-0.0921938       87     0.2809306       17
-0.0904702      192     0.4918421       21

Missing Values

                      -----Percent Of-----
Missing                            Missing
  Value     Count     All Obs          Obs
      .        17        8.50       100.00

The UNIVARIATE Procedure
Variable: DFB_female (female DFBETAS)

Moments

N                        183   Sum Weights               183
Mean              0.09419919   Sum Observations   17.2384525
Std Deviation     1.48275408   Variance           2.19855965
Skewness          13.4966624   Kurtosis           182.435032
Uncorrected SS    401.761705   Corrected SS       400.137857
Coeff Variation   1574.06238   Std Error Mean     0.10960834

Basic Statistical Measures

    Location                Variability
Mean      0.09420   Std Deviation          1.48275
Median   -0.00501   Variance               2.19856
Mode     -0.00481   Range                 20.24974
                    Interquartile Range    0.03337

Tests for Location: Mu0=0

Test           -Statistic-     -----p Value------
Student's t    t   0.859416    Pr > |t|    0.3912
Sign           M      -32.5    Pr >= |M|   <.0001
Signed Rank    S      -3776    Pr >= |S|   <.0001

Quantiles (Definition 5)

Quantile          Estimate
100% Max       20.02775272
99%             0.38703149
95%             0.03421111
90%             0.01663061
75% Q3          0.00396153
50% Median     -0.00500628
25% Q1         -0.02941105
10%            -0.06352601
5%             -0.11406227
1%             -0.22172649
0% Min         -0.22199219

The UNIVARIATE Procedure
Variable: DFB_female (female DFBETAS)

Extreme Observations

------Lowest------     -------Highest------
    Value      Obs          Value      Obs
-0.221992      115      0.0758795      191
-0.221726       29      0.1956965       54
-0.209797       64      0.2036534       17
-0.182301       50      0.3870315       21
-0.142343      130     20.0277527       12

Missing Values

                      -----Percent Of-----
Missing                            Missing
  Value     Count     All Obs          Obs
      .        17        8.50       100.00

The UNIVARIATE Procedure
Variable: DFB_measwt_f (measwt_f DFBETAS)

Moments

N                        183   Sum Weights               183
Mean              -0.1163372   Sum Observations    -21.28971
Std Deviation     1.83226898   Variance           3.35720963
Skewness          -13.503022   Kurtosis           182.551683
Uncorrected SS    613.488939   Corrected SS       611.012153
Coeff Variation   -1574.9638   Std Error Mean     0.13544522

Basic Statistical Measures

    Location                Variability
Mean     -0.11634   Std Deviation          1.83227
Median    0.00546   Variance               3.35721
Mode      0.00314   Range                 25.06990
                    Interquartile Range    0.03505

Tests for Location: Mu0=0

Test           -Statistic-     -----p Value------
Student's t    t   -0.85892    Pr > |t|    0.3915
Sign           M       40.5    Pr >= |M|   <.0001
Signed Rank    S       4418    Pr >= |S|   <.0001

Quantiles (Definition 5)

Quantile          Estimate
100% Max        0.31740249
99%             0.29435380
95%             0.11029567
90%             0.07015686
75% Q3          0.03286291
50% Median      0.00546108
25% Q1         -0.00218745
10%            -0.01330578
5%             -0.02563021
1%             -0.37427787
0% Min        -24.75250165

The UNIVARIATE Procedure
Variable: DFB_measwt_f (measwt_f DFBETAS)

Extreme Observations

-------Lowest-------     ------Highest-----
      Value      Obs        Value      Obs
-24.7525017       12     0.155113       31
 -0.3742779       21     0.233150       29
 -0.2137802       17     0.263616       50
 -0.1952085       54     0.294354       64
 -0.0834338      191     0.317402      115

Missing Values

                      -----Percent Of-----
Missing                            Missing
  Value     Count     All Obs          Obs
      .        17        8.50       100.00

Obs   ck(Cook's D)
 12        85.9273

                     DFB_      DFB_       DFB_
Obs     DFFITS     measwt    female   measwt_f
 12   -38.4193    -0.0000   20.0278   -24.7525
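Both summary measures combine the size of the residual with the leverage. The sketch below uses generic notation assumed for this note: $E_i$ is the ordinary residual, $E_i^{*}$ the studentized residual with the case deleted, $h_i$ the hat value, $S_E^{2}$ the error mean square, and $p = k+1 = 4$ the number of estimated coefficients. It then checks observation 12 using values printed earlier on this page (residual $56 - 85.2230 = -29.2230$, leverage 0.71419, error mean square 21.72209, studentized residual -24.3045), with rounded intermediate values.

```latex
\begin{aligned}
D_i &= \frac{E_i^{2}}{p\,S_E^{2}}\cdot\frac{h_i}{(1-h_i)^{2}}, \qquad
\text{DFFITS}_i = E_i^{*}\sqrt{\frac{h_i}{1-h_i}} \\[4pt]
D_{12} &\approx \frac{(29.2230)^{2}\times 0.71419}{4\times 21.72209\times (0.28581)^{2}}
        \approx \frac{609.9}{7.098} \approx 85.9 \\[4pt]
\text{DFFITS}_{12} &\approx -24.3045\times\sqrt{\frac{0.71419}{0.28581}}
        \approx -24.3045\times 1.5808 \approx -38.4
\end{aligned}
```

Both results match the printed values for observation 12 (85.9273 and -38.4193) to the precision carried here.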
Top of page 279, computing COVRATIO. Dataset dvsOut already contains it. We summarize it with proc univariate.
proc univariate data=dvsOut;
  var covratio;
run;

proc print data=dvsOut;
  where covratio<0.02 and covratio ge 0;
run;
quit;

The UNIVARIATE Procedure
Variable: CovRatio (Cov Ratio)

Moments

N                        183   Sum Weights               183
Mean              1.01845274   Sum Observations   186.376851
Std Deviation     0.08330185   Variance            0.0069392
Skewness          -9.9649448   Kurtosis           119.302488
Uncorrected SS    191.078948   Corrected SS       1.26293396
Coeff Variation   8.17925499   Std Error Mean     0.00615785

Basic Statistical Measures

    Location                Variability
Mean     1.018453   Std Deviation          0.08330
Median   1.030849   Variance               0.00694
Mode     1.032772   Range                  1.18186
                    Interquartile Range    0.01719

Tests for Location: Mu0=0

Test           -Statistic-     -----p Value------
Student's t    t    165.391    Pr > |t|    <.0001
Sign           M       91.5    Pr >= |M|   <.0001
Signed Rank    S       8418    Pr >= |S|   <.0001

Quantiles (Definition 5)

Quantile       Estimate
100% Max      1.1921500
99%           1.0969195
95%           1.0674884
90%           1.0458427
75% Q3        1.0370553
50% Median    1.0308486
25% Q1        1.0198622
10%           0.9867116
5%            0.9600152
1%            0.8073648
0% Min        0.0102869

The UNIVARIATE Procedure
Variable: CovRatio (Cov Ratio)

Extreme Observations

-------Lowest------     -----Highest-----
    Value      Obs        Value      Obs
0.0102869       12      1.07503       65
0.8073648      115      1.07525       82
0.8536158       50      1.09106       30
0.8789338       64      1.09692       97
0.9190747       31      1.19215       21

Missing Values

                      -----Percent Of-----
Missing                            Missing
  Value     Count     All Obs          Obs
      .        17        8.50       100.00

                                  Hat
Obs   Residual   RStudent    Diagonal
 12   -29.2230   -24.3045      0.7142

Obs   CovRatio     DFFITS
 12     0.0103   -38.4193
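COVRATIO compares the generalized variance of the coefficient estimates with and without case $i$. The sketch below states the usual determinant-ratio definition and its closed form in generic notation assumed for this note ($S_E^{2}$ is the error mean square, $S_{E(-i)}^{2}$ its value with case $i$ deleted, $h_i$ the hat value, $p = 4$ the number of coefficients). It then checks observation 12 from the residual, leverage and error sum of squares printed on this page, with rounded intermediate values.

```latex
\begin{aligned}
\text{COVRATIO}_i &= \frac{\det\!\left[S_{E(-i)}^{2}\,(\mathbf{X}_{(-i)}'\mathbf{X}_{(-i)})^{-1}\right]}
                          {\det\!\left[S_E^{2}\,(\mathbf{X}'\mathbf{X})^{-1}\right]}
                   = \frac{1}{1-h_i}\left(\frac{S_{E(-i)}^{2}}{S_E^{2}}\right)^{p} \\[4pt]
S_{E(-12)}^{2} &\approx \frac{3888.25 - (29.2230)^{2}/0.28581}{178} \approx 5.058 \\[4pt]
\text{COVRATIO}_{12} &\approx \frac{1}{0.28581}\left(\frac{5.058}{21.722}\right)^{4}
                      \approx \frac{0.00294}{0.28581} \approx 0.0103
\end{aligned}
```

A value this far below 1 indicates that including observation 12 greatly inflates the estimated error variance. That outweighs the precision it contributes through its high leverage.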
Section 11.6
Bottom of page 283 and Figure 11.5 on page 284: partial regression plots using the data file duncan. We first construct a partial regression plot for the intercept, following the second footnote on page 283.
data duncan1; /* to create a constant regressor */
  set duncan;
  Int=1;
run;

proc reg data=duncan1 noprint;
  model prestige Int= income educ / noint; /* the option of no intercept*/
  output out=temp r=ry rx;
run;

filename outfiles 'https://stats.idre.ucla.edu/wp-content/uploads/2016/02/chp11pInt.gif';
goptions gsfname=outfiles dev=gif373;

proc gplot data=temp;
  plot ry*rx /hminor=0 vminor=0;
  label ry='Prestige' rx='Intercept';
run;
The following program produces Figure 11.5 on page 284.
filename outfiles 'https://stats.idre.ucla.edu/wp-content/uploads/2016/02/chp11pInc.gif';
goptions gsfname=outfiles dev=gif373;

proc reg data=duncan;
  model prestige income=educ;
  output out=dnEd r=prst inc;
run;

proc reg data=dnEd;
  model prst=inc;
  plot prst*inc /haxis=(-50 to 75 by 25) vaxis=(-50 to 100 by 50) nomodel nostat;
  label prst='Prestige';
  label inc='Income';
run;
quit;

filename outfiles 'https://stats.idre.ucla.edu/wp-content/uploads/2016/02/chp11ped.gif';
goptions gsfname=outfiles dev=gif373;

proc reg data=duncan;
  model prestige educ=income;
  output out=dcInc r=prst ed;
run;

proc reg data=dcInc;
  model prst=ed;
  plot prst*ed / haxis=(-75 to 50 by 25) vaxis=(-50 to 100 by 50) nomodel nostat;
  label prst='Prestige';
  label ed='Education';
run;
quit;
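The two-step programs above follow the standard construction of a partial regression (added-variable) plot. The sketch below states it for income in generic notation: the superscript $(1)$ marks residuals after removing the other regressor, and $A$, $B$, $C$, $D$ are coefficient labels assumed for this note.

```latex
\begin{aligned}
Y_i^{(1)} &= \text{prestige}_i - \left(\hat{A} + \hat{B}\,\text{educ}_i\right)
  && \text{(residual of prestige on educ, saved as prst)} \\
X_i^{(1)} &= \text{income}_i - \left(\hat{C} + \hat{D}\,\text{educ}_i\right)
  && \text{(residual of income on educ, saved as inc)} \\[4pt]
Y_i^{(1)} &= B_{\text{income}}\,X_i^{(1)} + E_i
\end{aligned}
```

The least-squares slope of $Y^{(1)}$ on $X^{(1)}$ equals the income coefficient from the multiple regression of prestige on income and educ, and the residuals $E_i$ are the same as in that regression. This is why the plot shows each case's leverage and influence on that one coefficient. The education plot is built the same way with the roles of income and educ exchanged.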
Figure 11.6. Bubble plot.
filename outfiles 'https://stats.idre.ucla.edu/wp-content/uploads/2016/02/chp11bbl.gif';
goptions gsfname=outfiles dev=gif373;

proc reg data=duncan;
  model prestige=income educ;
  output out=dncnOut cookd=ck h=lev student=rs;
run;
quit;

axis1 order=(0 to 0.3 by 0.05);
axis2 order=(-2.5 to 5 by 2.5) label=(r=0 a=90);

proc gplot data=dncnOut;
  bubble rs*lev=ck /haxis=axis1 vaxis=axis2 bsize=10 hminor=0 vminor=0;
  label rs='Studentized Residuals';
  label lev='Hat-Value';
run;
quit;