First example on income of four married couples, Table 16.1.
data temp;  /* creating Table 16.2 */
  array values{4} (6 -3 5 3);
  array y{4} y1-y4;
  do i = 1 to 4;
    y1 = values[i];
    do j = 1 to 4;
      y2 = values[j];
      do k = 1 to 4;
        y3 = values[k];
        do l = 1 to 4;
          y4 = values[l];
          output;
        end;
      end;
    end;
  end;
  drop i j k l values1-values4;
run;

data stat;  /* creating mean for each sample */
  set temp;
  m = mean(of y1-y4);
run;

proc means data=stat vardef=N;  /* Bootstrapping Means, known distribution */
  var m;
run;
The MEANS Procedure

Analysis Variable : m

  N          Mean       Std Dev       Minimum       Maximum
-------------------------------------------------------------------
256     2.7500000     1.7455300    -3.0000000     6.0000000

proc univariate data=stat noprint;
  histogram m / midpoints=-3.8 to 8 by .4 href=2.75 lhref=1
                haxis=axis1 cfill=green;
  label m='Bootstrap Mean';
run;
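As a check on the proc means output above, the mean and standard deviation of the 256 bootstrap means can be worked out by hand. This is only a sketch, assuming the four values 6, -3, 5, 3 are treated as the whole population (which is what the divisor implied by vardef=N corresponds to); the symbols \bar{Y}, S^2 and \bar{Y}^* are notation introduced here for illustration.

```latex
\begin{align*}
\text{number of samples} &= 4^4 = 256 \\
\bar{Y} &= \frac{6 + (-3) + 5 + 3}{4} = 2.75 \\
S^2 &= \frac{(3.25)^2 + (-5.75)^2 + (2.25)^2 + (0.25)^2}{4}
     = \frac{48.75}{4} = 12.1875 \\
\operatorname{SD}(\bar{Y}^*) &= \sqrt{\frac{S^2}{4}} = \sqrt{3.046875} \approx 1.7455
\end{align*}
```

Both values agree with the Mean (2.75) and Std Dev (1.74553) columns of the listing.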
Second example of 10 married couples based on Table 16.3. First we run proc means to show the sample mean, standard deviation and normal-theory 95% confidence interval. Then we run the SAS macro boot to get the percentile interval and the improved bootstrap (BCa) interval for the mean. Macro boot is part of the program jackboot. To run boot, we first need to create a macro called analyze, in which we specify the data set to use and the statistic to analyze. After running boot, we can run bootci to estimate confidence intervals by different methods. By default, macro boot also produces a histogram of the bootstrap replicates of the mean.
data couples;
  input husinc wifinc;
  diff = husinc - wifinc;
  cards;
24 18
14 17
40 35
44 41
24 18
19  9
21 10
22 30
30 23
24 15
;
run;

proc means data=couples mean stddev clm alpha=0.05;
  var diff;
run;
The MEANS Procedure

Analysis Variable : diff

                               Lower 95%       Upper 95%
       Mean       Std Dev    CL for Mean     CL for Mean
-----------------------------------------------------------
  4.6000000     5.9479221      0.3451128       8.8548872
-----------------------------------------------------------

%include 'jackboot.sas';

%macro analyze(data=, out=);
  proc means noprint data=&data;
    output out=&out (drop=_freq_ _type_) mean=mean_diff;
    var diff;
    %bystmt;
  run;
%mend;

title2 'Normal Confidence Interval';
%boot(data=couples, samples=2000);  /* normal-theory C-I given here */
title2 'Percentile Confidence Interval';
%bootci(PCTL);  /* Percentile Intervals */
title2 'Improved Bootstrap Confidence Interval';
%bootci(BCa);   /* improved Bootstrap Intervals */
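For readers who want to see the resampling steps spelled out, the percentile interval can also be sketched without the jackboot macros. The block below is a minimal illustration using proc surveyselect, which is a swapped-in technique and not a description of what %boot does internally. The data set names (boot_couples, bootsamp, bootmeans, pctl), the prefix ci_ and the seed are arbitrary choices made for this sketch.

```sas
/* Sketch only: percentile bootstrap of the mean of diff.            */
/* All data set names, the ci_ prefix and the seed are hypothetical. */
data boot_couples;
  input husinc wifinc @@;   /* @@ reads several couples from one line */
  diff = husinc - wifinc;
  datalines;
24 18  14 17  40 35  44 41  24 18
19  9  21 10  22 30  30 23  24 15
;
run;

/* Draw 2000 resamples of size n with replacement (method=urs);     */
/* outhits writes one row per selection, Replicate indexes samples. */
proc surveyselect data=boot_couples out=bootsamp noprint
                  seed=16161 method=urs samprate=1 outhits reps=2000;
run;

/* Mean of diff within each resample */
proc means data=bootsamp noprint;
  by replicate;
  var diff;
  output out=bootmeans mean=mean_diff;
run;

/* 2.5th and 97.5th percentiles of the 2000 bootstrap means */
proc univariate data=bootmeans noprint;
  var mean_diff;
  output out=pctl pctlpts=2.5 97.5 pctlpre=ci_;
run;

proc print data=pctl;
run;
```

Because the resamples depend on the random number seed and on the resampling scheme, the limits printed by this sketch should be close to, but not identical to, the percentile limits reported by %bootci(PCTL).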
Normal Confidence Interval
[Histogram of the 2000 bootstrap replicates of mean_diff. Vertical axis: Frequency. Horizontal axis: mean_diff Midpoint, from -2.4 to 9.0 in steps of 0.6.]
Normal Confidence Interval
                                              Approximate  Approximate Lower
           Observed   Bootstrap  Approximate  Standard     Confidence         Bias-Corrected
Name       Statistic  Mean       Bias         Error        Limit              Statistic
mean_diff  4.6        4.6        6.2172E-15   1.79316      1.08548            4.6

           Approximate Upper              Method for        Minimum    Maximum
           Confidence         Confidence  Confidence        Resampled  Resampled  Number of
Name       Limit              Level (%)   Interval          Estimate   Estimate   Resamples
mean_diff  8.11452            95          Bootstrap Normal  -2.4       9.1        2000
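The limits in the Bootstrap Normal table are consistent with centering a normal-theory interval on the bias-corrected statistic and using the bootstrap standard error. The arithmetic below is a sketch of that reading; the symbols \hat{\theta}, \widehat{\text{bias}} and \widehat{\text{SE}}^* are notation introduced here, and the last digit differs slightly from the listing because the printed standard error is rounded.

```latex
\begin{align*}
\hat{\theta}_{BC} &= \hat{\theta} - \widehat{\text{bias}} = 4.6 - 6.2172 \times 10^{-15} \approx 4.6 \\
\hat{\theta}_{BC} \pm z_{0.975}\,\widehat{\text{SE}}^*
  &= 4.6 \pm 1.95996 \times 1.79316 \\
  &= 4.6 \pm 3.5145 \\
  &\approx (1.0855,\; 8.1145)
\end{align*}
```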
Percentile Confidence Interval
                      Approximate Lower  Approximate Upper              Method for
           Observed   Confidence         Confidence         Confidence  Confidence      Number of
Name       Statistic  Limit              Limit              Level (%)   Interval        Resamples
mean_diff  4.6        0.8                7.8                95          Bootstrap PCTL  2000
Improved Bootstrap Confidence Interval
                      Approximate Lower  Approximate Upper              Method for
           Observed   Confidence         Confidence         Confidence  Confidence
Name       Statistic  Limit              Limit              Level (%)   Interval
mean_diff  4.6        -0.1               7.4                95          Bootstrap BCa

                      Lower       Upper       Bias
           Number of  Percentile  Percentile  Correction
Name       Resamples  Point       Point       (Z0)        Acceleration
mean_diff  2000       .009875210  0.95183     -0.056429   -0.056302
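The Lower and Upper Percentile Point columns can be reproduced from the bias correction and acceleration with the usual BCa adjustment. The worked example below is a sketch under the assumption that the macro uses the standard BCa formula; the symbols z_0, a, \alpha_1 and \alpha_2 are notation introduced here, and \Phi is the standard normal distribution function.

```latex
\begin{align*}
\alpha_1 &= \Phi\!\left(z_0 + \frac{z_0 + z_{0.025}}{1 - a\,(z_0 + z_{0.025})}\right),
\qquad
\alpha_2 = \Phi\!\left(z_0 + \frac{z_0 + z_{0.975}}{1 - a\,(z_0 + z_{0.975})}\right) \\[4pt]
\text{with } z_0 &= -0.056429,\quad a = -0.056302,\quad z_{0.975} = -z_{0.025} = 1.95996: \\[4pt]
\alpha_1 &= \Phi\!\left(-0.056429 + \frac{-2.016393}{1 - 0.113527}\right)
          = \Phi(-2.33105) \approx 0.00988 \\
\alpha_2 &= \Phi\!\left(-0.056429 + \frac{1.903535}{1 + 0.107173}\right)
          = \Phi(1.66285) \approx 0.9518
\end{align*}
```

The BCa limits -0.1 and 7.4 are then the \alpha_1 and \alpha_2 quantiles of the 2000 bootstrap means, in place of the 0.025 and 0.975 quantiles used by the percentile interval.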
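Bootstrapping regression using data file duncan. The results below differ from Table 16.5 because we use the ordinary least-squares regression procedure (proc reg) instead of Huber robust regression. The page title 'Improved Bootstrap Confidence Interval' appears on all of the output below only because the title2 statement from the previous example is still in effect; the 'Method for Confidence Interval' column of each table identifies the method actually used.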
%macro analyze(data=, out=);
  proc reg noprint data=&data
           outest=&out (drop=prestige _IN_ _P_ _EDF_ _RMSE_);
    model prestige=income educ;
    %bystmt;
  run;
%mend;

%boot(data=duncan, samples=2000);
%bootci(PCTL);
%bootci(BC);
%bootci(BCa);
Improved Bootstrap Confidence Interval
[Histogram of the 2000 bootstrap replicates of Intercept. Vertical axis: Frequency. Horizontal axis: Intercept midpoint, from -16.8 to 6.0 in steps of 1.2.]
Improved Bootstrap Confidence Interval
[Histogram of the 2000 bootstrap replicates of the income coefficient. Vertical axis: Frequency. Horizontal axis: income Midpoint, from 0.12 to 1.32 in steps of 0.06.]
Improved Bootstrap Confidence Interval
[Histogram of the 2000 bootstrap replicates of the educ coefficient. Vertical axis: Frequency. Horizontal axis: educ Midpoint, from -0.15 to 0.93 in steps of 0.06.]
Improved Bootstrap Confidence Interval
                                              Approximate  Approximate Lower                  Approximate Upper
           Observed   Bootstrap  Approximate  Standard     Confidence         Bias-Corrected  Confidence
Name       Statistic  Mean       Bias         Error        Limit              Statistic       Limit
Intercept  -6.06466   -6.07837   -0.013709    3.06968      -12.0674           -6.05095        -0.03448
educ       0.54583    0.53178    -0.014057    0.13606      0.2932             0.55989         0.82657
income     0.59873    0.61524    0.016504     0.16511      0.2586             0.58223         0.90584

                       Method for        Minimum    Maximum               LABEL OF
           Confidence  Confidence        Resampled  Resampled  Number of  FORMER
Name       Level (%)   Interval          Estimate   Estimate   Resamples  VARIABLE
Intercept  95          Bootstrap Normal  -16.7532   6.23388    2000       Intercept
educ       95          Bootstrap Normal  -0.1439    0.93903    2000
income     95          Bootstrap Normal  0.1101     1.32727    2000
Improved Bootstrap Confidence Interval
                      Approximate Lower  Approximate Upper              Method for                 LABEL OF
           Observed   Confidence         Confidence         Confidence  Confidence      Number of  FORMER
Name       Statistic  Limit              Limit              Level (%)   Interval        Resamples  VARIABLE
Intercept  -6.06466   -12.1006           0.04230            95          Bootstrap PCTL  2000       Intercept
educ       0.54583    0.2458             0.78030            95          Bootstrap PCTL  2000
income     0.59873    0.3113             0.95969            95          Bootstrap PCTL  2000
Improved Bootstrap Confidence Interval
                      Approximate Lower  Approximate Upper              Method for
           Observed   Confidence         Confidence         Confidence  Confidence
Name       Statistic  Limit              Limit              Level (%)   Interval
Intercept  -6.06466   -12.1006           0.04230            95          Bootstrap BC
educ       0.54583    0.2603             0.78737            95          Bootstrap BC
income     0.59873    0.2963             0.94522            95          Bootstrap BC

                      LABEL OF   Lower       Upper       Bias
           Number of  FORMER     Percentile  Percentile  Correction
Name       Resamples  VARIABLE   Point       Point       (Z0)
Intercept  2000       Intercept  0.025000    0.97500     -0.000000
educ       2000                  0.030589    0.97971     0.043880
income     2000                  0.019567    0.96835     -0.051409
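The percentile points in the Bootstrap BC table are consistent with the usual bias-corrected adjustment, which shifts the normal quantiles by twice the bias correction. The worked example for educ below is a sketch under that assumption; z_0, \alpha_1 and \alpha_2 are notation introduced here, and \Phi is the standard normal distribution function.

```latex
\begin{align*}
\alpha_1 &= \Phi(2 z_0 + z_{0.025}), \qquad \alpha_2 = \Phi(2 z_0 + z_{0.975}) \\[4pt]
\text{educ } (z_0 = 0.043880):\quad
\alpha_1 &= \Phi(0.08776 - 1.95996) = \Phi(-1.87220) \approx 0.0306 \\
\alpha_2 &= \Phi(0.08776 + 1.95996) = \Phi(2.04772) \approx 0.9797
\end{align*}
```

For Intercept the bias correction is zero, so the points stay at 0.025 and 0.975 and the BC limits coincide with the percentile limits.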
Improved Bootstrap Confidence Interval
                                              Estimated    Estimated Lower                    Estimated Upper
           Observed   Jackknife  Estimated    Standard     Confidence         Bias-Corrected  Confidence
Name       Statistic  Mean       Bias         Error        Limit              Statistic       Limit
Intercept  -6.06466   -6.06599   -0.058182    3.11843      -12.1185           -6.00648        0.10552
educ       0.54583    0.54549    -0.014920    0.14778      0.2711             0.56075         0.85040
income     0.59873    0.59914    0.017827     0.18140      0.2254             0.58091         0.93644

                       Method for        Minimum    Maximum               LABEL OF
           Confidence  Confidence        Resampled  Resampled  Number of  FORMER
Name       Level (%)   Interval          Estimate   Estimate   Resamples  VARIABLE
Intercept  95          Jackknife         -7.34563   -5.24007   45         Intercept
educ       95          Jackknife         0.43303    0.58637    45
income     95          Jackknife         0.54404    0.73155    45
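The Estimated Bias column of the jackknife table is consistent with the usual jackknife bias estimate, with n = 45 leave-one-out resamples. The check below for educ is a sketch under that assumption; \hat{\theta} and \bar{\theta}_{(\cdot)} are notation introduced here for the observed statistic and the jackknife mean, and the small discrepancy from the listing comes from the rounding of the printed means.

```latex
\begin{align*}
\widehat{\text{bias}}_{jack} &= (n - 1)\,\bigl(\bar{\theta}_{(\cdot)} - \hat{\theta}\bigr) \\
\text{educ:}\quad
\widehat{\text{bias}}_{jack} &\approx 44 \times (0.54549 - 0.54583) = 44 \times (-0.00034) \approx -0.0150
\end{align*}
```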
Improved Bootstrap Confidence Interval
                      Approximate Lower  Approximate Upper              Method for
           Observed   Confidence         Confidence         Confidence  Confidence      Number of
Name       Statistic  Limit              Limit              Level (%)   Interval        Resamples
Intercept  -6.06466   -11.7872           0.45600            95          Bootstrap BCa   2000
educ       0.54583    0.3030             0.82933            95          Bootstrap BCa   2000
income     0.59873    0.2519             0.90165            95          Bootstrap BCa   2000

           LABEL OF   Lower       Upper       Bias
           FORMER     Percentile  Percentile  Correction
Name       VARIABLE   Point       Point       (Z0)        Acceleration
Intercept  Intercept  0.028650    0.97845     -0.000000   0.015823
educ                  0.052622    0.99235     0.043880    0.079128
income                0.007763    0.94720     -0.051409   -0.074962