Applied Regression Analysis by John Fox Chapter 16: Assessing Sampling Variation: Bootstrapping and Cross-Validation

First example: income differences (husband's minus wife's income) for the four married couples of Table 16.1.

data temp; /* Table 16.2: all 4**4 = 256 bootstrap samples of size 4 */
  array values{4} (6 -3 5 3);
  array y{4} y1-y4;
  do i = 1 to 4;
    y1 = values[i];
    do j = 1 to 4;
      y2 = values[j];
      do k = 1 to 4;
        y3 = values[k];
        do l = 1 to 4;
          y4 = values[l];
          output;
        end;
      end;
    end;
  end;
  drop i j k l values1-values4;
run;

data stat; /* mean of each bootstrap sample */
  set temp;
  m=mean(of y1-y4);
run;

proc means data=stat vardef=N; /* exact bootstrap distribution of the mean; vardef=N gives the bootstrap SE */
 var m;
run;

The MEANS Procedure

                       Analysis Variable : m

  N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------
256       2.7500000       1.7455300      -3.0000000       6.0000000
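Because the sample is so small, the bootstrap distribution can be enumerated exactly, which is what the DATA step above does. The following Python sketch (not part of the original SAS program) repeats the enumeration to make the arithmetic explicit: there are 4^4 = 256 equally likely resamples, and the standard deviation of their means, computed with divisor N as with `vardef=N`, equals the plug-in standard error sqrt(sigma-hat^2 / n), where sigma-hat^2 uses divisor n.

```python
from itertools import product
from math import sqrt

# Income differences for the four couples in Table 16.1
values = [6, -3, 5, 3]
n = len(values)

# Every ordered resample of size n drawn with replacement: 4**4 = 256 of them
boot_means = [sum(s) / n for s in product(values, repeat=n)]

grand_mean = sum(boot_means) / len(boot_means)
# Divisor N (not N - 1), matching vardef=N in PROC MEANS
boot_sd = sqrt(sum((m - grand_mean) ** 2 for m in boot_means) / len(boot_means))

# Plug-in standard error: variance of the data with divisor n, divided by n
ybar = sum(values) / n
plugin_se = sqrt(sum((v - ybar) ** 2 for v in values) / n / n)

print(len(boot_means), grand_mean, round(boot_sd, 7), round(plugin_se, 7))
print(min(boot_means), max(boot_means))
```

Both standard-error computations give 1.74553. This matches the Std Dev column above and shows that, for the mean, the exact bootstrap SE needs no resampling at all.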

proc univariate data=stat noprint;
  histogram m / midpoints=-3.8 to 8 by .4 href=2.75 lhref=1
  haxis=axis1 cfill=green;
  label m='Bootstrap Mean';
run;

Figure chp16Fig1: histogram of the 256 bootstrap means, with a reference line at the sample mean 2.75.

Second example: the 10 married couples of Table 16.3. We first run PROC MEANS to obtain the sample mean, the standard deviation and the normal-theory (t-based) 95% confidence interval for the mean difference. We then use the SAS macro %boot to draw bootstrap samples of the mean, followed by the macro %bootci to obtain the percentile interval and the improved (BCa) bootstrap interval. Both macros are part of the program jackboot.sas. Before calling %boot, we must define a macro named %analyze, with arguments data= and out=, that specifies the data set to use and the statistic to compute; it must contain the %bystmt macro call so that the statistic is computed separately for each bootstrap sample. By default, %boot also prints a normal-approximation bootstrap interval and a histogram of the bootstrap replicates of the mean.

data couples;
 input husinc wifinc;
  diff=husinc-wifinc;
cards;
 24 18
 14 17
 40 35
 44 41
 24 18
 19 9
 21 10
 22 30
 30 23
 24 15
 ;
run;
proc means data=couples mean stddev clm alpha=0.05;
var diff;
run;

The MEANS Procedure

  	Analysis Variable : diff
  	
  	                            Lower 95%       Upper 95%
     Mean         Std Dev     CL for Mean     CL for Mean
-----------------------------------------------------------
4.6000000       5.9479221       0.3451128       8.8548872
-----------------------------------------------------------
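As a check on the PROC MEANS output, here is a short Python sketch of the same t-based interval, mean ± t(0.975, 9) × s / sqrt(n). The critical value is hard-coded from t tables; this is an assumption made to keep the sketch free of third-party packages, and `scipy.stats.t.ppf(0.975, 9)` would supply it directly.

```python
from math import sqrt
from statistics import mean, stdev

husinc = [24, 14, 40, 44, 24, 19, 21, 22, 30, 24]
wifinc = [18, 17, 35, 41, 18, 9, 10, 30, 23, 15]
diff = [h - w for h, w in zip(husinc, wifinc)]   # same as diff=husinc-wifinc in SAS

n = len(diff)
ybar = mean(diff)
s = stdev(diff)          # divisor n - 1, the PROC MEANS default
se = s / sqrt(n)

t_crit = 2.262157        # 0.975 quantile of t with n - 1 = 9 df, taken from tables
lower, upper = ybar - t_crit * se, ybar + t_crit * se
print(ybar, round(s, 7), round(lower, 4), round(upper, 4))
```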

%include 'jackboot.sas';
%macro analyze(data=, out=);
proc means noprint data=&data;
    output out=&out (drop=_freq_ _type_) mean=mean_diff;
    var diff;
    %bystmt;
  run;
%mend;

title2 'Normal Confidence Interval';
%boot(data=couples, samples=2000); /* bootstrap normal-approximation C-I given here */
title2 'Percentile Confidence Interval';
%bootci(PCTL);/*Percentile Intervals*/
title2 'Improved Bootstrap Confidence Interval';
%bootci(BCa); /*improved Bootstrap Intervals*/

Normal Confidence Interval

Frequency

    |                                                  **
    |                                                  **  **
    |                                                  **  **
240 +                                                  **  **  **
    |                                                  **  **  **
    |                                              **  **  **  **
    |                                              **  **  **  **
210 +                                              **  **  **  **
    |                                          **  **  **  **  **
    |                                          **  **  **  **  **
    |                                          **  **  **  **  **
180 +                                          **  **  **  **  **
    |                                      **  **  **  **  **  **
    |                                      **  **  **  **  **  **
    |                                      **  **  **  **  **  **  **
150 +                                      **  **  **  **  **  **  **
    |                                      **  **  **  **  **  **  **
    |                                      **  **  **  **  **  **  **
    |                                      **  **  **  **  **  **  **
120 +                                  **  **  **  **  **  **  **  **
    |                                  **  **  **  **  **  **  **  **  **
    |                                  **  **  **  **  **  **  **  **  **
    |                                  **  **  **  **  **  **  **  **  **
 90 +                                  **  **  **  **  **  **  **  **  **
    |                                  **  **  **  **  **  **  **  **  **
    |                                  **  **  **  **  **  **  **  **  **
    |                              **  **  **  **  **  **  **  **  **  **
 60 +                              **  **  **  **  **  **  **  **  **  **  **
    |                              **  **  **  **  **  **  **  **  **  **  **
    |                          **  **  **  **  **  **  **  **  **  **  **  **
    |                          **  **  **  **  **  **  **  **  **  **  **  **
 30 +                          **  **  **  **  **  **  **  **  **  **  **  **
    |                      **  **  **  **  **  **  **  **  **  **  **  **  **  **
    |                  **  **  **  **  **  **  **  **  **  **  **  **  **  **  **
    |              **  **  **  **  **  **  **  **  **  **  **  **  **  **  **  **  **
    -----------------------------------------------------------------------------------
        -   -   -   -
        2   1   1   0   0   0   1   1   2   3   3   4   4   5   6   6   7   7   8   9
        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .
        4   8   2   6   0   6   2   8   4   0   6   2   8   4   0   6   2   8   4   0

                                     mean_diff Midpoint

Normal Confidence Interval

                                                                Approximate
                                                  Approximate      Lower
             Observed   Bootstrap   Approximate     Standard     Confidence   Bias-Corrected
  Name      Statistic      Mean         Bias         Error         Limit         Statistic

mean_diff      4.6         4.6       6.2172E-15     1.79316       1.08548           4.6

                Approximate
               Upper                      Method for       Minimum     Maximum
             Confidence   Confidence      Confidence      Resampled   Resampled   Number of
  Name         Limit       Level (%)       Interval        Estimate    Estimate   Resamples

mean_diff     8.11452         95       Bootstrap Normal      -2.4        9.1         2000

Percentile Confidence Interval

                        Approximate   Approximate
                           Lower         Upper                     Method for
             Observed    Confidence    Confidence   Confidence     Confidence     Number of
  Name      Statistic      Limit         Limit       Level (%)      Interval      Resamples

mean_diff      4.6          0.8           7.8           95       Bootstrap PCTL      2000
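The normal and percentile intervals above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the jackboot code. The seed is arbitrary, so the numbers will differ somewhat from the SAS run. The percentile rule used here (the 0.025R-th and 0.975R-th ordered replicates) is one common definition, and jackboot's exact rule may differ. The normal interval is centred on the bias-corrected statistic (observed minus estimated bias), which is consistent with the %boot table: 4.6 ± 1.96 × 1.79316 gives (1.085, 8.115).

```python
import random
from statistics import NormalDist, mean, stdev

# Husband-minus-wife income differences for the 10 couples (Table 16.3)
diff = [6, -3, 5, 3, 6, 10, 11, -8, 7, 9]
R = 2000                     # number of bootstrap resamples, as in samples=2000
rng = random.Random(2024)    # arbitrary seed; SAS uses its own random stream

theta_hat = mean(diff)
# Resample the 10 differences with replacement and recompute the mean each time
boot = [mean(rng.choices(diff, k=len(diff))) for _ in range(R)]

# Bootstrap normal interval: bias-corrected estimate +/- z * bootstrap SE
boot_se = stdev(boot)
bias = mean(boot) - theta_hat
z = NormalDist().inv_cdf(0.975)
normal_ci = (theta_hat - bias - z * boot_se, theta_hat - bias + z * boot_se)

# Percentile interval: the (0.025 R)-th and (0.975 R)-th ordered replicates
ordered = sorted(boot)
pctl_ci = (ordered[round(0.025 * R) - 1], ordered[round(0.975 * R) - 1])

print(f"SE={boot_se:.4f}  normal={normal_ci}  percentile={pctl_ci}")
```

The bootstrap SE should come out close to the exact value sqrt(31.84 / 10) ≈ 1.784. The percentile interval is asymmetric because the outlying difference of -8 stretches the lower tail.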

Improved Bootstrap Confidence Interval

                          Approximate    Approximate
                             Lower          Upper                      Method for
              Observed     Confidence     Confidence    Confidence     Confidence
  Name       Statistic       Limit          Limit        Level (%)      Interval

mean_diff       4.6           -0.1           7.4            95        Bootstrap BCa

                             Lower         Upper         Bias
             Number of    Percentile    Percentile    Correction
  Name       Resamples       Point         Point         (Z0)       Acceleration

mean_diff       2000      .009875210      0.95183      -0.056429      -0.056302
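The BCa interval adjusts the percentile points using a bias correction z0 and an acceleration a. The following is a minimal Python sketch of the standard Efron formulas, with z0 = Phi^-1(share of replicates below the estimate), a taken from the jackknife, and adjusted point Phi(z0 + (z0 + z_alpha) / (1 - a(z0 + z_alpha))). The seed is arbitrary, so z0 and the limits will not match SAS exactly. The acceleration, however, depends only on the data: the sketch reproduces the value -0.056302 shown above, and plugging in the reported z0 = -0.056429 reproduces the percentile points 0.009875 and 0.95183.

```python
import random
from statistics import NormalDist, mean

diff = [6, -3, 5, 3, 6, 10, 11, -8, 7, 9]
R, alpha = 2000, 0.05
rng = random.Random(2024)        # arbitrary seed
N = NormalDist()

theta_hat = mean(diff)
boot = sorted(mean(rng.choices(diff, k=len(diff))) for _ in range(R))

# Bias correction z0: normal quantile of the share of replicates below the estimate
z0 = N.inv_cdf(sum(b < theta_hat for b in boot) / R)

# Acceleration a from the leave-one-out (jackknife) means
n = len(diff)
jack = [mean(diff[:i] + diff[i + 1:]) for i in range(n)]
jbar = mean(jack)
a = sum((jbar - j) ** 3 for j in jack) / (6 * sum((jbar - j) ** 2 for j in jack) ** 1.5)

def bca_point(z_alpha):
    """Adjusted percentile point (a probability) for nominal normal quantile z_alpha."""
    return N.cdf(z0 + (z0 + z_alpha) / (1 - a * (z0 + z_alpha)))

p_lo = bca_point(N.inv_cdf(alpha / 2))
p_hi = bca_point(N.inv_cdf(1 - alpha / 2))
# Read the interval off the ordered replicates at those percentile points
bca_ci = (boot[max(int(p_lo * R) - 1, 0)], boot[min(int(p_hi * R) - 1, R - 1)])

print(f"a={a:.6f}  z0={z0:.4f}  points=({p_lo:.4f}, {p_hi:.4f})  BCa={bca_ci}")
```

Because a and z0 are both negative here, the lower point moves well below 0.025. This is why the BCa lower limit sits further out than the percentile limit.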

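Next, we bootstrap a regression using the duncan data set, which is assumed to have already been read into SAS with the variables prestige, income and educ. Redefining %analyze replaces the earlier version, so %boot now resamples whole observations and refits the regression to each bootstrap sample. The results below differ from Table 16.5 because we use ordinary least-squares regression (PROC REG) rather than the Huber robust regression used in the text. The title "Improved Bootstrap Confidence Interval" on all of the output below is carried over from the last TITLE2 statement of the previous example.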

%macro analyze(data=, out=);
proc reg  noprint data=&data outest=&out (drop= prestige _IN_ _P_ _EDF_ _RMSE_);
    model prestige=income educ;
    %bystmt;
  run;
%mend; 
%boot(data=duncan, samples=2000);
%bootci(PCTL);
%bootci(BC);
%bootci(BCa);

Improved Bootstrap Confidence Interval

Frequency

    |                                      **
    |                                      **  **
300 +                                      **  **
    |                                      **  **
    |                                      **  **
    |                                  **  **  **
    |                                  **  **  **
250 +                              **  **  **  **
    |                              **  **  **  **
    |                              **  **  **  **
    |                              **  **  **  **
    |                              **  **  **  **  **
200 +                              **  **  **  **  **
    |                              **  **  **  **  **
    |                              **  **  **  **  **
    |                              **  **  **  **  **
    |                          **  **  **  **  **  **
150 +                          **  **  **  **  **  **
    |                          **  **  **  **  **  **  **
    |                          **  **  **  **  **  **  **
    |                          **  **  **  **  **  **  **
    |                          **  **  **  **  **  **  **
100 +                          **  **  **  **  **  **  **
    |                      **  **  **  **  **  **  **  **
    |                      **  **  **  **  **  **  **  **  **
    |                      **  **  **  **  **  **  **  **  **
    |                      **  **  **  **  **  **  **  **  **
 50 +                      **  **  **  **  **  **  **  **  **  **
    |                  **  **  **  **  **  **  **  **  **  **  **
    |                  **  **  **  **  **  **  **  **  **  **  **
    |              **  **  **  **  **  **  **  **  **  **  **  **  **
    |          **  **  **  **  **  **  **  **  **  **  **  **  **  **  **
    -----------------------------------------------------------------------------------
        -   -   -   -   -   -
        1   1   1   1   1   1   -   -   -   -   -   -   -   -
        6   5   4   3   2   0   9   8   7   6   4   3   2   1   0   1   2   3   4   6
        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .
        8   6   4   2   0   8   6   4   2   0   8   6   4   2   0   2   4   6   8   0

                                     Intercept

Improved Bootstrap Confidence Interval

Frequency

300 +                                  **
    |                              **  **
    |                              **  **
    |                              **  **  **
270 +                              **  **  **
    |                              **  **  **
    |                              **  **  **
    |                              **  **  **
240 +                              **  **  **
    |                              **  **  **
    |                              **  **  **  **
    |                          **  **  **  **  **
210 +                          **  **  **  **  **
    |                          **  **  **  **  **
    |                          **  **  **  **  **
    |                          **  **  **  **  **
180 +                          **  **  **  **  **
    |                          **  **  **  **  **
    |                          **  **  **  **  **
    |                          **  **  **  **  **
150 +                      **  **  **  **  **  **  **
    |                      **  **  **  **  **  **  **
    |                      **  **  **  **  **  **  **
    |                      **  **  **  **  **  **  **
120 +                      **  **  **  **  **  **  **
    |                      **  **  **  **  **  **  **
    |                      **  **  **  **  **  **  **
    |                      **  **  **  **  **  **  **  **
 90 +                      **  **  **  **  **  **  **  **
    |                  **  **  **  **  **  **  **  **  **
    |                  **  **  **  **  **  **  **  **  **
    |                  **  **  **  **  **  **  **  **  **
 60 +                  **  **  **  **  **  **  **  **  **
    |              **  **  **  **  **  **  **  **  **  **  **  **
    |              **  **  **  **  **  **  **  **  **  **  **  **
    |              **  **  **  **  **  **  **  **  **  **  **  **
 30 +              **  **  **  **  **  **  **  **  **  **  **  **
    |              **  **  **  **  **  **  **  **  **  **  **  **
    |          **  **  **  **  **  **  **  **  **  **  **  **  **  **
    |      **  **  **  **  **  **  **  **  **  **  **  **  **  **  **  **  **
    ---------------------------------------------------------------------------------------
        0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   1   1   1   1   1
        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .
        1   1   2   3   3   4   4   5   6   6   7   7   8   9   9   0   0   1   2   2   3
        2   8   4   0   6   2   8   4   0   6   2   8   4   0   6   2   8   4   0   6   2

                                        income Midpoint

Improved Bootstrap Confidence Interval

Frequency

    |                                                  **
    |                                                  **
350 +                                                  **
    |                                                  **
    |                                                  **
    |                                              **  **
    |                                              **  **
300 +                                              **  **  **
    |                                          **  **  **  **
    |                                          **  **  **  **
    |                                          **  **  **  **
    |                                          **  **  **  **
250 +                                          **  **  **  **
    |                                          **  **  **  **
    |                                          **  **  **  **
    |                                          **  **  **  **
    |                                          **  **  **  **
200 +                                          **  **  **  **
    |                                      **  **  **  **  **
    |                                      **  **  **  **  **  **
    |                                      **  **  **  **  **  **
    |                                      **  **  **  **  **  **
150 +                                      **  **  **  **  **  **
    |                                      **  **  **  **  **  **
    |                                      **  **  **  **  **  **
    |                                      **  **  **  **  **  **
    |                                  **  **  **  **  **  **  **
100 +                                  **  **  **  **  **  **  **  **
    |                                  **  **  **  **  **  **  **  **
    |                                  **  **  **  **  **  **  **  **
    |                                  **  **  **  **  **  **  **  **
    |                                  **  **  **  **  **  **  **  **
 50 +                              **  **  **  **  **  **  **  **  **
    |                              **  **  **  **  **  **  **  **  **  **
    |                          **  **  **  **  **  **  **  **  **  **  **
    |                          **  **  **  **  **  **  **  **  **  **  **
    |                  **  **  **  **  **  **  **  **  **  **  **  **  **  **
    -------------------------------------------------------------------------------
        -   -   -
        0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .
        1   0   0   0   0   1   2   2   3   3   4   5   5   6   6   7   8   8   9
        5   9   3   3   9   5   1   7   3   9   5   1   7   3   9   5   1   7   3

                                     educ Midpoint

Improved Bootstrap Confidence Interval

                                                      Approximate                Approximate
                                          Approximate    Lower                      Upper
           Observed Bootstrap Approximate   Standard   Confidence Bias-Corrected  Confidence
Name      Statistic    Mean       Bias       Error       Limit       Statistic      Limit

Intercept  -6.06466  -6.07837  -0.013709    3.06968     -12.0674     -6.05095      -0.03448
educ        0.54583   0.53178  -0.014057    0.13606       0.2932      0.55989       0.82657
income      0.59873   0.61524   0.016504    0.16511       0.2586      0.58223       0.90584

                           Method for        Minimum      Maximum                  LABEL OF
          Confidence       Confidence       Resampled    Resampled    Number of     FORMER
Name       Level (%)        Interval         Estimate     Estimate    Resamples    VARIABLE

Intercept     95        Bootstrap Normal     -16.7532     6.23388        2000      Intercept
educ          95        Bootstrap Normal      -0.1439     0.93903        2000
income        95        Bootstrap Normal       0.1101     1.32727        2000

Improved Bootstrap Confidence Interval

                      Approximate  Approximate
                         Lower        Upper                   Method for               LABEL OF
            Observed   Confidence   Confidence  Confidence    Confidence    Number of   FORMER
Name       Statistic     Limit        Limit      Level (%)     Interval     Resamples  VARIABLE

Intercept   -6.06466    -12.1006     0.04230        95      Bootstrap PCTL     2000    Intercept
educ         0.54583      0.2458     0.78030        95      Bootstrap PCTL     2000
income       0.59873      0.3113     0.95969        95      Bootstrap PCTL     2000
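Case resampling for a regression can be sketched in Python along the same lines: draw n rows with replacement, refit, and summarize each coefficient's replicates. The Duncan data are not reproduced on this page, so the sketch simulates a hypothetical data set of 45 cases. The variable names are reused only for readability. `ols()` is a small hand-written helper that solves the normal equations, not a library function. The output is therefore illustrative only and does not correspond to the tables above.

```python
import random
from statistics import NormalDist, mean, stdev

def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y (Gauss-Jordan)."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] + [sum(r[i] * v for r, v in zip(X, y))]
         for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))   # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * w for u, w in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical simulated data (NOT the Duncan data): 45 cases, two predictors
rng = random.Random(45)
n = 45
income = [rng.uniform(5, 80) for _ in range(n)]
educ = [rng.uniform(10, 100) for _ in range(n)]
prestige = [-6 + 0.6 * x1 + 0.55 * x2 + rng.gauss(0, 13) for x1, x2 in zip(income, educ)]
X = [[1.0, x1, x2] for x1, x2 in zip(income, educ)]

beta_hat = ols(X, prestige)
R = 2000
boot = []
for _ in range(R):
    rows = [rng.randrange(n) for _ in range(n)]   # resample whole cases with replacement
    boot.append(ols([X[i] for i in rows], [prestige[i] for i in rows]))

z = NormalDist().inv_cdf(0.975)
for k, name in enumerate(["Intercept", "income", "educ"]):
    reps = sorted(b[k] for b in boot)
    se = stdev(reps)
    bc = 2 * beta_hat[k] - mean(reps)   # observed minus estimated bias
    normal = (bc - z * se, bc + z * se)
    pctl = (reps[round(0.025 * R) - 1], reps[round(0.975 * R) - 1])
    print(f"{name:9s} est={beta_hat[k]:8.4f} se={se:.4f} normal={normal} pctl={pctl}")
```

Resampling cases, rather than residuals, does not assume that the errors have constant variance. This makes it a natural default when, as in %boot, the observations themselves are resampled.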

Improved Bootstrap Confidence Interval

                          Approximate    Approximate
                             Lower          Upper                      Method for
              Observed     Confidence     Confidence    Confidence     Confidence
Name         Statistic       Limit          Limit        Level (%)      Interval

Intercept     -6.06466      -12.1006       0.04230          95        Bootstrap BC
educ           0.54583        0.2603       0.78737          95        Bootstrap BC
income         0.59873        0.2963       0.94522          95        Bootstrap BC

                          LABEL OF        Lower         Upper         Bias
             Number of     FORMER      Percentile    Percentile    Correction
Name         Resamples    VARIABLE        Point         Point         (Z0)

Intercept       2000      Intercept     0.025000       0.97500      -0.000000
educ            2000                    0.030589       0.97971       0.043880
income          2000                    0.019567       0.96835      -0.051409
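The BC interval is the BCa interval with the acceleration set to zero. It reads the ordered replicates at the probabilities Phi(2 z0 - z) and Phi(2 z0 + z), where z = Phi^-1(0.975). The short Python sketch below (helper names are illustrative) applies that formula to the z0 values reported in the table and recovers the lower and upper percentile points shown there.

```python
from statistics import NormalDist

N = NormalDist()

def z0_from_replicates(replicates, observed):
    """Bias-correction constant: normal quantile of the share of replicates below observed."""
    share = sum(r < observed for r in replicates) / len(replicates)
    return N.inv_cdf(share)

def bc_points(z0, alpha=0.05):
    """Percentile points at which the BC interval reads off the ordered replicates."""
    z = N.inv_cdf(1 - alpha / 2)
    return N.cdf(2 * z0 - z), N.cdf(2 * z0 + z)

# Bias-correction constants (Z0) reported in the BC table above
for name, z0 in [("Intercept", 0.0), ("educ", 0.043880), ("income", -0.051409)]:
    lo, hi = bc_points(z0)
    print(f"{name:9s} lower={lo:.6f} upper={hi:.5f}")
```

With z0 = 0, as for the intercept, the BC points reduce to 0.025 and 0.975, and the BC interval coincides with the percentile interval. This matches the identical intercept limits in the PCTL and BC tables.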

Improved Bootstrap Confidence Interval

                                                        Estimated                   Estimated
                                            Estimated     Lower                       Upper
            Observed  Jackknife  Estimated   Standard  Confidence  Bias-Corrected  Confidence
Name       Statistic     Mean       Bias      Error       Limit       Statistic       Limit

Intercept   -6.06466   -6.06599  -0.058182   3.11843    -12.1185      -6.00648       0.10552
educ         0.54583    0.54549  -0.014920   0.14778      0.2711       0.56075       0.85040
income       0.59873    0.59914   0.017827   0.18140      0.2254       0.58091       0.93644

                         Method for     Minimum      Maximum                  LABEL OF
           Confidence    Confidence    Resampled    Resampled    Number of     FORMER
Name        Level (%)     Interval      Estimate     Estimate    Resamples    VARIABLE

Intercept      95        Jackknife      -7.34563     -5.24007        45       Intercept
educ           95        Jackknife       0.43303      0.58637        45
income         95        Jackknife       0.54404      0.73155        45
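The jackknife table above appears because the BCa method computes leave-one-out estimates (45 of them, one per occupation) to obtain the acceleration. The following is a generic Python sketch of the standard jackknife quantities: bias = (n - 1)(mean of leave-one-out estimates - estimate), SE = sqrt((n - 1)/n × sum of squared deviations of the leave-one-out estimates), and a normal-quantile interval around the bias-corrected estimate. This layout is consistent with the numbers in the table. For example, -6.00648 - 1.96 × 3.11843 ≈ -12.1185 for the intercept. Since the Duncan data are not included, the sketch is applied to the couples data instead.

```python
from math import sqrt
from statistics import NormalDist, mean, pvariance

def jackknife(data, stat):
    """Jackknife estimate, bias, standard error and bias-corrected estimate of stat(data)."""
    n = len(data)
    theta_hat = stat(data)
    loo = [stat(data[:i] + data[i + 1:]) for i in range(n)]   # leave-one-out estimates
    loo_bar = mean(loo)
    bias = (n - 1) * (loo_bar - theta_hat)
    se = sqrt((n - 1) / n * sum((t - loo_bar) ** 2 for t in loo))
    return theta_hat, bias, se, theta_hat - bias

diff = [6, -3, 5, 3, 6, 10, 11, -8, 7, 9]
est, bias, se, corrected = jackknife(diff, mean)
z = NormalDist().inv_cdf(0.975)
print(est, bias, round(se, 6), (corrected - z * se, corrected + z * se))

# For the plug-in variance (divisor n) the jackknife correction recovers the n - 1 version
_, vbias, _, vcorr = jackknife(diff, pvariance)
print(round(vcorr, 6))
```

For the mean, the jackknife bias is zero and the jackknife SE equals the usual s / sqrt(n) = 1.880898. The variance example shows the jackknife removing a genuine bias.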

Improved Bootstrap Confidence Interval

                          Approximate    Approximate
                             Lower          Upper                      Method for
              Observed     Confidence     Confidence    Confidence     Confidence      Number of
Name         Statistic       Limit          Limit        Level (%)      Interval       Resamples

Intercept     -6.06466      -11.7872       0.45600          95        Bootstrap BCa       2000
educ           0.54583        0.3030       0.82933          95        Bootstrap BCa       2000
income         0.59873        0.2519       0.90165          95        Bootstrap BCa       2000

             LABEL OF        Lower         Upper         Bias
              FORMER      Percentile    Percentile    Correction
Name         VARIABLE        Point         Point         (Z0)       Acceleration

Intercept    Intercept     0.028650       0.97845      -0.000000       0.015823
educ                       0.052622       0.99235       0.043880       0.079128
income                     0.007763       0.94720      -0.051409      -0.074962