How can I generate bootstrap statistics in R?

The R package boot makes it easy to generate bootstrap replicates of virtually any statistic that you can calculate in R. From these replicates, you can estimate bias, construct bootstrap confidence intervals, or plot the bootstrap distribution. We demonstrate a few of these techniques on this page; more details are available on the package's CRAN page. Before using commands in the boot package, you must install the package (if it is not already installed) and load it into your R session. All of the examples on this page use the hsb2 dataset.

install.packages("boot", dependencies = TRUE)
library(boot)
hsb2 <- read.table("https://stats.idre.ucla.edu/stat/data/hsb2.csv", sep = ",", header = TRUE)

Using the boot command

The boot command resamples your dataset and calculates your statistic(s) of interest on each resample. Before calling boot, you need to define a function that returns the statistic(s) you would like to bootstrap. The first argument passed to the function is your dataset. The second argument is, by default, a vector of indices identifying the observations that make up the resample; alternatively (controlled by the stype argument of boot), it can be a vector of frequencies or weights describing how often each observation appears in the resample. The example below uses the default index vector and resamples from all of our observations. The statistic of interest here is the correlation coefficient of write and math.

fc <- function(d, i){
	d2 <- d[i,]
	return(cor(d2$write, d2$math))
}

With the function fc defined, we can use the boot command, providing our dataset name, our function, and the number of bootstrap samples to be drawn.

# omit set.seed() if you want the results to vary from run to run
set.seed(626)
bootcorr <- boot(hsb2, fc, R=500)
bootcorr

ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = hsb2, statistic = fc, R = 500)


Bootstrap Statistics :
     original       bias    std. error
t1* 0.6174493 -0.001528707  0.04020362
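The frequency or weight form of the second argument can be sketched as follows. This is a minimal illustration rather than part of the analysis above: it assumes a simulated stand-in data frame (sim) with columns named write and math in place of hsb2, and the function name fw is hypothetical. With stype = "w", boot passes a vector of resampling weights instead of row indices, and the corr function from the boot package computes a weighted correlation.

```r
library(boot)

# Simulated stand-in for hsb2 (an assumption, so the sketch runs offline)
set.seed(1)
n <- 200
sim <- data.frame(write = rnorm(n, mean = 52, sd = 9))
sim$math <- 0.6 * sim$write + rnorm(n, mean = 20, sd = 7)

# Hypothetical statistic: the second argument w is a vector of weights,
# one per observation, proportional to how often each observation
# appears in the resample (not a vector of row indices)
fw <- function(d, w) {
  corr(as.matrix(d[, c("write", "math")]), w)
}

# stype = "w" tells boot to pass weights to the statistic
bootw <- boot(sim, fw, R = 500, stype = "w")
bootw
```

With equal weights the weighted correlation reduces to the ordinary one, so the original statistic stored in bootw$t0 should agree with cor(sim$write, sim$math).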

While the printed output for bootcorr is brief, R saves additional information that can be listed:

summary(bootcorr)

          Length Class      Mode     
t0          1    -none-     numeric  
t         500    -none-     numeric  
R           1    -none-     numeric  
data       11    data.frame list     
seed      626    -none-     numeric  
statistic   1    -none-     function 
sim         1    -none-     character
call        4    -none-     call     
stype       1    -none-     character
strata    200    -none-     numeric  
weights   200    -none-     numeric

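The seed component stores the state of R's random number generator (.Random.seed) at the time boot was called; its length of 626 is a property of the default generator, not the value we passed to set.seed. Restoring this state, or simply calling set.seed with the same value before boot, would allow us to replicate this analysis if needed. From the t vector and t0, we can calculate the bias and standard error: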

mean(bootcorr$t) - bootcorr$t0

[1] -0.001528707

sd(bootcorr$t)

[1] 0.04020362
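Reproducibility from a fixed seed can be checked directly. The sketch below is illustrative only: it assumes a simulated stand-in data frame (sim) with write and math columns in place of hsb2, and the object names run1 and run2 are hypothetical.

```r
library(boot)

# Simulated stand-in for hsb2 (an assumption, so the sketch runs offline)
set.seed(1)
sim <- data.frame(write = rnorm(200, mean = 52, sd = 9))
sim$math <- 0.6 * sim$write + rnorm(200, mean = 20, sd = 7)

# Same form of statistic as fc above: correlation within the resample
fc <- function(d, i) {
  d2 <- d[i, ]
  cor(d2$write, d2$math)
}

# Two runs started from the same seed draw the same resamples
set.seed(626)
run1 <- boot(sim, fc, R = 500)
set.seed(626)
run2 <- boot(sim, fc, R = 500)

identical(run1$t, run2$t)        # compares the bootstrap replicates
identical(run1$seed, run2$seed)  # compares the stored generator states
```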

Other commands in the boot package often take an object of class “boot” as input:

class(bootcorr)
[1] "boot"

Bootstrap confidence intervals and plots

To look at a histogram and normal quantile-quantile plot of your bootstrap estimates, you can use plot with the “boot” object you created. The histogram includes a dotted vertical line indicating the location of the original statistic.

plot(bootcorr)

Using the boot.ci command, you can generate several types of confidence intervals from your bootstrap samples.


boot.ci(boot.out = bootcorr, type = c("norm", "basic", "perc", "bca"))

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 500 bootstrap replicates

CALL : 
boot.ci(boot.out = bootcorr, type = c("norm", "basic", "perc", 
    "bca"))

Intervals : 
Level      Normal              Basic         
95%   ( 0.5402,  0.6978 )   ( 0.5406,  0.7063 )  

Level     Percentile            BCa          
95%   ( 0.5286,  0.6943 )   ( 0.5291,  0.6946 )  
Calculations and Intervals on Original Scale
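The normal and basic intervals in this output can be reproduced by hand. The normal interval is the bias-corrected estimate plus or minus a normal quantile times the bootstrap standard error: 0.6174493 + 0.001528707 plus or minus 1.96 x 0.04020362, or roughly (0.5402, 0.6978). The basic interval reflects the percentile limits around the original estimate: 2 x 0.6174493 - 0.6943 = 0.5406 and 2 x 0.6174493 - 0.5286 = 0.7063. The sketch below performs the same comparison in code; it assumes a simulated stand-in data frame (sim) in place of hsb2, so its numbers will differ from those above, and the names normal_by_hand and basic_by_hand are hypothetical.

```r
library(boot)

# Simulated stand-in for hsb2 (an assumption, so the sketch runs offline)
set.seed(1)
sim <- data.frame(write = rnorm(200, mean = 52, sd = 9))
sim$math <- 0.6 * sim$write + rnorm(200, mean = 20, sd = 7)

fc <- function(d, i) {
  d2 <- d[i, ]
  cor(d2$write, d2$math)
}

set.seed(626)
b  <- boot(sim, fc, R = 500)
ci <- boot.ci(b, type = c("norm", "basic", "perc"))

bias <- mean(b$t) - b$t0   # bootstrap estimate of bias
se   <- sd(b$t)            # bootstrap standard error
z    <- qnorm(0.975)       # normal quantile for a 95% interval

# Normal interval: bias-corrected estimate plus or minus z * se
normal_by_hand <- c(b$t0 - bias - z * se, b$t0 - bias + z * se)

# Basic interval: percentile limits reflected around the original estimate
perc_limits   <- ci$percent[4:5]
basic_by_hand <- c(2 * b$t0 - perc_limits[2], 2 * b$t0 - perc_limits[1])

# Compare the hand calculations with the limits reported by boot.ci
rbind(by_hand = normal_by_hand, boot_ci = ci$normal[2:3])
rbind(by_hand = basic_by_hand,  boot_ci = ci$basic[4:5])
```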

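Four 95% confidence intervals are presented: normal, basic, percentile, and bias-corrected and accelerated (BCa). A fifth type, the studentized interval, requires an estimate of the statistic's variance from each bootstrap sample, so the function passed to boot must return that variance along with the statistic.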
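A studentized interval could be obtained along the following lines. This is a sketch under stated assumptions, not the method used elsewhere on this page: the data frame sim is a simulated stand-in for hsb2, the function name fc_var is hypothetical, and the variance of the correlation is approximated by (1 - r^2)^2 / (n - 1), a large-sample formula that assumes roughly bivariate normal data. The statistic returns the correlation first and its variance second, and index = c(1, 2) tells boot.ci which element is which.

```r
library(boot)

# Simulated stand-in for hsb2 (an assumption, so the sketch runs offline)
set.seed(1)
sim <- data.frame(write = rnorm(200, mean = 52, sd = 9))
sim$math <- 0.6 * sim$write + rnorm(200, mean = 20, sd = 7)

# Hypothetical statistic returning the correlation and a variance estimate.
# The variance formula is a large-sample approximation that assumes
# roughly bivariate normal data (an assumption of this sketch).
fc_var <- function(d, i) {
  d2 <- d[i, ]
  r  <- cor(d2$write, d2$math)
  c(r, (1 - r^2)^2 / (nrow(d2) - 1))
}

set.seed(626)
bootstud <- boot(sim, fc_var, R = 500)

# index = c(1, 2): element 1 is the statistic, element 2 is its variance
cistud <- boot.ci(bootstud, type = "stud", index = c(1, 2))
cistud
```

If the normality assumption is doubtful, the variance for each bootstrap sample could instead be estimated with a second, nested bootstrap inside the statistic function, at a much higher computational cost.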