Bootstrapping may not work very well with small sample sizes. To illustrate this, we take a data file, /stata/code/sim/welfsub.dta, and treat it as our population. We then run 1000 simulations; in each one we draw a random sample from that population and use bootstrapping to get a confidence interval for the mean. Finally, we assess how many of these intervals actually contain the population mean (which is approximately 112). This program, /stata/code/sim/bssim.ado.txt, performs the sampling and bootstrapping for a single iteration and is written to work with the simul command.
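program define bssim
  version 6.0
  * simul first calls the program with "?" to ask for the names of the results
  if "`1'" == "?" {
    global S_1 "lb_stan ub_stan lb_n ub_n lb_p ub_p lb_bc ub_bc"
    exit
  }
  * draw a random sample of size `2' (passed via args()) from the population
  use welfsub, clear
  gen x = uniform()
  sort x
  keep if _n <= `2'
  * standard confidence interval for the mean
  ci agrc
  local ub_stan = `r(ub)'
  local lb_stan = `r(lb)'
  * bootstrap confidence intervals for the mean
  bs "summarize agrc" "r(mean)", reps(1000)
  post `1' `lb_stan' `ub_stan' r(lb_n) r(ub_n) r(lb_p) r(ub_p) r(lb_bc) r(ub_bc)
end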
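For readers on a newer Stata, the same single iteration can be sketched with the bootstrap prefix and an r-class program. This is only a rough counterpart, not the original program: the name bssim2 is hypothetical, Stata 11 or later is assumed, and the sketch assumes that bootstrap leaves its intervals in the matrices e(ci_normal), e(ci_percentile), and e(ci_bc) with the lower limits in row 1 and the upper limits in row 2.

```stata
* Hypothetical modern counterpart of bssim (assumes Stata 11 or later).
* bssim2 is an illustrative name; it is not one of the original files.
program define bssim2, rclass
    version 11
    args n                              // sample size for this iteration
    tempname cin cip cibc

    use welfsub, clear
    sample `n', count                   // keep n randomly chosen observations

    * standard confidence interval for the mean
    ci agrc
    return scalar lb_stan = r(lb)
    return scalar ub_stan = r(ub)

    * bootstrap the mean; the intervals are read from e() afterwards
    quietly bootstrap r(mean), reps(1000) nodots: summarize agrc
    matrix `cin'  = e(ci_normal)        // normal-approximation interval
    matrix `cip'  = e(ci_percentile)    // percentile interval
    matrix `cibc' = e(ci_bc)            // bias-corrected interval

    return scalar lb_n  = `cin'[1,1]
    return scalar ub_n  = `cin'[2,1]
    return scalar lb_p  = `cip'[1,1]
    return scalar ub_p  = `cip'[2,1]
    return scalar lb_bc = `cibc'[1,1]
    return scalar ub_bc = `cibc'[2,1]
end
```

Under the same assumptions, it could be driven from a do-file with the simulate prefix, for example for samples of size 10 (the seed value is arbitrary):

```stata
set seed 12345
simulate lb_stan=r(lb_stan) ub_stan=r(ub_stan)   ///
         lb_n=r(lb_n)       ub_n=r(ub_n)         ///
         lb_p=r(lb_p)       ub_p=r(ub_p)         ///
         lb_bc=r(lb_bc)     ub_bc=r(ub_bc),      ///
         reps(1000) saving(sim10, replace): bssim2 10
```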
Then, this program, https://stats.idre.ucla.edu/wp-content/uploads/2016/02/bsvaryn.do_.txt, performs the simulations (1000 replications each) with samples of size 10, 20, 30, 40, 50, and 100.
simul bssim, reps(1000) args(10) saving(sim10) replace dots
simul bssim, reps(1000) args(20) saving(sim20) replace dots
simul bssim, reps(1000) args(30) saving(sim30) replace dots
simul bssim, reps(1000) args(40) saving(sim40) replace dots
simul bssim, reps(1000) args(50) saving(sim50) replace dots
simul bssim, reps(1000) args(100) saving(sim100) replace dots
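This program, https://stats.idre.ucla.edu/wp-content/uploads/2016/02/count.do_.txt, counts how many of the intervals contain the mean, using 4 different types of confidence intervals: the standard interval from ci (lb_stan, ub_stan) and the normal (lb_n, ub_n), percentile (lb_p, ub_p), and bias-corrected (lb_bc, ub_bc) intervals from bs (see bs for information on the types of confidence intervals formed by bs).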
use sim10
count if lb_stan < 112 & ub_stan > 112
count if lb_n < 112 & ub_n > 112
count if lb_p < 112 & ub_p > 112
count if lb_bc < 112 & ub_bc > 112

use sim20
count if lb_stan < 112 & ub_stan > 112
count if lb_n < 112 & ub_n > 112
count if lb_p < 112 & ub_p > 112
count if lb_bc < 112 & ub_bc > 112

use sim30
count if lb_stan < 112 & ub_stan > 112
count if lb_n < 112 & ub_n > 112
count if lb_p < 112 & ub_p > 112
count if lb_bc < 112 & ub_bc > 112

use sim40
count if lb_stan < 112 & ub_stan > 112
count if lb_n < 112 & ub_n > 112
count if lb_p < 112 & ub_p > 112
count if lb_bc < 112 & ub_bc > 112

use sim50
count if lb_stan < 112 & ub_stan > 112
count if lb_n < 112 & ub_n > 112
count if lb_p < 112 & ub_p > 112
count if lb_bc < 112 & ub_bc > 112

use sim100
count if lb_stan < 112 & ub_stan > 112
count if lb_n < 112 & ub_n > 112
count if lb_p < 112 & ub_p > 112
count if lb_bc < 112 & ub_bc > 112
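The same counts can be produced more compactly with a loop. The following is a minimal sketch, assuming Stata 7 or later (foreach is not available in version 6) and assuming the files sim10 through sim100 were saved by the simulations above; it is not part of the original count program.

```stata
* Loop over the saved simulation files and the four interval types
foreach n of numlist 10 20 30 40 50 100 {
    use sim`n', clear
    display "n = `n'"
    foreach t in stan n p bc {
        * number of intervals whose limits bracket the population mean
        quietly count if lb_`t' < 112 & ub_`t' > 112
        display "  `t': " r(N) " of " _N " intervals contain 112"
    }
}
```

Because each file holds one observation per replication, dividing r(N) by _N gives the estimated coverage directly.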
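Here are the results we got when we ran this. As you can see, the intervals from the small samples (e.g., n=10) captured the mean only about 72% to 78% of the time, when they should have done so about 95% of the time. Coverage improves as the sample size grows, but even with n=100 it reaches only about 91% to 92%.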
use sim10
count if lb_stan < 112 & ub_stan > 112
  739
count if lb_n < 112 & ub_n > 112
  721
count if lb_p < 112 & ub_p > 112
  723
count if lb_bc < 112 & ub_bc > 112
  778

use sim20
count if lb_stan < 112 & ub_stan > 112
  841
count if lb_n < 112 & ub_n > 112
  820
count if lb_p < 112 & ub_p > 112
  836
count if lb_bc < 112 & ub_bc > 112
  882

use sim30
count if lb_stan < 112 & ub_stan > 112
  874
count if lb_n < 112 & ub_n > 112
  864
count if lb_p < 112 & ub_p > 112
  878
count if lb_bc < 112 & ub_bc > 112
  909

use sim40
count if lb_stan < 112 & ub_stan > 112
  884
count if lb_n < 112 & ub_n > 112
  878
count if lb_p < 112 & ub_p > 112
  887
count if lb_bc < 112 & ub_bc > 112
  903

use sim50
count if lb_stan < 112 & ub_stan > 112
  911
count if lb_n < 112 & ub_n > 112
  905
count if lb_p < 112 & ub_p > 112
  913
count if lb_bc < 112 & ub_bc > 112
  922

use sim100
count if lb_stan < 112 & ub_stan > 112
  909
count if lb_n < 112 & ub_n > 112
  907
count if lb_p < 112 & ub_p > 112
  911
count if lb_bc < 112 & ub_bc > 112
  921
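To judge whether a shortfall such as 921 out of 1000 could be simulation noise, it helps to look at the Monte Carlo standard error of a coverage proportion, sqrt(p(1-p)/reps). The sketch below is an illustration added here, not part of the original programs; it treats the 1000 replications as independent binomial trials, and the local macro names are arbitrary.

```stata
* Monte Carlo standard error of an observed coverage proportion,
* using the n=10 standard interval (739 of 1000) as the example
local hits = 739
local reps = 1000
local p    = `hits'/`reps'
local se   = sqrt(`p'*(1 - `p')/`reps')
display "coverage = " %5.3f `p' "   MC s.e. = " %5.3f `se'

* range of counts expected if the true coverage were the nominal 95%
local se95 = sqrt(0.95*0.05/`reps')
display "expected count: 950 +/- " %4.1f 2*`se95'*`reps'
```

With 1000 replications, a true coverage of 95% would put the count within roughly 14 of 950, so every count above, including those for n=100, falls short by more than simulation error alone would explain.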