- A binomial example
- Computing the bootstrap distribution for one sample
- Comparing to a hypothetical parameter
- Computing the bootstrap distribution for two samples
- Without the computer

- First plug-in encounter

|         | Heart attacks | No heart attacks | Subjects |
|---------|---------------|------------------|----------|
| Aspirin | 104           | 10933            | 11037    |
| Placebo | 189           | 10845            | 11034    |

The question: is the true ratio of heart-attack rates, the parameter of interest, smaller than 1?

Equivalently: is the difference between the rates in the two populations different from zero?
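Before any simulation, the plug-in estimates can be read directly off the table. A quick sketch in Python (a translation for readers without MATLAB):

```python
# Plug-in estimates from the 2x2 table above.
aspirin_attacks, aspirin_n = 104, 11037
placebo_attacks, placebo_n = 189, 11034

rate_aspirin = aspirin_attacks / aspirin_n  # observed attack rate, aspirin group
rate_placebo = placebo_attacks / placebo_n  # observed attack rate, placebo group

ratio = rate_aspirin / rate_placebo  # plug-in estimate of the rate ratio
diff = rate_placebo - rate_aspirin   # plug-in estimate of the rate difference

print(ratio)  # about 0.55: the aspirin rate is roughly half the placebo rate
print(diff)
```

The point estimate is well below 1, but the simulation below is what tells us how much that estimate varies.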

We could run a simulation experiment to find out.

>> [zeros(1,10) ones(1,2)]
ans =
     0 0 0 0 0 0 0 0 0 0 1 1
>> [zeros(1,10) ones(1,2)]'   % same values, printed as a column vector
ans =
     0 0 0 0 0 0 0 0 0 0 1 1
>> [zeros(1,10) ; ones(1,2)]
??? All rows in the bracketed expression must have the same number of columns.
>> sample1=[zeros(1,109) ones(1,1)]';
>> sample2=[zeros(1,108) ones(1,2)]';
>> orig=sample1;
>> [n,p]=size(orig)
n = 110
p = 1
>> thetab=zeros(1,1000);

File bsample.m:

function out=bsample(orig)
%Function to create one resample from
%the original sample orig, where
%orig is the original data, and is a
%matrix with nrow observations and ncol variables
[n,p]=size(orig);
indices=randint(1,n,n)+1;
out=orig(indices,:);
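For readers following along in Python, a minimal NumPy equivalent of `bsample` (resampling the rows of the data with replacement) might look like this; note that NumPy indices start at 0, so the `+1` in the MATLAB version is not needed:

```python
import numpy as np

def bsample(orig, rng=None):
    """Return one bootstrap resample of the rows of orig.

    orig is an (n, p) array: n observations, p variables.
    Rows are drawn with replacement, mirroring bsample.m above.
    """
    if rng is None:
        rng = np.random.default_rng()
    orig = np.asarray(orig)
    n = orig.shape[0]
    indices = rng.integers(0, n, size=n)  # n draws from 0..n-1, with replacement
    return orig[indices, :]
```

The resample always has the same shape as the original data; only which rows appear (and how often) changes.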

>> for (b=(1:1000))
     res.bsample1=bsample(sample1);
     thetab(b)=sum(res.bsample1==1);
   end
>> hist(thetab)

This is what the histogram looks like:
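The same experiment can be sketched in Python, resampling the small toy sample (1 one among 110 observations) 1000 times and recording the count of ones each time:

```python
import numpy as np

rng = np.random.default_rng(1)
sample1 = np.concatenate([np.zeros(109), np.ones(1)])  # toy sample: 1 success in 110

B = 1000
thetab = np.empty(B)
for b in range(B):
    resample = rng.choice(sample1, size=sample1.size, replace=True)
    thetab[b] = resample.sum()  # bootstrap count of successes

# thetab now holds the bootstrap distribution of the success count;
# MATLAB's hist(thetab) corresponds to matplotlib's plt.hist(thetab).
```

Most resamples contain 0, 1, or 2 ones, with the distribution centered near the observed count of 1.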

Here is the same computation for the complete data set:

>> sample1=[zeros(1,10933),ones(1,104)]';
>> sample2=[zeros(1,10845),ones(1,189)]';
>> thetab=zeros(1,1000);
>> for (b=(1:1000))
     res.bsample1=bsample(sample1);
     thetab(b)=sum(res.bsample1==1);
   end

Comparing to a hypothetical parameter value of 2 (note: the summary statistics below come from the small toy sample1, whose observed count is 1):

>> for (b=(1:1000))
     res.bsample1=bsample(sample1);
     thetab(b)=sum(res.bsample1==1)-2;
   end
>> hist(thetab)
>> mean(thetab)
ans = -0.9530
>> var(thetab)
ans = 1.0398
>> sum(thetab>0)
ans = 96

This is what the histogram looks like:
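In Python, the same comparison to the hypothetical value 2 looks as follows; the summary statistics should land near the MATLAB values above (mean ≈ -0.95, variance ≈ 1.04, and only a small fraction of resamples above 0):

```python
import numpy as np

rng = np.random.default_rng(2)
sample1 = np.concatenate([np.zeros(109), np.ones(1)])  # toy sample: 1 success in 110

B = 1000
thetab = np.empty(B)
for b in range(B):
    resample = rng.choice(sample1, size=sample1.size, replace=True)
    thetab[b] = resample.sum() - 2  # center at the hypothetical value 2

print(thetab.mean())       # near -1: the observed count sits well below 2
print(thetab.var(ddof=1))
print((thetab > 0).sum())  # how many bootstrap counts exceed 2
```

Since the bulk of the bootstrap distribution falls below 0, the data look hard to reconcile with a true count of 2.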

>> thetab=zeros(1,1000);
>> for (b=(1:1000))
     res.bsample1=bsample(sample1);
     res.bsample2=bsample(sample2);
     thetab(b)=sum(res.bsample2==1)-sum(res.bsample1==1);
   end
>> hist(thetab)

This is what the histogram looks like:
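A Python sketch of the two-sample bootstrap on the full data; the difference in resampled counts should concentrate around 189 - 104 = 85:

```python
import numpy as np

rng = np.random.default_rng(3)
sample1 = np.concatenate([np.zeros(10933), np.ones(104)])  # aspirin group
sample2 = np.concatenate([np.zeros(10845), np.ones(189)])  # placebo group

B = 1000
thetab = np.empty(B)
for b in range(B):
    r1 = rng.choice(sample1, size=sample1.size, replace=True)
    r2 = rng.choice(sample2, size=sample2.size, replace=True)
    thetab[b] = r2.sum() - r1.sum()  # placebo count minus aspirin count

print(thetab.mean())        # near 85
print((thetab <= 0).sum())  # essentially never zero or negative
```

Practically none of the bootstrap differences fall at or below 0, which is strong evidence that the placebo rate genuinely exceeds the aspirin rate.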

In the most general case, the second sample would be considered as coming from another Binomial distribution.

Theoretically, what can we say about the distribution of the number of heart attacks in a group?

How good would the Normal approximation be?
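A brief note on what theory gives here. Writing $X$ for the number of heart attacks among $n$ subjects, each with attack probability $p$,

$$X \sim \mathrm{Binomial}(n, p), \qquad E[X] = np, \qquad \mathrm{Var}(X) = np(1-p),$$

so by the Central Limit Theorem $X$ is approximately $N\big(np,\; np(1-p)\big)$ when $np(1-p)$ is large. Plugging in $n = 11037$ and $\hat{p} = 104/11037$ gives mean $104$ and standard deviation $s = \sqrt{104\,(1 - 104/11037)} \approx 10.15$, which is exactly the Normal curve used in the comparison that follows.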

Here is an answer. This is NOT a simulation experiment, but a comparison of the exact probability mass function of the Binomial with the relevant Normal approximation.

>> x=0:180;
>> y=binopdf(x,11037,104/11037);
>> plot(x,y,'+');
>> s=sqrt(104*((11037-104)/11037))
s = 10.1499
>> hold on;
>> z=normpdf(x,104,s);
>> plot(x,z,'g-')
>> text(20,.03,'+ Binomial(11037,104/11037)','FontSize',13)
>> text(20,.025,'-- Normal(104,s)','FontSize',13)
>> title('Aspirin Group','FontSize',13)
>> hold off;

This is what the comparison looks like:
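The same comparison can be made numerically in Python, using `scipy.stats` in place of MATLAB's `binopdf`/`normpdf` (a sketch, assuming SciPy is available):

```python
import numpy as np
from scipy.stats import binom, norm

n, p = 11037, 104 / 11037
x = np.arange(181)

y = binom.pmf(x, n, p)               # exact Binomial pmf
s = np.sqrt(n * p * (1 - p))         # = sqrt(104*(11037-104)/11037), about 10.15
z = norm.pdf(x, loc=n * p, scale=s)  # Normal approximation, mean 104, sd s

# The two curves agree closely across the whole range:
print(np.abs(y - z).max())
```

The maximum pointwise gap between the two curves is tiny compared to the peak height of about 0.04, so the Normal approximation is excellent here.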


This is the very first known occurrence of what
Brad Efron coined the *plug-in principle*,
an essential component of the bootstrap idea.

It is interesting to look at Efron's early paper on the subject, the first bootstrap paper (Efron, 1979, "Bootstrap Methods: Another Look at the Jackknife", *The Annals of Statistics*).