Next: Some notation Up: Lectures Previous: The questions addressed

The bootstrap: Some Examples

A binomial example

Suppose we don't know any probability or statistics, and we are told that
             Heart attacks   No heart attacks   Subjects
  Aspirin         104             10933           11037
  Placebo         189             10845           11034

The question: Is the true ratio of heart-attack rates, the parameter of interest, smaller than 1?

Or is the difference between the rates in the two populations strictly different from zero?
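Before any resampling, the observed rates themselves frame the question. A quick computation (written in Python purely for illustration, alongside the MATLAB sessions in these notes):

```python
# Observed heart-attack rates from the table above
aspirin_attacks, aspirin_n = 104, 11037
placebo_attacks, placebo_n = 189, 11034

rate_aspirin = aspirin_attacks / aspirin_n   # ~0.0094
rate_placebo = placebo_attacks / placebo_n   # ~0.0171
ratio = rate_aspirin / rate_placebo          # ~0.55, well below 1
print(rate_aspirin, rate_placebo, ratio)
```

The observed ratio is about 0.55; the simulation experiments below ask whether a value this far from 1 could plausibly arise by chance.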

We could run a simulation experiment to find out.

>> [zeros(1,10) ones(1,2)]
ans =
0     0     0     0     0     0     0     0     0     0     1     1
>> [zeros(1,10) ones(1,2)]'
ans =
0
0
0
0
0
0
0
0
0
0
1
1
>> [zeros(1,10) ; ones(1,2)]
???  All rows in the bracketed expression must have the same
number of columns.
>> sample1=[zeros(1,109) ones(1,1)]';
>> sample2=[zeros(1,108) ones(1,2)]';
>> orig=sample1
>>[n,p]=size(orig)
n =     110
p =    1
>> thetab=zeros(1,1000);


File bsample.m:

function out=bsample(orig)
%BSAMPLE Create one bootstrap resample from the original sample.
%   orig is the original data, a matrix with n observations (rows)
%   and p variables (columns).

[n,p]=size(orig);
% randint(1,n,n) (Communications Toolbox) returns a 1-by-n vector of
% integers uniform on [0,n-1]; adding 1 gives valid row indices 1..n.
% In current MATLAB, randi(n,1,n) does the same directly.
indices=randint(1,n,n)+1;
out=orig(indices,:);
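For comparison, here is a NumPy sketch of the same resampling function (an illustrative translation, not part of the original MATLAB notes; note that NumPy indices are 0-based):

```python
import numpy as np

def bsample(orig, rng=None):
    """Return one bootstrap resample: n rows drawn uniformly with
    replacement from the n rows of orig (an n-by-p array)."""
    rng = rng or np.random.default_rng()
    n = orig.shape[0]
    indices = rng.integers(0, n, size=n)  # 0-based, so no "+1" needed
    return orig[indices, :]

# Small-scale sample: 1 heart attack among 110 subjects
sample1 = np.concatenate([np.zeros(109), np.ones(1)]).reshape(-1, 1)
res = bsample(sample1)
print(res.shape)  # same shape as the original sample
```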


Computing the bootstrap distribution for one sample

>> for (b =(1:1000))
res.bsample1=bsample(sample1);
thetab(b)=sum(res.bsample1==1);
end
>>hist(thetab)

This is what the histogram looks like:
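The same loop can be sketched in NumPy (illustrative only; a fixed seed makes the run reproducible):

```python
import numpy as np

rng = np.random.default_rng(0)
# Small-scale sample: 1 heart attack among 110 subjects
sample1 = np.concatenate([np.zeros(109), np.ones(1)])

B = 1000
thetab = np.zeros(B)
for b in range(B):
    # One bootstrap resample: draw 110 observations with replacement
    res = sample1[rng.integers(0, sample1.size, size=sample1.size)]
    thetab[b] = np.sum(res == 1)   # bootstrap count of heart attacks

# The bootstrap counts concentrate around the observed count of 1
print(thetab.mean(), thetab.var())
```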
Here is the complete data set computation
>> sample1=[zeros(1,10933),ones(1,104)]';
>> sample2=[zeros(1,10845),ones(1,189)]';
>> thetab=zeros(1,1000);
>> for (b =(1:1000))
res.bsample1=bsample(sample1);
thetab(b)=sum(res.bsample1==1);
end


Comparing to a hypothetical parameter

Suppose that we are trying to test the hypothesis that the number of heart attacks in a sample of this size should be 2; we subtract this hypothesized value from each bootstrap count:
>> for (b =(1:1000))
res.bsample1=bsample(sample1);
thetab(b)=sum(res.bsample1==1)-2;
end
>>hist(thetab)
>> mean(thetab)
ans =
-0.9530
>> var(thetab)
ans =
1.0398
>> sum(thetab>0)
ans =
96

This is what the histogram looks like:
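In NumPy form (again an illustrative sketch), the count of 96 out of 1000 above corresponds to the Monte Carlo tail proportion: the fraction of bootstrap statistics exceeding zero after the hypothesized value 2 is subtracted:

```python
import numpy as np

rng = np.random.default_rng(1)
sample1 = np.concatenate([np.zeros(109), np.ones(1)])
hypothesized = 2   # hypothesized heart-attack count in a sample of 110

B = 1000
thetab = np.zeros(B)
for b in range(B):
    res = sample1[rng.integers(0, sample1.size, size=sample1.size)]
    thetab[b] = np.sum(res == 1) - hypothesized

# Monte Carlo tail proportion, analogous to sum(thetab>0)/1000 above
tail = np.mean(thetab > 0)
print(tail)
```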

Computing the bootstrap distribution for two samples

>>thetab=zeros(1,1000);
>> for (b =(1:1000))
res.bsample1=bsample(sample1);
res.bsample2=bsample(sample2);
thetab(b)=sum(res.bsample2==1)-sum(res.bsample1==1);
end
>>hist(thetab)

This is what the histogram looks like:
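The two-sample version translates the same way (NumPy sketch with the full data and a fixed seed, for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
# Full study data: 0 = no heart attack, 1 = heart attack
sample1 = np.concatenate([np.zeros(10933), np.ones(104)])   # aspirin
sample2 = np.concatenate([np.zeros(10845), np.ones(189)])   # placebo

B = 1000
thetab = np.zeros(B)
for b in range(B):
    res1 = sample1[rng.integers(0, sample1.size, size=sample1.size)]
    res2 = sample2[rng.integers(0, sample2.size, size=sample2.size)]
    thetab[b] = np.sum(res2 == 1) - np.sum(res1 == 1)  # placebo minus aspirin

# The whole bootstrap distribution of the difference sits far above zero,
# supporting a real difference between the two heart-attack rates.
print(thetab.mean(), np.mean(thetab > 0))
```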

Without the computer

Sample one could be considered as the realization of a Binomial random variable $X_1 \sim \mathrm{Bin}(11037,\,p_1)$, from some unknown Binomial distribution, for which the best estimate given by maximum likelihood would be $\hat{p}_1 = 104/11037$.

The second sample would be considered, in the most general case, as coming from another Binomial, $X_2 \sim \mathrm{Bin}(11034,\,p_2)$, with maximum-likelihood estimate $\hat{p}_2 = 189/11034$.

Theoretically, what can we say about the distribution of $X_1$?

How good would the Normal approximation be?

Here is an answer. This is NOT a simulation experiment but a comparison of the exact probability mass function of the Binomial with the relevant Normal approximation.

>> x=0:180;
>> y=binopdf(x,11037,104/11037);
>> plot(x,y,'+');
s=sqrt(104*((11037-104)/11037))
s =
10.1499
hold on;
z=normpdf(x,104,s);
plot(x,z,'g-')
text(20,.03,'+ Binomial(11037,104/11037)','FontSize',13)
text(20,.025,'-- Normal(104,s)','FontSize',13)
title('Aspirin Group','FontSize',13)
hold off;


This is what the comparison looks like:
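A Python version of the same comparison (illustrative sketch; the binomial pmf is computed through log-gamma to avoid overflow at $n = 11037$, and no plotting is done):

```python
import numpy as np
from math import lgamma, log, exp, sqrt, pi

n, p = 11037, 104 / 11037
x = np.arange(0, 181)

def binom_pmf(k):
    # Exact Binomial(n, p) mass function via log-gamma
    return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))

y = np.array([binom_pmf(k) for k in x])
s = sqrt(n * p * (1 - p))          # ~10.1499, as in the MATLAB session
z = np.exp(-(x - 104) ** 2 / (2 * s ** 2)) / (s * sqrt(2 * pi))

# The Normal(104, s) density tracks the exact pmf closely over 0..180
print(s, np.max(np.abs(y - z)))
```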

First plug-in encounter

It is known that for $X \sim \mathrm{Bin}(n,p)$, the sampling distribution of $\hat{p} = X/n$ will be approximately $\mathcal{N}(p,\,p(1-p)/n)$. The standard error of $\hat{p}$, $\mathrm{se}(\hat{p}) = \sqrt{p(1-p)/n}$, therefore depends on the unknown parameter $p$; in order to estimate it, it is a well-known procedure to replace $p$ by its estimate $\hat{p}$, obtaining $\widehat{\mathrm{se}} = \sqrt{\hat{p}(1-\hat{p})/n}$.
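As a quick numeric check (a Python sketch, not part of the original notes), here is the plug-in standard error for the aspirin group:

```python
import numpy as np

n = 11037
phat = 104 / n                                # MLE of p for the aspirin group
se_plugin = np.sqrt(phat * (1 - phat) / n)    # se(phat) with p replaced by phat
print(phat, se_plugin)

# Scaling by n recovers the standard deviation of the count itself,
# the quantity s ~ 10.15 computed in the MATLAB session above.
print(n * se_plugin)
```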

This is the very first known occurrence of what Brad Efron coined the plug-in principle, an essential component of the bootstrap idea.

It is interesting to look at Efron's early paper, the first bootstrap paper (Efron, 1979).

Susan Holmes 2004-05-19