The bootstrap: Some Examples

A binomial example

Suppose we don't know any probability or statistics, and we are told that:

              Heart attacks   No heart attacks   Subjects
  Aspirin          104             10933          11037
  Placebo          189             10845          11034

The question: is the true ratio $\theta$ of heart attack rates, the parameter of interest, smaller than 1?

Or, equivalently: is the difference between the rates in the two populations strictly different from zero?

We could run a simulation experiment to find out.

>> [zeros(1,10) ones(1,2)]
ans =
     0     0     0     0     0     0     0     0     0     0     1     1
>> [zeros(1,10) ones(1,2)]'
ans =
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     1
     1
>> [zeros(1,10) ; ones(1,2)]
???  All rows in the bracketed expression must have the same 
number of columns.
>> sample1=[zeros(1,109) ones(1,1)]';
>> sample2=[zeros(1,108) ones(1,2)]';
>> orig=sample1;
>> [n,p]=size(orig)
n =     110
p =    1
>> thetab=zeros(1,1000);

File bsample.m:

function out=bsample(orig)
%BSAMPLE Create one bootstrap resample from
%the original sample orig, where orig is the
%original data: a matrix with n observations
%(rows) and p variables (columns).

      [n,p]=size(orig);
      % randint(1,n,n) draws n integers in 0..n-1
      % (Communications Toolbox; in newer MATLAB use randi(n,1,n))
      indices=randint(1,n,n)+1;
      out=orig(indices,:);

Computing the bootstrap distribution for one sample

>> for (b =(1:1000))
      res.bsample1=bsample(sample1);
      thetab(b)=sum(res.bsample1==1);  
 end
>>hist(thetab)
This is what the histogram looks like:
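The same resampling loop can be sketched outside MATLAB. Here is a rough Python equivalent (a sketch, not part of the original notes), with draws-with-replacement standing in for bsample:

```python
import random

random.seed(0)  # fix the seed so the experiment is reproducible

# Small-scale sample: 109 zeros and a single one, as above
sample1 = [0] * 109 + [1]

def bsample(orig):
    """One bootstrap resample: len(orig) draws with replacement."""
    return random.choices(orig, k=len(orig))

# Bootstrap distribution of the count of ones in each resample
thetab = [sum(bsample(sample1)) for _ in range(1000)]
```

A histogram of thetab then shows the bootstrap distribution of the count; with only one success in 110 observations it is concentrated on small integers.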
Here is the complete data set computation
>> sample1=[zeros(1,10933),ones(1,104)]';
>> sample2=[zeros(1,10845),ones(1,189)]';
>> thetab=zeros(1,1000);
>> for (b =(1:1000))
      res.bsample1=bsample(sample1);
      thetab(b)=sum(res.bsample1==1);  
 end

Comparing to a hypothetical parameter

Suppose that we are trying to test $\theta=\frac{2}{110}$ (here sample1 is again the small-scale sample with a single one, so the hypothesized count of ones is 2):
>> for (b =(1:1000))
      res.bsample1=bsample(sample1);
      thetab(b)=sum(res.bsample1==1)-2;  
 end
>>hist(thetab)
>> mean(thetab)
ans =
   -0.9530
>> var(thetab)
ans =
    1.0398
>> sum(thetab>0)
ans =
    96
So about $96/1000 \approx 10\%$ of the bootstrap counts exceed the hypothesized count of 2. This is what the histogram looks like:
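As a cross-check, the shifted statistic can be computed the same way in Python (again a sketch with hypothetical names, not from the notes):

```python
import random

random.seed(0)

# Small-scale sample again: 109 zeros and a single one
sample1 = [0] * 109 + [1]

def bsample(orig):
    """One bootstrap resample: len(orig) draws with replacement."""
    return random.choices(orig, k=len(orig))

# Bootstrap counts, shifted by the hypothesized count of 2
thetab = [sum(bsample(sample1)) - 2 for _ in range(1000)]

# Proportion of resamples whose count exceeds the hypothesized 2
frac_above = sum(t > 0 for t in thetab) / 1000
```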

Computing the bootstrap distribution for two samples

>>thetab=zeros(1,1000);
>> for (b =(1:1000))
      res.bsample1=bsample(sample1);
      res.bsample2=bsample(sample2);
      thetab(b)=sum(res.bsample2==1)-sum(res.bsample1==1);  
 end
>>hist(thetab)
This is what the histogram looks like:
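For the full data set, the two-sample loop translates similarly (a Python sketch, assuming the same resampling scheme as bsample.m):

```python
import random

random.seed(0)

# Full data: 104 heart attacks out of 11037 (aspirin),
# 189 out of 11034 (placebo)
sample1 = [0] * 10933 + [1] * 104
sample2 = [0] * 10845 + [1] * 189

def bsample(orig):
    """One bootstrap resample: len(orig) draws with replacement."""
    return random.choices(orig, k=len(orig))

# Bootstrap distribution of the difference in heart-attack counts
thetab = [sum(bsample(sample2)) - sum(bsample(sample1))
          for _ in range(1000)]
```

The distribution is centered near 189 - 104 = 85, well away from zero, which is the informal evidence against equal rates.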

Without the computer

Sample one could be considered as the realization of a Binomial random variable from some unknown Binomial distribution, for which the best estimate, given by maximum likelihood, would be:

\begin{displaymath}B(n_1,p_1), \quad \hat{p_1}=\frac{\sum X_i}{n_1}=\frac{104}{11037}\end{displaymath}

The second sample would be considered, in the most general case, as coming from another Binomial:

\begin{displaymath}B(n_2,p_2), \quad \hat{p_2}=\frac{\sum X_i}{n_2}=\frac{189}{11034}\end{displaymath}

Theoretically, what can we say about the distribution of $\hat{p_1} - \hat{p_2}$?
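One way to fill in the answer (a sketch of the standard reasoning, not spelled out in the original notes): the two samples are independent, so the variances of the two estimated proportions add, and the normal approximation to each binomial gives

\begin{displaymath}\hat{p_1}-\hat{p_2} \approx N\left(p_1-p_2,\ \frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}\right)
\end{displaymath}

so the difference is approximately normal, centered at the true difference in rates.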

How good would the Normal approximation to $X_1$ be?

Here is an answer. This is NOT a simulation experiment but a comparison of the exact probability mass function of the binomial with the relevant Normal approximation.

>> x=0:180;
>> y=binopdf(x,11037,104/11037);
>> plot(x,y,'+');
 s=sqrt(104*((11037-104)/11037))
s =
   10.1499
 hold on;
 z=normpdf(x,104,s);
 plot(x,z,'g-')
 text(20,.03,'+ Binomial(11037,104/11037)','FontSize',13)
 text(20,.025,'-- Normal(104,s)','FontSize',13)
 title('Aspirin Group','FontSize',13)
hold off;

This is what the comparison looks like:
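The same comparison can be made numerically without plotting. Here is a Python sketch (not in the original notes), with binopdf and normpdf rewritten by hand; the binomial pmf is computed in log space to avoid overflow at $n = 11037$:

```python
import math

n = 11037
p = 104 / n
mu = n * p                      # 104
s = math.sqrt(n * p * (1 - p))  # about 10.15, as in the MATLAB session

def binopdf(x, n, p):
    """Binomial pmf, computed via log-gamma to avoid overflow."""
    logpmf = (math.lgamma(n + 1) - math.lgamma(x + 1)
              - math.lgamma(n - x + 1)
              + x * math.log(p) + (n - x) * math.log(1 - p))
    return math.exp(logpmf)

def normpdf(x, mu, s):
    """Normal density with mean mu and standard deviation s."""
    return math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2 * math.pi))

# The two curves agree closely near the mean
for x in (84, 94, 104, 114, 124):
    print(x, round(binopdf(x, n, p), 5), round(normpdf(x, mu, s), 5))
```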

First plug-in encounter

It is known that for $X \sim B(n,p)$, the sampling distribution of $\hat{p}=\bar{x}=X/n$ is approximately $N(p,\frac{pq}{n})$, with $q=1-p$. The standard error of $\hat{p}$ thus depends on the unknown parameter $p$; to estimate it, the well-known procedure is to replace $p$ by its estimate, obtaining:

\begin{displaymath}\widehat{se}(\hat{p})=\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
\end{displaymath}
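Plugging in the aspirin-group numbers gives a quick numerical check (a sketch, not in the original notes):

```python
import math

# Plug-in estimate of the standard error of p-hat, aspirin group
n = 11037
p_hat = 104 / n
se_hat = math.sqrt(p_hat * (1 - p_hat) / n)
print(se_hat)  # roughly 9.2e-04
```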

This is the very first known occurrence of what Brad Efron coined the plug-in principle, an essential component of the bootstrap idea.

It is interesting to look at Efron's early paper, the first bootstrap paper.


Susan Holmes 2004-05-19