
The jackknife

A little history: the first idea related to the bootstrap was due to von Mises, who used the plug-in principle in the 1930s. Then, in the late 1950s, Quenouille found a way of correcting the bias of estimators whose bias was known to be of the form:

\begin{displaymath}
(*) \qquad \frac{a_1}{n}+\frac{a_2}{n^2}+\frac{a_3}{n^3}+\cdots
\end{displaymath}

even though the numerators $a_1, a_2, \ldots$ are unknown constants depending on the true distribution $F$.

This method was the jackknife.
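For example (a standard case, not worked out in the notes), the plug-in variance estimate $\hat{\theta}=\frac{1}{n}\sum (x_i-\bar{x})^2$ satisfies

\begin{displaymath}
E_F\,\hat{\theta}=\frac{n-1}{n}\sigma^2=\sigma^2-\frac{\sigma^2}{n},
\end{displaymath}

so its bias has exactly the form $(*)$ with $a_1(F)=-\sigma^2$ and $a_2=a_3=\cdots=0$.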

This was the first time that the sample itself was manipulated: each observation is dropped from the sample in turn. The sample with the $i$th observation removed is

\begin{displaymath}
{\cal X}_{(i)} = (x_1, x_2, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n),
\end{displaymath}

the corresponding estimator is $\hat{\theta}_{(i)}=s({\cal X}_{(i)})$, and the average of these leave-one-out estimates is $\hat{\theta}_{(\cdot)}=\frac{1}{n}\sum_{i=1}^n \hat{\theta}_{(i)}$.

We can use the jackknife to estimate the bias:

\begin{displaymath}\widehat{Bias}_{\mbox{jack}} = (n-1)
(\hat{\theta}_{(\cdot)}-\hat{\theta})
\end{displaymath}

We showed in class that if the bias is of order $\frac{1}{n}$, as in $(*)$, then the bias of the jackknife estimate is of order $\frac{1}{n^2}$. To see this, note that $\hat{\theta}_{(\cdot)}$ is an average of estimators each computed from $n-1$ observations, so

\begin{displaymath}
E_F\,\hat{\theta}_{(\cdot)}=E_F\,\hat{\theta}_{n-1}=\theta+\frac{a_1(F)}{n-1}+\frac{a_2(F)}{(n-1)^2}+\cdots
\end{displaymath}


\begin{displaymath}
\mbox{The jackknife estimate is:}\qquad
\tilde{\theta}=\hat{\theta}-\widehat{Bias}_{\mbox{jack}}
=\hat{\theta}-(n-1)(\hat{\theta}_{(\cdot)}-\hat{\theta})
=n\hat{\theta}-(n-1)\hat{\theta}_{(\cdot)}
\end{displaymath}


\begin{displaymath}
E_F\,\tilde{\theta}=E_F(\hat{\theta}-\widehat{Bias}_{\mbox{jack}})
=n\,E_F\,\hat{\theta}-(n-1)\,E_F\,\hat{\theta}_{(\cdot)}
=\theta-\frac{a_2(F)}{n(n-1)}+a_3(F)\left(\frac{1}{n^2}-\frac{1}{(n-1)^2}\right)+\cdots
\end{displaymath}

(the $a_1$ terms cancel exactly, since $n\cdot\frac{a_1(F)}{n}=(n-1)\cdot\frac{a_1(F)}{n-1}=a_1(F)$).

So the order of the bias has been decreased from $O(\frac{1}{n})$ to $O(\frac{1}{n^2})$.

Example: Suppose $\theta=var(X)=\int_{supp(F)} (x-\mu)^2\,dF$, for which the plug-in estimate is:

\begin{displaymath}\hat{\theta}=\int_{supp(F_n)} (x-\mu(F_n))^2dF_n
=\frac{1}{n}\sum (x_i-\bar{x})^2\end{displaymath}


Applying the jackknife bias correction $\tilde{\theta}=n\hat{\theta}-(n-1)\hat{\theta}_{(\cdot)}$ to this plug-in estimate gives exactly the usual unbiased sample variance:

\begin{displaymath}
\hat{\sigma^2}^{JK}= \frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})^2
\end{displaymath}
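A quick numerical check of this (the script below is not from the notes; it just re-implements the leave-one-out computation directly) confirms that the jackknife-corrected plug-in variance matches MATLAB's var, which uses the $\frac{1}{n-1}$ divisor:

%------------------------------------------
% Sketch: check that the jackknife correction of the
% plug-in variance recovers the unbiased sample variance
x=randn(10,1); n=length(x);
thetahat=mean((x-mean(x)).^2);         %plug-in variance, divisor n
loo=zeros(n,1);
for i=1:n
   xi=x([1:(i-1),(i+1):n]);            %drop observation i
   loo(i)=mean((xi-mean(xi)).^2);
end
biasjack=(n-1)*(mean(loo)-thetahat);   %jackknife bias estimate
thetatilde=thetahat-biasjack;          %bias-corrected estimate
disp([thetatilde var(x)])              %the two values agree
%------------------------------------------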

Example: Patch Data

We are interested in the parameter

\begin{displaymath}\theta=
\frac{\vert E(\mbox{new patch}-\mbox{old patch})\vert}{E(\mbox{old patch}-\mbox{placebo})}
\end{displaymath}

For the jackknife computation, set $z=$ old $-$ placebo and $y=$ new $-$ old, so that the plug-in estimate is $\hat{\theta}=\bar{y}/\bar{z}$.
MATLAB Note
 FEVAL Execute function specified by string.
    If F is a string containing the name of a function (usually
    defined by an M-file), then  FEVAL(F,x1,...,xn)  evaluates
    that function at the given arguments.  
    For example, F = 'foo', FEVAL(F,9.64) is the same as foo(9.64).
    FEVAL is usually used inside functions which have the names of
    other functions as arguments.  Examples include FZERO and EZPLOT.
    [y1,..,yn] = FEVAL(F,x1,...,xn) returns multiple output arguments.
    Within methods that overload built-in functions, use
    BUILTIN(FUN,...) to execute the original built-in function.
    See also BUILTIN.

%------------------------------------------
function out=jackbias(theta,orig)
%Estimate the bias using the jackknife.
%theta has to be a character string containing
%a valid function name; orig holds the data,
%one observation per row.
[n,p]=size(orig);
lot=feval(theta,orig(2:n,:));          %estimate with row 1 left out
k=length(lot);                         %theta may be vector-valued
lo=zeros(n,k);                         %lo(i,:) = estimate without row i
lo(1,:)=lot;
lo(n,:)=feval(theta,orig(1:(n-1),:));  %estimate with row n left out
for i=(2:(n-1))
   lo(i,:)=feval(theta,orig([1:(i-1),(i+1):n],:));
end
thetadot=mean(lo);                     %average of leave-one-out estimates
out=(n-1)*thetadot-(n-1)*feval(theta,orig);   %(n-1)(thetadot-thetahat)
%-------------------------------
function out=ratio(yszs)
%Computes the ratio of the mean of the first
%column to the mean of the second column
out=mean(yszs(:,1))/mean(yszs(:,2));
%-------------------------------------
>> z=oldpatch-placebo; y=newpatch-oldpatch;
>> yz=[y,z]
yz =
       -1200        8406
        2601        2342
       -2705        8187
        1982        8459
       -1290        4795
         351        3516
        -638        4796
       -2719       10238
>> ratio(yz)
ans =   -0.0713
>> jackbias('ratio',yz)
ans =    0.0080

The bias-corrected jackknife estimate of the ratio is therefore $\tilde{\theta}=-0.0713-0.0080=-0.0793$.

Bootstrap Simulations

function out=bootbias(theta,orig,B)
%Estimate the bias using B bootstrap resamples;
%theta is a function name string, orig the data
thetabs=zeros(1,B);
for b=(1:B)
    bs=bsample(orig);              %bootstrap sample of the rows of orig
    thetabs(b)=feval(theta,bs);    %bootstrap replicate theta*(b)
end
theta0=feval(theta,orig);
out=mean(thetabs)-theta0;          %mean of theta* minus thetahat
%-----------------------------------
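The helper bsample used above is not listed in the notes; here is a minimal sketch under the natural assumption that it draws $n$ rows of the data uniformly with replacement:

function out=bsample(orig)
%Draw one bootstrap sample: n rows of orig chosen
%uniformly with replacement (assumed behavior;
%this is not the original course file)
[n,p]=size(orig);
inds=ceil(n*rand(1,n));   %n random integers in 1..n
out=orig(inds,:);
%-----------------------------------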
>> bootbias('ratio',yz,1000)
ans =    0.0085

This agrees closely with the jackknife bias estimate of 0.0080.
%-----------------------------------
function out=bootmultin(orig,B)
%Generate B bootstrap weight vectors: out(b,i) counts how
%many times observation i appears in the bth bootstrap
%sample, so each row is one multinomial(n;1/n,...,1/n) draw
[n,p]=size(orig);
out=zeros(B,n);
for b=1:B
   inds=randint(1,n,n)+1;   %n random integers in 1..n
   for i=1:n
      out(b,inds(i))=out(b,inds(i))+1;
   end
end
%-----------------------------------
>> bootmultin(law15,5)
ans =
1  1  2  0  1  1  0   1   3  0  2  2  1  0  0
3  1  0  1  2  2  0   1   2  2  0  0  0  0  1
0  0  0  3  0  4  3   0   2  0  1  0  0  1  1
1  1  2  2  0  1  1   0   2  1  1  1  0  0  2
2  0  0  2  2  0  0   0   0  1  3  3  0  2  0

Each row sums to $n=15$ and gives the multinomial weights of one bootstrap sample of the 15 observations in law15.
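These weight vectors give an equivalent way of computing bootstrap replicates without ever copying rows of the data: for the patch-data ratio, the replicate corresponding to weight row $w$ is $(w\cdot y)/(w\cdot z)$. A sketch of this use (hypothetical, not from the notes):

%-----------------------------------
% Sketch: bootstrap the patch ratio via multinomial
% weights instead of explicitly resampled rows
W=bootmultin(yz,1000);                %1000-by-8 weight matrix
thetastar=(W*yz(:,1))./(W*yz(:,2));   %replicates (w.y)/(w.z)
biasboot=mean(thetastar)-ratio(yz);   %bootstrap bias estimate
%-----------------------------------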


Susan Holmes 2004-05-19