

# The Smoothed Bootstrap

We have seen how the parametric bootstrap and the nonparametric bootstrap differ in what is plugged into the statistical functional.

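We want to estimate $\theta = t(F)$, and we can use as an estimate either $t(\hat F_{\text{par}})$, where $\hat F_{\text{par}}$ is the fitted parametric distribution, or $t(\hat F_n)$, where $\hat F_n$ is the empirical cdf. In fact there is an intermediate choice: take the empirical cdf and smooth it a little, then plug in the smoothed empirical cdf, denoted here by $\hat F_h$.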

This is especially useful when the bootstrap distribution is too discrete, which happens mostly when the statistic is a quantile. The median had that problem, as we saw in the mouse data analysis.
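As a rough illustration of the idea, the sketch below (Python, standard library only) draws bootstrap medians with and without smoothing. It assumes a Gaussian kernel, so that sampling from the smoothed empirical cdf amounts to resampling the data and adding Normal noise. The function name `bootstrap_medians`, the `bandwidth` value and the sample are made up for illustration; they are not the mouse data.

```python
import random
import statistics

def bootstrap_medians(data, n_boot=1000, bandwidth=0.0, seed=0):
    """Return n_boot bootstrap medians.

    bandwidth = 0 is the ordinary nonparametric bootstrap.
    bandwidth > 0 adds Normal(0, bandwidth^2) noise to each resampled
    point (assumption: Gaussian kernel smoothing of the empirical cdf).
    """
    rng = random.Random(seed)
    n = len(data)
    medians = []
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in range(n)]
        if bandwidth > 0:
            resample = [x + rng.gauss(0.0, bandwidth) for x in resample]
        medians.append(statistics.median(resample))
    return medians

# Hypothetical sample of odd size, for illustration only.
data = [1.2, 2.5, 3.1, 4.8, 5.0, 6.3, 7.7, 9.4, 12.0]
plain = bootstrap_medians(data)
smooth = bootstrap_medians(data, bandwidth=0.5)

print(len(set(plain)), "distinct medians without smoothing")
print(len(set(smooth)), "distinct medians with smoothing")
```

With an odd sample size the unsmoothed median can only take values that occur in the data, which is the discreteness problem; the smoothed version spreads the replicates over a continuum. The choice of bandwidth is left open here.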

## Smoothing: a crash course

Suppose we have a two-dimensional scatterplot that we want to smooth. This could come from a histogram or from a regression-type context; they are both of the same form. The simplest case to start with is when the abscissae, although ordinal, are discrete, such as ages rounded to decades. Then the y data appear along vertical lines at the possible x's.

The crosses, which are the conditional averages, are a smooth of the scatterplot in some way.
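A minimal sketch of those conditional averages, assuming the x values are already discrete (the helper name `conditional_means` and the numbers are invented for illustration):

```python
from collections import defaultdict

def conditional_means(xs, ys):
    """Average the y values separately at each distinct x value."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    return {x: sum(v) / len(v) for x, v in sorted(groups.items())}

# Made-up data: ages rounded to decades and some response y.
ages = [20, 20, 30, 30, 30, 40]
y = [1.0, 3.0, 2.0, 4.0, 6.0, 5.0]

print(conditional_means(ages, y))  # {20: 2.0, 30: 4.0, 40: 5.0}
```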

Now suppose that the x's could be all over the place; we window them and take local averages.

The extreme case is when the window covers the whole x axis: then there is only one average, and if you want you can draw a horizontal line through it.

When the window is at its smallest, there is NO smoothing.
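The two extremes can be checked with a small sketch of a windowed average. The function name `window_mean`, the symmetric window of half-width `half_width`, and the data are assumptions made for this example.

```python
def window_mean(xs, ys, x0, half_width):
    """Average of the y values whose x lies within half_width of x0."""
    inside = [y for x, y in zip(xs, ys) if abs(x - x0) <= half_width]
    if not inside:
        return float("nan")
    return sum(inside) / len(inside)

# Made-up scatterplot.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 6.0, 8.0]

# Window covering the whole x axis: one overall average everywhere.
widest = [window_mean(xs, ys, x0, 100.0) for x0 in xs]
# Window containing only the point itself: the data come back unchanged.
narrowest = [window_mean(xs, ys, x0, 0.0) for x0 in xs]
# Something in between: local averages.
local = [window_mean(xs, ys, x0, 1.0) for x0 in xs]

print(widest)
print(narrowest)
print(local)
```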

Again we want something gentler, so we reduce the window width and only take local averages. If we choose to differentiate, within a window, between the points according to how close they are to the abscissa at which we want to estimate the value by averaging, we can use a kernel weighting function.

Points that are close are given high weights and points further away are given lighter weights; on the boundary of the window the points won't count.

The weighting function is such that the sum of all the weights is 1. If all the points in the window get the same weight, the weights are uniform. In fact the weighting function can be a probability density, and often we take a Normal one.
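A sketch of kernel-weighted local averaging with a Normal density as the weighting function follows. This is the Nadaraya-Watson form of a kernel smoother, chosen here as one plausible reading of the description; the name `kernel_smooth` and the bandwidth are assumptions. Note that a Normal kernel has no hard window boundary: distant points get weights that are tiny rather than exactly zero.

```python
import math

def kernel_smooth(xs, ys, x0, bandwidth):
    """Kernel-weighted average of ys at x0 using a Normal kernel.

    Returns the estimate and the normalized weights.
    """
    raw = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    total = sum(raw)
    weights = [w / total for w in raw]  # weights now sum to 1
    estimate = sum(w * y for w, y in zip(weights, ys))
    return estimate, weights

# Made-up scatterplot.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 6.0, 8.0]

estimate, weights = kernel_smooth(xs, ys, x0=2.0, bandwidth=1.0)
print(estimate)
print(weights)  # largest weight at x = 2.0, decaying on both sides
```

The bandwidth plays the role of the window width: a very large value approaches the single overall average, a very small one approaches no smoothing.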

Here is a nice webpage on smoothing, with Matlab software available.

Curve Fitting Example, Efron & Tibshirani, 7.3

loess.m is available in the course directory, and loess is a built-in function in Splus.

Matlab procedure for bootstrapping the loess curve.

% N is the number of bootstrap replications.
N=500;
predmat=zeros(N, 101);

datasize=size(cholo,1);

clf;
plot(cholo(:,1), cholo(:,2), '.');
hold on;

for i=1:N
    % Resample the rows (pairs) with replacement.
    xind=unidrnd(datasize, datasize,1);
    x=cholo(xind,:);
    % Refit the loess curve on the grid 0:100.
    predmat(i,:)=loess(x(:,1), x(:,2), (0:100), .3, 1);
    plot((0:100), predmat(i,:), '-.');     % Plot a sample bootstrap curve.
end;

% Plot the 95% pointwise confidence lines.
plot((0:100), prctile(predmat, 2.5), 'r-');
plot((0:100), prctile(predmat, 97.5), 'r-');

xlabel('Compliance');
ylabel('Improvement');
axis([-5, 105, -40, 120]);
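For readers without Matlab, the same resample-pairs, refit, take-pointwise-percentiles loop can be sketched in Python with the standard library only. This is not a port of loess.m: a Gaussian kernel smoother stands in for loess, the data are synthetic rather than the `cholo` data, and the hand-written `percentile` helper may interpolate differently from Matlab's `prctile`. All names below are assumptions.

```python
import math
import random

def kernel_smooth(xs, ys, x0, bandwidth):
    """Gaussian kernel-weighted average at x0 (stand-in for loess)."""
    w = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

def percentile(values, p):
    """p-th percentile (0-100), linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def bootstrap_bands(xs, ys, grid, n_boot=200, bandwidth=10.0, seed=0):
    """Pointwise 95% percentile bands for the smoothed curve."""
    rng = random.Random(seed)
    n = len(xs)
    curves = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample (x, y) pairs
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        curves.append([kernel_smooth(bx, by, g, bandwidth) for g in grid])
    lower = [percentile([c[j] for c in curves], 2.5) for j in range(len(grid))]
    upper = [percentile([c[j] for c in curves], 97.5) for j in range(len(grid))]
    return lower, upper

# Synthetic data on a 0-100 scale, for illustration only.
data_rng = random.Random(1)
xs = [data_rng.uniform(0, 100) for _ in range(60)]
ys = [0.5 * x + data_rng.gauss(0, 15) for x in xs]
grid = list(range(0, 101, 10))

lower, upper = bootstrap_bands(xs, ys, grid)
for g, lo_band, hi_band in zip(grid, lower, upper):
    print(g, round(lo_band, 1), round(hi_band, 1))
```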


## Smoothing for variance stabilization

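Pages 164-166. Algorithm:
1. Generate $B_1$ bootstrap samples $x^{*b}$ and the bootstrap estimates $\hat\theta^*(b)$.
• For each $b$, take $B_2$ bootstrap resamples from $x^{*b}$ and estimate the standard error $\widehat{se}(\hat\theta^*(b))$.
2. Fit a smooth curve to the pairs $(\hat\theta^*(b), \widehat{se}(\hat\theta^*(b)))$ to produce a smooth estimate of the function $s(u) = se(\hat\theta \mid \theta = u)$; we will call it $\hat s(u)$.
3. Use $g(t) = \int^{t} \frac{1}{\hat s(u)}\,du$ as the variance stabilizing transformation. Find $g$ through numerical integration, usually.
4. Compute, with $B_3$ bootstrap resamples, a bootstrap t interval for $\phi = g(\theta)$. (The SE of $g(\hat\theta)$ is approximately one, so no denominator is needed.)
5. Map the endpoints of the interval back through the transformation $g^{-1}$.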
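The five steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the `boott` implementation: the smooth curve in step 2 is a Gaussian kernel smoother with an ad hoc bandwidth, the integral in step 3 uses the trapezoid rule on a fixed grid, and values outside that grid are clamped. The names `vs_boott`, `b1`, `b2`, `b3` are invented here.

```python
import math
import random
import statistics

def resample(data, rng):
    return [rng.choice(data) for _ in data]

def interp(x, xp, fp):
    """Piecewise-linear interpolation; clamps outside the range of xp."""
    if x <= xp[0]:
        return fp[0]
    if x >= xp[-1]:
        return fp[-1]
    for i in range(1, len(xp)):
        if x <= xp[i]:
            t = (x - xp[i - 1]) / (xp[i] - xp[i - 1])
            return fp[i - 1] + t * (fp[i] - fp[i - 1])

def percentile(values, p):
    """p-th quantile (p in [0, 1]) by linear interpolation."""
    s = sorted(values)
    pos = (len(s) - 1) * p
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def vs_boott(data, stat, b1=100, b2=25, b3=200, alpha=0.05, seed=0):
    rng = random.Random(seed)
    theta_hat = stat(data)
    # Step 1: b1 outer resamples, each with b2 inner resamples for the SE.
    thetas, ses = [], []
    for _ in range(b1):
        xb = resample(data, rng)
        thetas.append(stat(xb))
        ses.append(statistics.stdev([stat(resample(xb, rng)) for _ in range(b2)]))
    # Step 2: smooth the (theta*, se*) pairs to estimate s(u).
    lo, hi = min(thetas), max(thetas)
    span = hi - lo
    h = span / 4.0  # ad hoc bandwidth (assumption)
    def s(u):
        w = [math.exp(-0.5 * ((t - u) / h) ** 2) for t in thetas]
        return max(sum(wi * si for wi, si in zip(w, ses)) / sum(w), 1e-8)
    # Step 3: g(t) = integral of 1/s(u) du, trapezoid rule on a grid.
    m = 200
    grid = [lo - span / 2 + i * (2 * span) / m for i in range(m + 1)]
    inv_s = [1.0 / s(u) for u in grid]
    g_vals = [0.0]
    for i in range(1, len(grid)):
        step = grid[i] - grid[i - 1]
        g_vals.append(g_vals[-1] + 0.5 * step * (inv_s[i] + inv_s[i - 1]))
    g = lambda t: interp(t, grid, g_vals)
    g_inv = lambda p: interp(p, g_vals, grid)
    # Step 4: bootstrap-t on the transformed scale, SE taken to be 1.
    t_star = [g(stat(resample(data, rng))) - g(theta_hat) for _ in range(b3)]
    phi_lo = g(theta_hat) - percentile(t_star, 1 - alpha / 2)
    phi_hi = g(theta_hat) - percentile(t_star, alpha / 2)
    # Step 5: map the endpoints back to the theta scale.
    return g_inv(phi_lo), g_inv(phi_hi)

# Simulated chi-square(1) sample of size 20, for illustration only.
data_rng = random.Random(1)
x = [data_rng.gauss(0, 1) ** 2 for _ in range(20)]
ci = vs_boott(x, statistics.mean)
print(ci)
```

The total number of resamples here is `b1 * b2 + b3`, compared with a product of outer and inner counts for a bootstrap-t that estimates the standard error inside every replicate.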

boott               package:bootstrap               R Documentation

Bootstrap-t Confidence Limits

Description:

See Efron and Tibshirani (1993) for details on this function.

Usage:

boott(x,theta, ..., sdfun=MISSING, nbootsd=25, nboott=200,
VS=FALSE, v.nbootg=100, v.nbootsd=25, v.nboott=200,
perc=c(.001,.01,.025,.05,.10,.50,.90,.95,.975,.99,.999))

Arguments:

x: a vector containing the data. Nonparametric bootstrap
sampling is used. To bootstrap from more complex data
structures (e.g. bivariate data) see the last example below.

theta: function to be bootstrapped. Takes 'x' as an argument, and
may take additional arguments (see below and last example).

...: any additional arguments to be passed to 'theta'

sdfun: optional name of function for computing standard deviation of
'theta' based on data 'x'. Should be of the form: 'sdmean <-
function(x,nbootsd,theta,...)' where 'nbootsd'  is a dummy
argument that is not used. If 'theta' is the mean, for
example,  'sdmean <- function(x,nbootsd,theta,...)
{sqrt(var(x)/length(x))}' . If 'sdfun' is missing, then
'boott' uses an inner bootstrap loop to estimate the
standard deviation of 'theta(x)'

nbootsd: The number of bootstrap samples used to estimate the standard
deviation of 'theta(x)'

nboott: The number of bootstrap samples used to estimate the
distribution of the bootstrap T statistic.  200 is a bare
minimum and 1000 or more is needed for  reliable  alpha %
confidence points, alpha > .95 say.  Total number of
bootstrap samples is  'nboott*nbootsd'.

VS: If 'TRUE', a variance stabilizing transformation is
estimated,  and the interval is constructed on the
transformed scale, and then is mapped back to the original
theta scale.  This can improve both the statistical
properties of the intervals and speed up the computation. See
the reference Tibshirani (1988) given below. If 'FALSE',
variance stabilization is not performed.

v.nbootg: The number of bootstrap samples used to estimate the variance
stabilizing transformation g.  Only used if 'VS=TRUE'.

v.nbootsd: The number of bootstrap samples used to estimate the
standard deviation of 'theta(x)'.  Only used if 'VS=TRUE'.

v.nboott: The number of bootstrap samples used to estimate the
distribution of  the bootstrap T statistic. Only used if
'VS=TRUE'. Total number of bootstrap samples is
'v.nbootg*v.nbootsd + v.nboott'.

perc: Confidence points desired.

Value:

list with the following components:

confpoints: Estimated confidence points

theta, g: 'theta' and 'g' are only returned if 'VS=TRUE' was specified.
'(theta[i],g[i]),  i=1,length(theta)'  represents the
estimate of the variance stabilizing transformation 'g' at
the points 'theta[i]'.

References:

Tibshirani, R. (1988) "Variance stabilization and the bootstrap".
Biometrika (1988) vol 75 no 3 pages 433-44.

Hall, P. (1988) Theoretical comparison of bootstrap confidence
intervals. Ann. Statist. 16, 1-50.

Efron, B. and Tibshirani, R. (1993) An Introduction to the
Bootstrap. Chapman and Hall, New York, London.

Examples:

#  estimated confidence points for the mean
x <- rchisq(20,1)
theta <- function(x){mean(x)}
results <- boott(x,theta)
# estimated confidence points for the mean,
#  using variance-stabilization bootstrap-T method
results <-  boott(x,theta,VS=TRUE)
results$confpoints          # gives confidence points
# plot the estimated var stabilizing transformation
plot(results$theta,results$g)
# use standard formula for stand dev of mean
# rather than an inner bootstrap loop
sdmean <- function(x, ...)
{sqrt(var(x)/length(x))}
results <-  boott(x,theta,sdfun=sdmean)

# To bootstrap functions of more  complex data structures,
# write theta so that its argument x
#  is the set of observation numbers
#  and simply  pass as data to boott the vector 1,2,..n.
# For example, to bootstrap
# the correlation coefficient from a set of 15 data pairs:

xdata <- matrix(rnorm(30),ncol=2)
n <- 15
theta <- function(x, xdata){ cor(xdata[x,1],xdata[x,2]) }
results <- boott(1:n,theta, xdata)
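The index trick in the last example is independent of R. A minimal Python sketch of the same idea, assuming nothing beyond the standard library: the statistic takes a list of observation numbers plus the full data set, and the bootstrap resamples the indices rather than the rows. The helper names `correlation` and `theta` and the simulated pairs are illustrative only.

```python
import math
import random

def correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs)
    sxx = sum((p[0] - mx) ** 2 for p in pairs)
    syy = sum((p[1] - my) ** 2 for p in pairs)
    return sxy / math.sqrt(sxx * syy)

def theta(idx, xdata):
    """Statistic written in terms of observation numbers, as in the R example."""
    return correlation([xdata[i] for i in idx])

rng = random.Random(0)
n = 15
xdata = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]  # simulated pairs
indices = list(range(n))

# Resample the indices; each pair stays intact because rows are picked whole.
replicates = []
for _ in range(500):
    idx = [rng.choice(indices) for _ in range(n)]
    replicates.append(theta(idx, xdata))

print(theta(indices, xdata), min(replicates), max(replicates))
```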


Susan Holmes 2004-05-19