We want to estimate a parameter of the unknown distribution, and as an estimate we can plug in either the empirical cdf or a fitted parametric cdf. In fact there is an intermediate choice: take the empirical cdf, smooth it a little, and plug in the smoothed empirical cdf.

This is especially useful when the bootstrap distribution is too discrete, which happens mostly when the statistic is a quantile; the median, as we saw in the mouse data analysis, had that problem.
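A minimal sketch of the idea, assuming a Gaussian kernel: drawing from the smoothed empirical cdf amounts to resampling the data and adding a little Normal noise to each draw. The function name, the rule-of-thumb bandwidth `h` and the data values are illustrative assumptions, not part of the course material.

```python
import random
import statistics

def smoothed_bootstrap_medians(data, n_boot=1000, h=None, seed=0):
    """Bootstrap replicates of the median, drawn from a Gaussian-smoothed empirical cdf."""
    rng = random.Random(seed)
    n = len(data)
    if h is None:
        # Assumed rule-of-thumb bandwidth; any small positive h illustrates the idea.
        h = statistics.stdev(data) / n ** 0.2
    reps = []
    for _ in range(n_boot):
        # Resample with replacement, then jitter each draw by N(0, h^2) noise.
        sample = [rng.choice(data) + rng.gauss(0.0, h) for _ in range(n)]
        reps.append(statistics.median(sample))
    return reps

data = [10, 27, 31, 40, 46, 50, 52, 104, 146]   # made-up illustrative values
plain = smoothed_bootstrap_medians(data, h=0.0)  # h = 0 is the ordinary bootstrap
smooth = smoothed_bootstrap_medians(data)        # h > 0 is the smoothed bootstrap
print(len(set(plain)), "distinct medians without smoothing")
print(len(set(smooth)), "distinct medians with smoothing")
```

With an odd sample size the unsmoothed median can only take values found in the data, which is the discreteness the smoothing is meant to remove.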

The crosses, which are the conditional averages, are in some way a smooth of the scatter plot.

Now suppose that the x's could be all over the place: we window them and take local averages.

The extreme case is when the window covers the whole x axis: then there is only one average, and if you want you can draw a line through it.

When the window is at its smallest, so that it holds a single point, there is NO smoothing.
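The two extremes and an in-between window can be sketched as follows. The helper `window_average` and the data are hypothetical, chosen only to show the effect of the window width.

```python
def window_average(xs, ys, x0, half_width):
    """Mean of the y's whose x lies within half_width of x0 (None if the window is empty)."""
    inside = [y for x, y in zip(xs, ys) if abs(x - x0) <= half_width]
    return sum(inside) / len(inside) if inside else None

xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 10.0]  # made-up illustrative data

whole = [window_average(xs, ys, x0, 100) for x0 in xs]  # window covers the whole axis
tiny = [window_average(xs, ys, x0, 0) for x0 in xs]     # window holds a single point
local = [window_average(xs, ys, x0, 1) for x0 in xs]    # something in between
```

Here `whole` repeats the overall mean at every x, `tiny` reproduces the data exactly, and `local` sits between the two.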

Again we want something gentler, so we reduce the window width and only take local averages. If, within a window, we want to favour the points that are close to the abscissa at which we are estimating the value by averaging, we can use a kernel weighting function.

Points that are close are given high weights, points further away are given lighter weights, and points on the boundary of the window do not count.

The weighting function is such that the sum of all the weights is 1. If all the weights are equal, the weighting is uniform and we are back to the plain local average. In fact the weighting function can be a probability density, and often we take a Normal one.
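A small sketch of kernel weighting with a Normal density, under the assumption that the bandwidth `h` plays the role of the window width. The function names and data are illustrative. Note that the Normal kernel has no hard boundary: distant points get tiny weights rather than exactly zero.

```python
import math

def kernel_weights(xs, x0, h):
    """Normal-density weights centred at x0 with bandwidth h, normalised to sum to 1."""
    raw = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    total = sum(raw)
    return [w / total for w in raw]

def kernel_smooth(xs, ys, x0, h):
    """Weighted local average of the y's at abscissa x0."""
    return sum(w * y for w, y in zip(kernel_weights(xs, x0, h), ys))

xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 10.0]  # made-up illustrative data

w = kernel_weights(xs, 4.0, 1.0)           # largest weight falls on x = 4
fitted = kernel_smooth(xs, ys, 4.0, 1.0)   # local weighted average at x = 4
flat = kernel_smooth(xs, ys, 4.0, 1e6)     # huge bandwidth: nearly uniform weights
```

A very large bandwidth makes the weights essentially uniform, so `flat` is close to the overall mean of the y's.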

Here is a nice webpage on smoothing, with Matlab software available.

loess.m is available in the course directory, and loess is a built-in function in Splus.

Matlab procedure for bootstrapping the loess curve.

    % N is the number of bootstrap samples.
    N = 500;
    predmat = zeros(N, 101);
    datasize = size(cholo, 1);
    clf;
    plot(cholo(:,1), cholo(:,2), '.');
    hold on;
    for i = 1:N
        % Resample the rows (cases) with replacement.
        xind = unidrnd(datasize, datasize, 1);
        x = cholo(xind, :);
        % Refit the loess curve on the resample, evaluated at 0, 1, ..., 100.
        predmat(i,:) = loess(x(:,1), x(:,2), (0:100), .3, 1);
        % Plot a sample bootstrap curve.
        plot((0:100), predmat(i,:), '-.');
    end;
    % Plot the 95% pointwise confidence lines.
    plot((0:100), prctile(predmat, 2.5), 'r-');
    plot((0:100), prctile(predmat, 97.5), 'r-');
    xlabel('Compliance');
    ylabel('Improvement');
    axis([-5, 105, -40, 120]);

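- Generate $B_1$ bootstrap samples $x^{*1}, \ldots, x^{*B_1}$ and the bootstrap estimates $\hat\theta^*(b)$, $b = 1, \ldots, B_1$.
- For each $b$, take $B_2$ bootstrap resamples from $x^{*b}$ and estimate the standard error $\widehat{se}(\hat\theta^*(b))$.
- Fit a smooth curve to the pairs $(\hat\theta^*(b), \widehat{se}(\hat\theta^*(b)))$ to produce a smooth estimate of the standard error as a function of $\theta$; we will call it $s(u)$.
- Use $g(x) = \int^x \frac{1}{s(u)}\,du$ as the variance stabilizing transformation. $g$ is usually found through numerical integration.
- Compute, with $B_3$ bootstrap resamples, a bootstrap-t interval for $\phi = g(\theta)$. (The SE of $g(\hat\theta)$ is approximately one, so no denominator is needed.)
- Map the endpoints of the interval back to the $\theta$ scale through the inverse transformation $g^{-1}$.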
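The steps above can be sketched as follows. This is an illustration under stated assumptions, not the `boott` implementation: a Normal kernel smoother stands in for whatever smoother is used to fit the standard error curve (called `s` here), the transformation `g` is the trapezoid-rule integral of `1/s` on a grid, and values outside the grid are simply clamped. All function names, the bandwidth, the grid padding and the simulated data are hypothetical choices.

```python
import bisect
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def resample(data, rng):
    return [rng.choice(data) for _ in data]

def boot_se(data, stat, n_inner, rng):
    """Inner bootstrap: standard deviation of stat over n_inner resamples."""
    reps = [stat(resample(data, rng)) for _ in range(n_inner)]
    m = mean(reps)
    return math.sqrt(sum((r - m) ** 2 for r in reps) / (len(reps) - 1))

def kernel_smooth(us, vs, u0, h):
    ws = [math.exp(-0.5 * ((u - u0) / h) ** 2) for u in us]
    return sum(w * v for w, v in zip(ws, vs)) / sum(ws)

def interp(x, xs, ys):
    """Piecewise-linear interpolation on an increasing grid, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, x)          # xs[i-1] <= x < xs[i]
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])

def vs_boott(data, stat, b1=100, b2=25, b3=1000, alpha=0.05, grid_size=200, seed=0):
    rng = random.Random(seed)
    theta_hat = stat(data)
    # Steps 1-2: outer bootstrap estimates, each with an inner-bootstrap SE.
    thetas, ses = [], []
    for _ in range(b1):
        xb = resample(data, rng)
        thetas.append(stat(xb))
        ses.append(boot_se(xb, stat, b2, rng))
    # Step 3: smooth the (theta, se) pairs on a padded grid to get s(u).
    lo, hi = min(thetas), max(thetas)
    spread = hi - lo
    step = 2 * spread / (grid_size - 1)
    grid = [lo - 0.5 * spread + i * step for i in range(grid_size)]
    s = [kernel_smooth(thetas, ses, u, 0.2 * spread) for u in grid]
    # Step 4: g(x) = integral of 1/s(u) du, by the trapezoid rule.
    g = [0.0]
    for i in range(1, grid_size):
        g.append(g[-1] + 0.5 * (1 / s[i - 1] + 1 / s[i]) * step)
    # Step 5: bootstrap-t on the phi = g(theta) scale, denominator set to one.
    g_hat = interp(theta_hat, grid, g)
    t_star = sorted(interp(stat(resample(data, rng)), grid, g) - g_hat
                    for _ in range(b3))
    q_lo = t_star[int(alpha / 2 * b3)]
    q_hi = t_star[min(b3 - 1, int((1 - alpha / 2) * b3))]
    # Step 6: map the endpoints back through the inverse of g.
    return interp(g_hat - q_hi, g, grid), interp(g_hat - q_lo, g, grid)

data_rng = random.Random(1)
x = [data_rng.expovariate(1.0) for _ in range(20)]  # simulated skewed data
lower, upper = vs_boott(x, mean)
print(lower, mean(x), upper)
```

The total cost is `b1 * b2 + b3` evaluations of the statistic, against `nboott * nbootsd` for a plain bootstrap-t with an inner loop, which is where the speed-up comes from.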

boott                package:bootstrap                R Documentation

Bootstrap-t Confidence Limits

Description:

     See Efron and Tibshirani (1993) for details on this function.

Usage:

     boott(x,theta, ..., sdfun=MISSING, nbootsd=25, nboott=200,
           VS=FALSE, v.nbootg=100, v.nbootsd=25, v.nboott=200,
           perc=c(.001,.01,.025,.05,.10,.50,.90,.95,.975,.99,.999))

Arguments:

       x: a vector containing the data. Nonparametric bootstrap
          sampling is used. To bootstrap from more complex data
          structures (e.g. bivariate data) see the last example below.

   theta: function to be bootstrapped. Takes 'x' as an argument, and
          may take additional arguments (see below and last example).

     ...: any additional arguments to be passed to 'theta'

   sdfun: optional name of function for computing standard deviation
          of 'theta' based on data 'x'. Should be of the form:
          'sdmean <- function(x,nbootsd,theta,...)' where 'nbootsd' is
          a dummy argument that is not used. If 'theta' is the mean,
          for example, 'sdmean <- function(x,nbootsd,theta,...)
          {sqrt(var(x)/length(x))}' . If 'sdfun' is missing, then
          'boott' uses an inner bootstrap loop to estimate the
          standard deviation of 'theta(x)'

 nbootsd: The number of bootstrap samples used to estimate the
          standard deviation of 'theta(x)'

  nboott: The number of bootstrap samples used to estimate the
          distribution of the bootstrap T statistic. 200 is a bare
          minimum and 1000 or more is needed for reliable alpha %
          confidence points, alpha > .95 say. Total number of
          bootstrap samples is 'nboott*nbootsd'.

      VS: If 'TRUE', a variance stabilizing transformation is
          estimated, and the interval is constructed on the
          transformed scale, and then is mapped back to the original
          theta scale. This can improve both the statistical
          properties of the intervals and speed up the computation.
          See the reference Tibshirani (1988) given below. If 'FALSE',
          variance stabilization is not performed.

v.nbootg: The number of bootstrap samples used to estimate the
          variance stabilizing transformation g. Only used if
          'VS=TRUE'.

v.nbootsd: The number of bootstrap samples used to estimate the
          standard deviation of 'theta(x)'. Only used if 'VS=TRUE'.

v.nboott: The number of bootstrap samples used to estimate the
          distribution of the bootstrap T statistic. Only used if
          'VS=TRUE'. Total number of bootstrap samples is
          'v.nbootg*v.nbootsd + v.nboott'.

    perc: Confidence points desired.

Value:

     list with the following components:

confpoints: Estimated confidence points

theta, g: 'theta' and 'g' are only returned if 'VS=TRUE' was
          specified. '(theta[i],g[i]), i=1,length(theta)' represents
          the estimate of the variance stabilizing transformation 'g'
          at the points 'theta[i]'.

References:

     Tibshirani, R. (1988) "Variance stabilization and the bootstrap".
     Biometrika (1988) vol 75 no 3 pages 433-44.

     Hall, P. (1988) Theoretical comparison of bootstrap confidence
     intervals. Ann. Statist. 16, 1-50.

     Efron, B. and Tibshirani, R. (1993) An Introduction to the
     Bootstrap. Chapman and Hall, New York, London.

Examples:

     # estimated confidence points for the mean
     x <- rchisq(20,1)
     theta <- function(x){mean(x)}
     results <- boott(x,theta)

     # estimated confidence points for the mean,
     # using variance-stabilization bootstrap-T method
     results <- boott(x,theta,VS=TRUE)
     results$confpoints          # gives confidence points

     # plot the estimated var stabilizing transformation
     plot(results$theta,results$g)

     # use standard formula for stand dev of mean
     # rather than an inner bootstrap loop
     sdmean <- function(x, ...) {sqrt(var(x)/length(x))}
     results <- boott(x,theta,sdfun=sdmean)

     # To bootstrap functions of more complex data structures,
     # write theta so that its argument x
     # is the set of observation numbers
     # and simply pass as data to boot the vector 1,2,..n.
     # For example, to bootstrap
     # the correlation coefficient from a set of 15 data pairs:
     xdata <- matrix(rnorm(30),ncol=2)
     n <- 15
     theta <- function(x, xdata){ cor(xdata[x,1],xdata[x,2]) }
     results <- boott(1:n,theta, xdata)