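If we are interested in the behaviour of a random variable $\hat\theta$, a statistic computed from the sample, then we can consider the sequence of new values $\hat\theta^*_1, \hat\theta^*_2, \ldots$ obtained by recomputing the statistic on new bootstrap samples.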
Practically speaking, this requires generating integers between 1 and $n$ ($n$ of them per bootstrap sample), each integer having the same probability.
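Here is an example of a line of MATLAB that does just that:

    indices=randint(1,n,n)+1;

Or, if you have the Statistics Toolbox, you can use:

    indices=unidrnd(n,1,n);

In recent MATLAB releases, where randint is obsolete, the base function randi(n,1,n) does the same.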
In S we do not need to generate the new observations one by one: a single command generates an $n$-vector by sampling with replacement from the vector of indices $(1,\ldots,n)$.
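For readers without MATLAB, here is a minimal Python sketch of the same resampling step, using only the standard library. The names bootstrap_indices, bsample and bsample_vectorized are illustrative. bsample mirrors the name called in the MATLAB sessions, but this body is an assumption and not the author's implementation. Indices are 0-based here, unlike MATLAB.

```python
import random

def bootstrap_indices(n, rng=random):
    """Draw n indices uniformly from 0..n-1 with replacement
    (0-based, unlike the 1..n indices of the MATLAB code)."""
    return [rng.randrange(n) for _ in range(n)]

def bsample(data, rng=random):
    """One bootstrap resample of data (hypothetical Python
    counterpart of the MATLAB bsample used in these notes)."""
    return [data[i] for i in bootstrap_indices(len(data), rng)]

def bsample_vectorized(data, rng=random):
    # Draw the whole n-vector in one call, analogous to the
    # single-command approach described for S.
    return rng.choices(data, k=len(data))

rng = random.Random(0)
print(bsample([94, 38, 23, 197, 99, 16, 141], rng))
print(bsample_vectorized([94, 38, 23, 197, 99, 16, 141], rng))
```

Passing a seeded random.Random makes a run reproducible. The default uses the module-level generator.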
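An approximation of the distribution of the estimate $\hat\theta$ is provided by the distribution of the bootstrap replicates $\hat\theta^*_1,\ldots,\hat\theta^*_B$.
If we were given $B$ true samples from the population, and their associated estimates $\hat\theta_1,\ldots,\hat\theta_B$, we could compute the usual variance estimate for this sample of values, namely:

$$\widehat{\mathrm{var}}(\hat\theta)=\frac{1}{B-1}\sum_{b=1}^{B}\left(\hat\theta_b-\overline{\hat\theta}\right)^2,\qquad \overline{\hat\theta}=\frac{1}{B}\sum_{b=1}^{B}\hat\theta_b.$$

The bootstrap estimate uses the same formula with the replicates $\hat\theta^*_b$ in place of the $\hat\theta_b$. Its square root is the bootstrap standard error, computed in MATLAB as sqrt(var(thetab)).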
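The variance formula applied to bootstrap replicates can be written as a small Python function. This is a sketch under the following assumptions, none of which come from the notes: the name bootstrap_se, the default B = 1000, and the use of random.choices for resampling. statistics.stdev uses the $B-1$ divisor, matching the formula (and MATLAB's var).

```python
import random
import statistics

def bootstrap_se(data, stat, B=1000, seed=None):
    """Bootstrap standard error of stat(data): the standard deviation
    (divisor B - 1) of B replicates stat(resample), where each resample
    is drawn from data with replacement. Also returns the replicates."""
    rng = random.Random(seed)
    n = len(data)
    replicates = [stat(rng.choices(data, k=n)) for _ in range(B)]
    return statistics.stdev(replicates), replicates

se, reps = bootstrap_se([94, 38, 23, 197, 99, 16, 141], statistics.median, seed=0)
print(len(reps), se)
```

Returning the replicates as well as the standard error lets you inspect their histogram, as the MATLAB sessions do with hist(thetab).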
    >> treat=[94 38 23 197 99 16 141]'
    treat =
        94
        38
        23
       197
        99
        16
       141
    >> median(treat)
    ans = 94
    >> mean(treat)
    ans = 86.8571
    >> var(treat)
    ans = 4.4578e+03
    >> var(treat)/7
    ans = 636.8299
    >> sqrt(637)
    ans = 25.2389
    thetab=zeros(1,1000);
    for (b =(1:1000))
        thetab(b)=median(bsample(treat));
    end
    hist(thetab)
    >> sqrt(var(thetab))
    ans = 37.7768
    >> mean(thetab)
    ans = 80.5110

This is what the histogram of the 1000 bootstrap medians looks like:

[Figure: histogram of thetab, treatment group]
    >> control=[52 104 146 10 51 30 40 27 46]';
    >> median(control)
    ans = 46
    >> mean(control)
    ans = 56.2222
    >> var(control)
    ans = 1.8042e+03
    >> var(control)/length(control)
    ans = 200.4660
    >> sqrt(200.4660)
    ans = 14.1586
    thetab=zeros(1,1000);
    for (b =(1:1000))
        thetab(b)=median(bsample(control));
    end
    hist(thetab)
    >> sqrt(var(thetab))
    ans = 11.9218
    >> mean(thetab)
    ans = 45.4370

This is what the histogram of the 1000 bootstrap medians looks like:

[Figure: histogram of thetab, control group]
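Here is a rough Python analogue of the two MATLAB sessions, a sketch using only the standard library. The helper names se_of_mean and bootstrap_median_se are illustrative. The deterministic quantities (medians, means, plug-in standard error of the mean $\sqrt{s^2/n}$) should agree with the MATLAB output. One exception: the treatment value is 25.2355 rather than 25.2389, because the session rounded the variance to 637 before taking the square root. The bootstrap column is random and will differ from run to run and from the MATLAB figures.

```python
import math
import random
import statistics

treat = [94, 38, 23, 197, 99, 16, 141]
control = [52, 104, 146, 10, 51, 30, 40, 27, 46]

def se_of_mean(x):
    # Plug-in standard error of the mean: sqrt(s^2 / n), s^2 with divisor n - 1
    return math.sqrt(statistics.variance(x) / len(x))

def bootstrap_median_se(x, B=1000, seed=None):
    # Standard deviation of B medians of resamples drawn with replacement
    rng = random.Random(seed)
    meds = [statistics.median(rng.choices(x, k=len(x))) for _ in range(B)]
    return statistics.stdev(meds)

for name, x in [("treat", treat), ("control", control)]:
    print(name,
          statistics.median(x),
          round(statistics.mean(x), 4),
          round(se_of_mean(x), 4),
          round(bootstrap_median_se(x, seed=0), 2))
```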
Comparing the two medians (94 and 46), could we use these estimates of the standard errors to decide whether the difference between the two medians is significant?
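One rough way to approach the question treats the two groups as independent, so that variances add. You then divide the difference of the medians by $\sqrt{\mathrm{se}_1^2+\mathrm{se}_2^2}$ and read the ratio against a normal distribution. The sketch below uses the medians and bootstrap standard errors printed in the MATLAB sessions. The function name approx_z is illustrative. Both the independence assumption and the normal approximation are assumptions made here, not claims from the notes.

```python
import math

def approx_z(est1, est2, se1, se2):
    """Difference of two estimates in units of its standard error,
    assuming the estimates are independent (variances add)."""
    return (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)

# Medians and bootstrap standard errors from the MATLAB sessions
z = approx_z(94, 46, 37.7768, 11.9218)
print(round(z, 2))  # 1.21
```

A ratio of about 1.2 is well short of the conventional 1.96 cutoff, so on this crude reading the difference would not be declared significant at the 5% level. With only 7 and 9 observations, the normal approximation to the bootstrap distribution of a median is itself questionable.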
Suppose we condition on the sample of $n$ distinct observations $x_1,\ldots,x_n$. Then there are as many different bootstrap samples as there are ways of choosing $n$ objects out of a set of $n$ possible contenders, with repetitions allowed and order irrelevant, that is $\binom{2n-1}{n}$.
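The number of multisets of size $n$ drawn from $n$ objects is the binomial coefficient $\binom{2n-1}{n}$. For small $n$ this can be checked by brute force in Python. The sketch enumerates the multisets with itertools.combinations_with_replacement and compares the count with math.comb (Python 3.8+). The function names are illustrative.

```python
import math
from itertools import combinations_with_replacement

def count_resamples_bruteforce(n):
    # Each distinct resample is a multiset of n indices taken from n
    return sum(1 for _ in combinations_with_replacement(range(n), n))

def count_resamples(n):
    # Closed form for multisets of size n from n objects
    return math.comb(2 * n - 1, n)

for n in range(1, 7):
    print(n, count_resamples_bruteforce(n), count_resamples(n))
```

Brute-force enumeration quickly becomes infeasible, since the count grows roughly like $4^n$. That is why a closed form or an approximation is used for realistic sample sizes.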
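At this point it is useful to introduce a new notation for a bootstrap resample. Up to now we have written a possible resample as $\mathcal{X}^*=(x^*_1,\ldots,x^*_n)$. Because of the exchangeability (symmetry) property, we can recode this as the vector $k=(k_1,\ldots,k_n)$ counting the number of occurrences of each of the observations $x_1,\ldots,x_n$ in the resample.
In this recoding we have

$$k_i\in\{0,1,\ldots,n\},\qquad \sum_{i=1}^{n}k_i=n,$$

and the set of all bootstrap resamples is the set of integer points of the $(n-1)$-dimensional simplex

$$C_n=\Big\{(k_1,\ldots,k_n)\in\mathbb{N}^n : \sum_{i=1}^{n}k_i=n\Big\}.$$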
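The recoding from a resample to its count vector $k$ can be sketched in a few lines of Python. Here a resample is represented by its 0-based indices (an assumption of this sketch), and counts_vector is an illustrative name. Two resamples that differ only in the order of their draws map to the same $k$, and the entries of $k$ always sum to $n$.

```python
import random
from collections import Counter

def counts_vector(indices, n):
    """Recode a resample, given as 0-based indices into x_1..x_n,
    as the vector (k_1, ..., k_n) of occurrence counts."""
    c = Counter(indices)
    return [c[i] for i in range(n)]

rng = random.Random(0)
n = 5
idx = [rng.randrange(n) for _ in range(n)]
k = counts_vector(idx, n)
print(idx, k, sum(k))  # sum(k) is always n
```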
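Here is the function file approxcom.m. It approximates the number of resamples $\binom{2n-1}{n}$ by $2^{2n-1}/\sqrt{\pi n}$, a consequence of Stirling's formula:

    function out=approxcom(n)
    % Approximate number of distinct bootstrap resamples of size n
    out=round((pi*n)^(-.5)*2^(2*n-1));

It produces the following table of the number of resamples: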
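To see how close the approximation is, here is a Python transcription of approxcom.m, compared against the exact count from math.comb. The transcription is a sketch. One difference: Python's round breaks exact .5 ties to even whereas MATLAB rounds them away from zero, which does not matter for these non-integer values. The overestimate shrinks roughly like $1/(8n)$.

```python
import math

def approxcom(n):
    # Python transcription of approxcom.m:
    # 2^(2n-1) / sqrt(pi * n), rounded to the nearest integer
    return round((math.pi * n) ** -0.5 * 2 ** (2 * n - 1))

for n in (5, 10, 15, 20):
    exact = math.comb(2 * n - 1, n)
    approx = approxcom(n)
    print(n, exact, approx, f"{approx / exact - 1:.2%}")
```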
Are all these resamples equally likely? Consider the resample made of $n$ copies of $x_1$. Drawing it requires choosing the index 1 all $n$ times in the uniform integer generation, which should persuade you that this resample appears only once in $n^n$ times. By contrast, the resample containing $x_1$ once and each of the other observations once (the original sample itself) can appear in $n!$ of the $n^n$ ways.
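Generalising the two cases above: a resample with count vector $k$ arises from $n!/(k_1!\cdots k_n!)$ of the $n^n$ equally likely index sequences (the multinomial coefficient). The short Python sketch below computes this probability with exact fractions and checks that it sums to 1 over all resamples. The function name resample_probability is illustrative.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import combinations_with_replacement

def resample_probability(k):
    """Probability that n uniform index draws give the count vector k:
    the multinomial coefficient n! / (k_1! ... k_n!) divided by n^n."""
    n = sum(k)
    ways = math.factorial(n)
    for kj in k:
        ways //= math.factorial(kj)
    return Fraction(ways, n ** n)

n = 3
print(resample_probability([3, 0, 0]))  # 1/27
print(resample_probability([1, 1, 1]))  # 2/9, i.e. 6/27

# Sum over every distinct resample (multiset of indices)
total = sum(resample_probability([Counter(c)[i] for i in range(n)])
            for c in combinations_with_replacement(range(n), n))
print(total)  # 1
```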