Examples:
Suppose we are interested in the estimation of an unknown parameter (this does not mean that we are in a parametric context).
The two main questions asked at that stage are:
The second question has to be answered through information on the distribution or at least the variance of the estimator.
Of course there are answers in very simple contexts:
for instance, when the parameter of interest is the mean, the estimator has a known standard deviation: the estimated standard error, sometimes denoted
In maximum likelihood theory, question 1 is answered by using the mle, and question 2 can then be answered with an approximate standard error of
The bootstrap is a more general way to answer question 2, with the following aspects:
If we had several samples from the unknown (true) distribution , then we could consider the variations of the estimator:
If we are interested in the behaviour of a random variable , then we can consider the sequence of new values obtained through computation of new bootstrap samples.
Practically speaking, this requires generating an integer between 1 and n, each of these integers having the same probability; we will denote such a function by RANINT(1,n) in what follows.
Here is an example of a line of matlab that does just that:
T= floor((rand(1)*n)+1);
If we use S we won't need to generate the new observations one by one; the following command generates an n-vector sampled with replacement from the vector of indices (1...n).
sample(n,n,replace=T)
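The same index generation can be sketched in Python (a hedged illustration assuming NumPy; `bootstrap_indices` is a name introduced here, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_indices(n, nboot):
    # Each row is one bootstrap sample: n integers drawn
    # uniformly (with replacement) from 1..n, like RANINT(1,n).
    return rng.integers(1, n + 1, size=(nboot, n))

idx = bootstrap_indices(15, 3)
```

Each row of `idx` then indexes one bootstrap replicate of the original sample.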
An approximation of the distribution of the estimate is provided by the distribution of
An example data set is LS (or laws): the law school data, consisting of 15 bivariate data points, each corresponding to an American law school, as analysed in EFRON 1982:
In order to repeat a large number of times the same instructions needed to construct data sets, one usually has to use a looping facility. Matlab and S are easier to program in this circumstance because one can use ordinary loops and user-created functions, as well as benefiting from the vectorized commands.
laws <- scan()
576 3.39  635 3.30  558 2.81  578 3.03  666 3.44
580 3.07  555 3.00  661 3.43  651 3.36  605 3.13
653 3.12  575 2.74  545 2.76  572 2.88  594 2.96

laws <- matrix(laws, ncol = 2, byrow = T)   # 15 schools, 2 variables

"boot.simple" <- function(data, fun, nboot, ...)
{
    n <- nrow(as.matrix(data))
    res.boot <- vector("list", length = nboot)
    indb <- matrix(sample(n, size = n * nboot, T), nrow = nboot)
    for(b in (1:nboot)) {
        res <- fun(indb[b,  ], ...)
        plot(res, xlab = "", ylab = "", lab = c(0, 0, 0))
        res.boot[[b]] <- res
    }
    return(res.boot)
}

"boot" <- function(ind, nboot, fun, ...)
{
    # Tibshirani's efficient bootstrap
    # If fun is a function of a matrix or returns multiple values
    # it has to be transformed before
    data <- matrix(sample(ind, size = length(ind) * nboot, replace = T),
                   nrow = nboot)
    return(apply(data, 1, fun, ...))
}

# Elementary example of using boot
corr <- function(x, xdata)
{
    cor(xdata[x, 1], xdata[x, 2])
}
j1000 <- boot(1:15, 1000, corr, laws)
hist(j1000 - 0.777)
q.bt <- quantile(j1000, probs = c(0.01, 0.99))
# With greek label
hist(j1000 - 0.777, nclass = 40, xlab = "q* - q", font = 13)

# Example of bootstrapping a Mann-Whitney statistic
"mw" <- function(x, y)
{
    # Computes a Mann-Whitney statistic for two samples
    n <- length(x)
    m <- length(y)
    rks <- rank(c(x, y))
    wx <- sum(rks[1:n])
    ux <- n * m + (m * (m + 1))/2 - wx
    mwx <- 1 - ux/(m * n)
    return(mwx)
}

"mwbt" <- function(x, y, nboot = 100, fun = mw, ...)
{
    bts <- rep(0, nboot)
    for(b in (1:nboot)) {
        xbts <- sample(x, length(x), replace = T)
        ybts <- sample(y, length(y), replace = T)
        bts[b] <- fun(xbts, ybts, ...)
    }
    return(bts)
}

"mwxy" <- function(xy, n)
{
    m <- length(xy) - n
    rks <- rank(xy)
    wx <- sum(rks[1:n])
    ux <- n * m + (m * (m + 1))/2 - wx
    mwx <- 1 - ux/(m * n)
    return(mwx)
}
# Parametric bootstrap
"boot.par" <- function(model.sim = law.sim, n = 15, nboot = 66)
{
    # Parametric bootstrap of the correlation coefficient
    # law.sim contains bivariate normal data
    theta <- rep(0, nboot)
    for(b in (1:nboot))
        theta[b] <- corr(((b - 1) * n + 1):(b * n), law.sim)
    return(theta)
}

# Example of using the functions and plotting the results
n1 <- rnorm(1000)
n2 <- rnorm(1000)
law.sim <- cbind(n1, 0.777 * n1 + sqrt(1 - (0.777^2)) * n2)
rbootpar <- boot.par()
hist(rbootpar)
par(new = T)
plot(density(rbootpar, width = 0.15), xlab = "", ylab = "",
     lab = c(0, 0, 0), type = "l")
title("Parametric Bootstrap - Law School corr.", cex = 1.9)
Instead of resampling directly from the empirical distribution, one smoothes it out first; the smoothed version is then used to generate the new samples.
If a kernel smoother is used, for instance a Gaussian kernel, then generation from the smoothed distribution is easy: it is sufficient to resample with replacement and then perturb each sampled point by adding a small Gaussian random variable (SILVERMAN B.W. & YOUNG G.A.).
# Smoothed bootstrap
"boots" <- function(x, nboot = 100, fun, kernel = "norm", h = 1)
{
    # Smoothed bootstrap with either a normal
    # or an Epanechnikov kernel
    # h is the window width (can be tricky to choose)
    n <- nrow(as.matrix(x))
    theta <- rep(0, nboot)
    for(k in (1:nboot)) {
        if(kernel == "norm") eps <- rnorm(n, mean = 0, sd = 1)
        else eps <- epan(n)
        xb <- x[sample(n, n, T)] + eps * h
        theta[k] <- fun(xb)
    }
    return(theta)
}

"epan" <- function(n)
{
    # Epanechnikov variates generated from three uniforms
    eps <- rep(0, n)
    v <- matrix(runif(3 * n, -1, 1), nrow = n)
    for(i in (1:n)) {
        if((abs(v[i, 3]) >= abs(v[i, 2])) && (abs(v[i, 3]) >= abs(v[i, 1])))
            eps[i] <- v[i, 2]
        else eps[i] <- v[i, 3]
    }
    return(eps)
}
The percentile method consists in taking the confidence interval for as being
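As a hedged Python illustration of the percentile method (toy data, NumPy assumed; not the text's S code), the interval is simply read off the empirical quantiles of the bootstrap replicates:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=50)      # toy sample

# Bootstrap replicates of the statistic (here: the mean)
boot = np.array([rng.choice(x, size=x.size, replace=True).mean()
                 for _ in range(2000)])

# 90% percentile interval: the 5% and 95% empirical quantiles
lo, hi = np.percentile(boot, [5, 95])
```
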
Preliminaries: Pivotal quantities
Construction of confidence intervals is (ideally) based on a pivotal quantity, that is, a function of the sample and the parameter whose distribution is independent of the parameter, the sample, or any other unknown parameter.
Thus, for instance, no knowledge is needed about the parameters and of a normal variable to construct confidence intervals for both of these parameters, because one has two pivotal quantities available:
Let denote the largest quantile of the distribution of the pivot, and the parameter space, the set of all possible values for ; then
Such an ideal situation is of course rare, and one has to make do with approximate pivots (sometimes called roots). Therefore errors will be made on the level () of the confidence sets. The level error is sensitive to how far the root is from being a pivot.
We will present several approaches for finding roots. One approach is to studentize the root, that is, divide it by an estimate of its standard deviation. Another is to use a cdf transform constructed from a first set of bootstrap simulations, as explained in section 2.3.
Suppose that we are interested in a parameter , and that is an estimate based on the n-sample .
We will suppose that the asymptotic variance of is with a corresponding estimate .
The studentized bootstrap or bootstrap-t method (EFRON 1982, HALL 1988) is based on approximating the distribution function of by , distribution of .
Denote by the th quantile of the distribution of ,
"bootsd" <- function(x, theta, nbootsd = 25)
{
    n <- length(x)
    boo <- boot(x, fun = theta, nboot = nbootsd)
    return(sqrt(((n - 1)/n) * var(boo)))
}

"bootst" <- function(x, fun = median, nboot = 100, nbootsd = 25)
{
    # Studentized bootstrap
    boo <- rep(0, nboot)
    for(b in (1:nboot)) {
        boo[b] <- bootsd(x, theta = fun, nbootsd = nbootsd)
    }
    thetab <- boot(x, fun = fun, nboot = nboot)
    theta0 <- fun(x)
    thetasd <- ((thetab - theta0)/boo)
    return(thetasd)
}
However, experience has shown that a preliminary transformation of , such as Fisher's transform ( = arctanh ), followed by the bootstrap-t and then retransformation of the confidence interval back to the original scale, leads to much more dependable confidence intervals. Unfortunately such a handy transform is not always available, but R. TIBSHIRANI, 1988 has suggested an algorithm for creating automatically defined variance-stabilizing transformations.
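As a sketch of the transformation idea (Python, NumPy assumed; a simple percentile interval on the arctanh scale is used here as a stand-in for the full bootstrap-t construction, and the toy data merely mimics the law school correlation):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 15
z1, z2 = rng.normal(size=n), rng.normal(size=n)
x = np.column_stack([z1, 0.7 * z1 + np.sqrt(1 - 0.7 ** 2) * z2])

def corr(d):
    return np.corrcoef(d[:, 0], d[:, 1])[0, 1]

# Bootstrap on the variance-stabilized scale phi = arctanh(r) ...
phis = np.array([np.arctanh(corr(x[rng.integers(0, n, n)]))
                 for _ in range(1000)])
lo, hi = np.percentile(phis, [2.5, 97.5])

# ... then transform the interval back to the correlation scale
ci = (np.tanh(lo), np.tanh(hi))
```
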
Based on the inverse CDF property: if is distributed according to then is uniform.
Denote by the c.d.f. of our root which is supposed to converge weakly to the c.d.f. as increases.
As we have seen, the idea behind the bootstrap is to replace the unknown distribution function by its empirical counterpart, thus using as an estimate for the unknown cumulative distribution.
Once validation of the procedure has been ensured through verifying that
The algorithm proposed by the authors was, however, rather expensive timewise ( ).
We have implemented a more economical one proposed by Gleason:
Let and be two different estimates of an unknown parameter . If and can be chosen so as to be negatively correlated, then has half the variance of and .
The bootstrap version of this is based on an antithetic permutation of the sample. B ordinary resampling operations then supply the equivalent of 2B resampling operations.
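A minimal Python sketch of the antithetic idea (NumPy assumed; the rank map i -> n+1-i plays the role of the antithetic permutation, and the statistic is the mean for simplicity):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.sort(rng.normal(size=30))       # sort so that index = rank
n, nboot = x.size, 500

est = np.empty(nboot)
anti = np.empty(nboot)
for b in range(nboot):
    idx = rng.integers(0, n, n)        # ordinary bootstrap indices
    est[b] = x[idx].mean()             # ordinary replicate
    anti[b] = x[n - 1 - idx].mean()    # antithetic replicate (reversed ranks)

# The two sequences are negatively correlated, so their average
# has smaller variance than either one alone.
combined = 0.5 * (est + anti)
```
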
The software output from script in the appendix shows :
The mean for population A is : 10.375
The mean for population B is : 12.208
The difference is : 1.833
As the variances can be considered as equal,
a classical test under the normal assumption gives
a t-value of -3.1561 indicating a significant difference
because the P-value is 0.0046.
Assumptions :
A bootstrap approach. Following the previous definitions of the bootstrap, a possible bootstrap test could run as follows:
However, it is important to note that there is no assumption that the observations form a random sample from some population; no inference can be carried over automatically to a hypothetical population.
On the contrary, the assumption underlying the bootstrap is that the observations form a random sample whose distribution approximates that of the population. Thus the conclusions drawn from a bootstrap test can be used to infer about the population.
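One concrete version of such a test (a hedged Python sketch with synthetic data, NumPy assumed): resample from the pooled observations, which imposes the null hypothesis that both groups share one distribution, and compare the observed mean difference with its bootstrap null distribution:

```python
import numpy as np

rng = np.random.default_rng(4)
a = rng.normal(10.4, 1.0, size=12)     # toy stand-ins for groups A and B
b = rng.normal(12.2, 1.0, size=12)

obs = b.mean() - a.mean()
pooled = np.concatenate([a, b])        # resampling pool under H0

nboot = 2000
diffs = np.empty(nboot)
for k in range(nboot):
    ab = rng.choice(pooled, size=pooled.size, replace=True)
    diffs[k] = ab[a.size:].mean() - ab[:a.size].mean()

# Two-sided bootstrap p-value
pval = np.mean(np.abs(diffs) >= abs(obs))
```
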
NOREEN 1989 suggests using a combination of these two tests when inference is needed for testing whether two variables are stochastically independent in the population. However one must keep in mind that at the same time one tests whether the marginal distributions of the variables in the sample satisfactorily estimate those of the variables in the population.
For B large enough, provides the probability that is in the bootstrap confidence interval.
In order for the experiment to be really useful, one ought to retry with different sample sizes and different expectations; one could also retry with all sorts of different distributions.
It is by carrying out such experiments that you can check that the intervals built are correct.
Now I am going to compare answers to an estimation problem obtained both through maximum likelihood and through bootstrapping. First let us recall some notation:
have joint density denoted . Given observed values , the likelihood of is the function
If the distribution is discrete, will be the frequency distribution function.
In words:
Definition:
The maximum likelihood estimate (mle) of is that value of that maximises : it is the value that makes the observed data the ``most probable''.
If the are iid, then the likelihood simplifies to
Rather than maximising this product, which can be quite tedious, we often use the fact that the logarithm is an increasing function, so it is equivalent to maximise the log-likelihood:
are counts in cells/boxes 1 up to m; each box has a different probability (think of the boxes as being bigger or smaller), and we fix the total number of balls that fall to be: . The probability of each box is , with the constraint: . This is a case in which the are NOT independent; the joint probability of a vector is called the multinomial, and has the form:
Each box taken separately against all the other boxes is a binomial; this is an extension thereof.
We study the log-likelihood of this :
We use
This is a trinomial with three boxes:
the probabilities are parametrized by:
Each of the counts is binomially distributed with probabilities as described above so that:
We can either use asymptotic maximum likelihood or the bootstrap, we are going to compare the two using Splus.
After some algebra we find:
Of course, as usual, we do not know , so we have to plug in the estimate:
The bootstrap procedure we are going to follow generates multinomials with the estimated probabilities and then computes, for these new counts, the bootstrap estimate , for which we then compute the standard deviation.
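As a quick numerical check (Python; the formula Var = theta(1-theta)/(2n) is the standard allele-counting result for this trinomial model, stated here as an assumption), the plug-in asymptotic standard error is close to the bootstrap value reported below:

```python
import math

theta_hat, n = 0.4247, 1029

# Plug-in asymptotic standard error, assuming
# Var(theta_hat) = theta * (1 - theta) / (2 * n)
se = math.sqrt(theta_hat * (1 - theta_hat) / (2 * n))
# se comes out near 0.0109, comparable to the bootstrap figure 0.0107
```
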
Here is the Splus code I used:
multin <- function(pvec, num)
{
    # Generates the counts obtained when putting num balls into
    # length(pvec) boxes, with probabilities given by the vector pvec
    run <- runif(num)
    gener <- table(cut(run, cumsum(c(0, pvec))))   # counts per box
    return(gener)
}

theta <- function(xvec)
{
    return((2 * xvec[3] + xvec[2])/(2 * sum(xvec)))
}

bts.ml.mw <- function(thetahat = 0.4247, nboot = 1000)
{
    res.bt <- rep(0, nboot)
    pv <- c((1 - thetahat)^2, 2 * thetahat * (1 - thetahat), thetahat^2)
    for(b in (1:nboot)) {
        xvec <- multin(pv, 1029)
        res.bt[b] <- theta(xvec)
    }
    return(res.bt)
}

bt1 <- bts.ml.mw()

> quantile(bt1, c(0.025, 0.975))
     2.5%     97.5%
 0.403292 0.4446186
> quantile(bt1 - 0.4247, c(0.025, 0.975))
       2.5%      97.5%
-0.02140797 0.01991856
> sqrt(var(bt1))
[1] 0.01074981
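The same parametric bootstrap can be sketched in Python (NumPy assumed; `rng.multinomial` replaces the hand-rolled `multin`, and the function names are introduced here for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)

def theta(x):
    # Same estimator as the S function theta(): (2*x3 + x2) / (2*N)
    return (2 * x[2] + x[1]) / (2 * x.sum())

def bts_ml(theta_hat=0.4247, n=1029, nboot=1000):
    # Parametric bootstrap: trinomial counts at the fitted
    # probabilities, re-estimating theta on each replicate
    p = np.array([(1 - theta_hat) ** 2,
                  2 * theta_hat * (1 - theta_hat),
                  theta_hat ** 2])
    counts = rng.multinomial(n, p, size=nboot)
    return np.array([theta(c) for c in counts])

bt = bts_ml()
sd = bt.std(ddof=1)
```
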
EVAL   Execute string with MATLAB expression.
   EVAL(s), where s is a string, causes MATLAB to execute the string
   as an expression or statement.

   EVAL(s1,s2) provides the ability to catch errors. It executes
   string s1 and returns if the operation was successful. If the
   operation generates an error, string s2 is evaluated before
   returning. Think of this as EVAL('try','catch'). The error string
   produced by the failed 'try' can be obtained with LASTERR.

   [X,Y,Z,...] = EVAL(s) returns output arguments from the expression
   in string s.

   The input strings to EVAL are often created by concatenating
   substrings and variables inside square brackets. For example:

   Generate a sequence of matrices named M1 through M12:

       for n = 1:12
          eval(['M' num2str(n) ' = magic(n)'])
       end

   Run a selected M-file script. The strings making up the rows of
   matrix D must all have the same length.

       D = ['odedemo '
            'quaddemo'
            'fitdemo '];
       n = input('Select a demo number: ');
       eval(D(n,:))

   See also FEVAL, EVALIN, ASSIGNIN, LASTERR.
%BOOTSTRP Bootstrap statistics.
%   BOOTSTRP(NBOOT,BOOTFUN,D1,...) draws NBOOT bootstrap data samples and
%   analyzes them using the function BOOTFUN. NBOOT must be a
%   positive integer. BOOTSTRP passes the data D1, D2, etc. to BOOTFUN.
%
%   [BOOTSTAT,BOOTSAM] = BOOTSTRP(...) Each row of BOOTSTAT contains
%   the results of BOOTFUN on one bootstrap sample. If BOOTFUN returns
%   a matrix then this output is converted to a long vector for storage
%   in BOOTSTAT. BOOTSAM is a matrix of indices into the rows.

% Initialize matrix to identify scalar arguments to bootfun.
scalard = zeros(nargin-2,1);

lp = '(';   % Left parenthesis string variable.
rp = ')';   % Right parenthesis string variable.
c  = ',';   % Comma string variable.
ds = 'd';   % 'd' as a string variable.

% Construct argument list for bootfun
for k = 3:nargin
   dk = [ds,num2str(k-2)];
   [row,col] = size(eval(dk));
   if max(row,col) == 1
      scalard(k-2) = 1;
   end
   if row == 1 & scalard(k-2) == 0
      eval([dk,'=',dk,'(:);']);
      row = col;
   end
   if k == 3
      pstring = [lp,dk];
      n = row;
      if nargin == 3
         pstring = [pstring,rp];
      end
   elseif k == nargin & nargin > 3
      pstring = [pstring,c,dk,rp];
   else
      pstring = [pstring,c,dk];
   end
end

% Create index matrix of bootstrap samples.
bootsam = unidrnd(n,n,nboot);

% Get result of bootfun on actual data and find its size.
thetafit = eval([bootfun,pstring]);
[ntheta ptheta] = size(thetafit);

% Initialize a matrix to contain the results of all the
% bootstrap calculations.
bootstat = zeros(nboot,ntheta*ptheta);

dbs = 'db';

% Do bootfun - nboot times.
for bootiter = 1:nboot
   for k = 3:nargin
      dk  = [ds,num2str(k-2)];
      dbk = [dbs,num2str(k-2)];
      if scalard(k-2) == 0
         eval([dbk,'=',dk,'(bootsam(:,',num2str(bootiter),'),:);']);
      else
         eval([dbk,'=',dk,';']);
      end
      if k == 3
         pstring = [lp,dbk];
         n = row;
         if nargin == 3
            pstring = [pstring,rp];
         end
      elseif k == nargin & nargin > 3
         pstring = [pstring,c,dbk,rp];
      else
         pstring = [pstring,c,dbk];
      end
   end
   evalstr = [bootfun,pstring];
   tmp = eval(evalstr);
   bootstat(bootiter,:) = (tmp(:))';
end
Suppose we wanted to estimate the overall risk of a CART tree through the bootstrap:
We will call the rule built from the sample , where is the response of size and the explanatory variable; we call the resubstitution estimate .
The true risk for is:
The difference
Now we are going to show that under certain conditions the bootstrap is going to behave badly.
Suppose that this data is not well adapted to a prediction of by and that the variables are independent.
Say we are trying to classify the observations; that is, the response variable is a categorical variable with, say, classes. The prior probabilities for each class are unknown, so they are taken to be . Every classification rule will in fact have risk .
Suppose we construct a tree that goes all the way down to individual leaves; that is, the rule will give exact classification for an observation it has already seen, and will classify correctly an observation it is seeing for the first time with probability .
Thus and ,
the expected value of the bias computed for the empirical distribution is ( any member of the original sample does not appear at all in the sample), thus
Several papers by Freedman and Peters (JASA 1984, J. Bus. Econ. Stat. 1985) document the fact that the bootstrap does not give the correct answer in multivariate regression situations where the number of variables is of the same order as the number of observations.
These two examples seem to suggest that much care must be used in applying the bootstrap in nonparametric situations!
zea <- scan()
1: 188 96 168 176 153 172 177 163 146 173 186 168 177 184 96
16: 139 163 160 160 147 149 149 122 132 144 130 144 102 124 144

zea <- matrix(zea, ncol = 2)
> zea
      [,1] [,2]
 [1,]  188  139
 [2,]   96  163
 [3,]  168  160
 [4,]  176  160
 [5,]  153  147
 [6,]  172  149
 [7,]  177  149
 [8,]  163  122
 [9,]  146  132
[10,]  173  144
[11,]  186  130
[12,]  168  144
[13,]  177  102
[14,]  184  124
[15,]   96  144

diff <- zea[,2] - zea[,1]
> diff
 [1] -49  67  -8 -16  -6 -23 -28 -41 -14 -29 -56 -24 -75 -60  48
> rank(abs(diff))
 [1] 11 14  2  4  1  5  7  9  3  8 12  6 15 13 10
> sign(diff)
 [1] -1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1
wvec <- sign(diff) * rank(abs(diff))
> wvec
 [1] -11  14  -2  -4  -1  -5  -7  -9  -3  -8 -12  -6 -15 -13  10
W <- sum(wvec[wvec > 0])
> W
[1] 24
> args(wilcox.test)
function(x, y = NULL, alternative = "two.sided", mu = 0, paired = F,
         exact = T, correct = T)
> sum(wvec[wvec < 0])
[1] -96
> wilcox.test(zea[,1], zea[,2], paired = T)

        Exact Wilcoxon signed-rank test

data:  zea[, 1] and zea[, 2]
signed-rank statistic V = 96, n = 15, p-value = 0.0413
alternative hypothesis: true mu is not equal to 0

> wilcox.test(zea[,1], zea[,2], paired = T, exact = T,
              alternative = "greater")

        Exact Wilcoxon signed-rank test

data:  zea[, 1] and zea[, 2]
signed-rank statistic V = 96, n = 15, p-value = 0.0206
alternative hypothesis: true mu is greater than 0

> 15*16*31/24
[1] 310
> sqrt(310)
[1] 17.60682
> (96 - 120)/17.6
[1] -1.363636
2. The original Gray code, Fisher's test
A. The original Gray code.
Let be the set of binary -tuples. This may be identified with the vertices of the usual -cube or with the set of all subsets of an -element set. The original Gray code gives an ordered list of with the property that successive values differ only in a single place. For example, when , such a list is
Gray codes were invented by F. Gray (1939) for sending sequences of bits using a frequency transmitting device. If the ordinary integer indexing of the bit sequence is used then a small change in reception, between 15 and 16, for instance, has a large impact on the bit string understood. Gray codes enable a coding that minimizes the effect of such an error. A careful description and literature review can be found in Wilf (1989). One crucial feature: there are non-recursive algorithms for providing the successor to a vector in the sequence in a simple way. This is implemented through keeping track of the divisibility by 2 and of the step number.
One way to express this is as follows : let be the binary representation of the integer , and let be the string of rank in the Gray code list. Then and . For example, when , the list above shows the string of rank is ; now . So . Thus from a given string in the Gray code and the rank one can compute the successor. There is a parsimonious implementation of this in the algorithm given in the appendix. Proofs of these results can be found in Wilf (1989).
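A compact Python sketch of this construction (using the closed form: the rank-r Gray string is binary(r) XOR binary(r) shifted one bit right; the appendix algorithm is more parsimonious, this is only for checking):

```python
def gray(r):
    # Gray string of rank r: binary(r) XOR (binary(r) >> 1)
    return r ^ (r >> 1)

# The first 16 codes: successive values differ in exactly one bit
codes = [gray(r) for r in range(16)]
```
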
We study two uses for such Gray codes here: they provide a way of complete enumeration for Fisher's randomization procedure and for Hartigan's subset procedure. Similar procedures can be used for signed rank statistics and the usual two sample ( out of ) tests.
B. Fisher's randomization test.
Fisher (1935) gave a permutation justification for the usual test for paired observations . In his example (Darwin's Zea data) and were real numbers representing plant height for treated and untreated plants. Darwin had calculated the mean difference. Fisher gave a way of calibrating this by calculating and considering all possible sums with .
We observe that a Gray code allows us to systematically run through all patterns. One need only update the present value of the sum by changing a single value. Furthermore the symmetry of the problem saves a factor of 2 in the computations. Figure 2.1 shows a histogram of the randomization distribution based on this procedure applied to the Zea data. The original value of the sum is 39.25 inches. This clearly falls in the tail of the distribution. In fact the exact proportion of sign patterns with a larger absolute sum is (as Fisher also calculated).

Figure 2.1  Randomization distribution of for the Zea data
For the remainder of this section we will use the exact distribution to compare two methods of approximation for the significance level of Fisher's randomization test.
Given let . Let and let be the number of choices of sign values such that is strictly larger than . For Fisher's example giving a p-value of .
The first approximation to uses the fact that is a sum of independent (non-identically distributed) random variables. Thus with ,
Figure 2.2a shows a plot of versus x for the Zea data. The Normal approximation is quite good. It gives a p-value of 0.0538.
Fisher himself suggested a second approximation, using the studentized statistic
This is a one-to-one increasing function of , if . So the number of sign patterns such that is strictly larger than the observed equals . Thus Fisher's t-approximation is based on:
Remarks : The Gray code can be useful with any statistic and savings increase if efficient updating for single changes are available.
Fisher's test and this data set have been extensively discussed in the statistics literature. Fisher himself avoided complete computation by looking at the tails and coming in carefully. Usual approximations proceed by Monte Carlo, however Pagano and Tritchler (1983) give a fast Fourier algorithm for complete enumeration. We give a different Fourier implementation and further discussion in Section 2D. Lehmann (1975) and Tritchler (1984) discuss exact confidence intervals derived from Fisher's randomization test. Manly (1991) carries out the exact computation without using an updating procedure. Basu (1980) and his discussants debate the logic of such ``permutation'' tests. Maritz (1981) gives a clear presentation of more general two-sample location problems. The Gray code approach can be used for . It also adapts to the vector valued statistic studied by Eaton and Efron (1970).
Recently the permutation test based on this statistic has been renamed `Mantel's test' by the ecological community (Manly 1992).
The statistic we look at is , which gives an indication of their correlation.
In fact we remark that if each distance matrix were vectorized, this is the equivalent of an uncentered covariance between the vectors.
If the two distances are small in the same places, and big in the same places the statistic will be large indicating a strong link between the distances.
In order to actually test the significance of such a link, a permutation test is performed: the null distribution is obtained by permuting the elements of to obtain what is denoted by , keeping the matrix symmetric, and recomputing the statistic:
The Splus program enabling the implementation of this algorithm can be found below.
What will be useful to us here is the idea of a vectorial covariance between distance matrices, which we will apply to the special case of correlation matrices which are in fact similarities.
We define as in (Escoufier, 1973), the vector
covariance between two matrices as their natural
inner product:
daniel <- function(mata, matb, nperm)
{
    if(ncol(mata) != ncol(matb)) stop("matrices should have same dimension")
    if(nrow(mata) != nrow(matb)) stop("matrices should have same dimension")
    r0 <- stat(mata, matb)
    r <- matrix(0, nperm, 1)
    for(i in (1:nperm)) {
        matp <- permute(mata)
        r[i] <- stat(matp, matb)
    }
    # Proportion of permuted statistics at least as large as the observed one
    p <- (sum(r >= r0) + 1)/(nperm + 1)
    return(p, r0, r)
}

permute <- function(mata)
{
    ap <- matrix(0, nrow(mata), nrow(mata))
    ni <- sample(nrow(mata))
    for(i in 2:nrow(mata)) {
        for(j in 1:(i - 1)) {
            ap[i, j] <- mata[ni[i], ni[j]]
        }
    }
    return(ap)
}

stat <- function(mata, matb)
{
    r0 <- 0
    for(i in 2:nrow(mata))
        for(j in (1:(i - 1))) {
            r0 <- r0 + mata[i, j] * matb[i, j]
        }
    return(r0)
}
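A vectorized Python sketch of the same permutation test (NumPy assumed; flattening the lower triangles turns the statistic into a plain inner product, and the toy matrices are introduced here for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)

def mantel(a, b, nperm=999):
    # Permutation test of sum_{i>j} a[i,j] * b[i,j]
    n = a.shape[0]
    il = np.tril_indices(n, -1)
    r0 = np.sum(a[il] * b[il])
    count = 0
    for _ in range(nperm):
        p = rng.permutation(n)
        ap = a[np.ix_(p, p)]          # permute rows and columns together
        count += np.sum(ap[il] * b[il]) >= r0
    return r0, (count + 1) / (nperm + 1)

# Toy distance matrices: b is a noisy copy of a, so the link is strong
pts = rng.normal(size=(10, 2))
a = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
b = a + rng.normal(scale=0.1, size=a.shape)
b = (b + b.T) / 2                     # keep b symmetric
r0, pval = mantel(a, b)
```
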
More precisely, suppose we have two samples of observations; we will call them the 's and the 's.
Null hypothesis .
The test statistic is the number of runs , that is, the number of consecutive sequences of identical labels. The null distribution of the test statistic can be found by a combinatorial argument:
The main reference is Friedman and Rafsky 1979, Ann. Statist., pp. 697-717.
One may either use the multidimensional data directly, or first make the data two-dimensional by principal components (or another dimension-reducing technique) in order to be able to represent the tree in a graphic.
Algorithm:
The output to create will be a vector of numbers indicating to which point the th point should be connected.
Here are some examples with Splus:
> mstree(vincr[1:10, ])
$x:
 [1]  4.20959759 -3.70445704  4.54166603  7.22903347  4.89603710  2.28360796
 [7]  0.00000000 -2.00589728  1.03156424  0.04926162

$y:
 [1] -7.0463328 -6.4673190  0.4050449 -0.3888339 -2.2247181  0.0000000
 [7]  0.0000000 -3.2911484 -0.8079203 -1.5704904

$mst:
 [1] 10  8  6  3  3  7 10 10 10

mst: vector of length nrow(x)-1 describing the edges in the minimal
spanning tree. The ith value in this vector is an observation number,
indicating that this observation and the ith observation should be
linked in the minimal spanning tree.

$order:
      [,1] [,2]
 [1,]    4    7
 [2,]    3   10
 [3,]    5    6
 [4,]    6    9
 [5,]    7    8
 [6,]   10    3
 [7,]    1    1
 [8,]    9    4
 [9,]    8    5
[10,]    2    2

order: matrix, of size nrow(x) by 2, giving two types of ordering. The
first column presents the standard ordering from one extreme of the
minimal spanning tree to the other. This starts on one end of a
diameter and numbers the points in a certain order so that points
close together in Euclidean space tend to be close in the sequence.
The second column presents the radial ordering, based on distance from
the center of the minimal spanning tree. These can be used to detect
clustering. See below for graph theory definitions.
plot(x, y)                               # plot original data
mst <- mstree(cbind(x, y), plane = F)    # minimal spanning tree
# show tree on plot
segments(x[seq(mst)], y[seq(mst)], x[mst], y[mst])

i <- rbind(iris[,,1], iris[,,2], iris[,,3])
tree <- mstree(i)                        # multivariate planing
plot(tree, type = "n")                   # plot data in plane
text(tree, label = c(rep(1, 50), rep(2, 50), rep(3, 50)))  # identify points

# get the absolute value stress e
distp <- dist(i)
dist2 <- dist(cbind(tree$x, tree$y))
e <- sum(abs(distp - dist2))/sum(distp)
> e
[1] 0.1464094
This extends to the case of more than two levels of treatment by doing the following:
Example: Wine data. As you may remember, this data set consists of 14 physico-chemical composition variables (all continuous) that we are trying to relate to several categorical variables.
acp.vin <- acp.us(vin[, 1:14], q.choix = 2)
# Two components was the choice here
plot(acp.vin$ind.cords, type = "n")
text(acp.vin$ind.cords[as.numeric(vin[, 15]) == 1, ], "1")
text(acp.vin$ind.cords[as.numeric(vin[, 15]) == 2, ], "2")
text(acp.vin$ind.cords[as.numeric(vin[, 15]) == 3, ], "3")

mst.wine <- mstree(acp.vin$ind.cords, plane = F)
> mst.wine
 [1] 10 66  6 28  8  4  6 27  7  9  1 14 14 11 14 44 15 25 45 41 39 50 16 76 78
[26] 37 58  5 27 31 62  2 63 31 55 69 40 45 38 18 21 52 51 47 37 26 42 49 76 46
[51] 75 22 54 55 34 53 34 57 33 61 35 33 65 60 78 59 64 77 40 73 72 74 71 43 48
[76] 46 75

segments(acp.vin$ind.cords[seq(mst.wine), 1],
         acp.vin$ind.cords[seq(mst.wine), 2],
         acp.vin$ind.cords[mst.wine, 1],
         acp.vin$ind.cords[mst.wine, 2])
title("Wines Minimal Spanning Tree")

> sum(vin[mst.wine, 15] == vin[-78, 15])
[1] 44
randclass <- function(S = 1000, data = as.numeric(vin[-78, 15]),
                      compar = as.numeric(vin[mst.wine, 15]))
{
    same <- rep(0, S)
    n <- length(compar)
    for(i in (1:S))
        same[i] <- sum(as.numeric(data == compar[sample(n, n)]))
    return(same)
}

r1 <- randclass()
> max(r1)
[1] 39
> max(r2)      # r2 came from a second run of randclass()
[1] 40
hist(r1, nclass = 50)
Matlab MSTREE algorithm: this does not work yet, but making it work is part of the mstree project.
function out = mst(distm)
% Computes the minimum spanning tree
% from a matrix of distances between n objects
n = length(distm);
out = [-n*ones(1,n-1) 0];
distree = distm(n,:);
n1 = (1:(n-1));
[a b vectd] = find(tril(distm));
[a b vecti] = find(tril(ones(n,1)*(1:n),-1));
for k = (1:(n-1))
   % Find the nearest edge to the tree
   nottree = find(out<0);
   [distmin imin] = min(vectd(out<0));
   % Find the index of the minimum
   imin = n1(distm==distmin);
   imin = imin(1);
   % Adjoin a new edge
   out(imin) = -out(imin);
   % Update list of nearest vertices
   for i = (1:(n-1))
      if (out(i)<0 & distm(i,imin)<distm(i,-out(i)))
         out(i) = -imin;
      end;
   end
end
This means that when we have a closed-form formula for the inverse of the cdf which is not too expensive to evaluate, we can use a random uniform and apply the transformation that makes it have the correct density.
Example 1: Exponential
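For the exponential the cdf F(x) = 1 - exp(-lambda x) inverts in closed form to F^{-1}(u) = -log(1-u)/lambda, so a uniform variate transforms directly (a Python sketch, NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(7)

def rexp_inverse(n, lam=1.0):
    # Inverse-cdf method: x = -log(1 - U) / lam gives Exp(lam)
    u = rng.uniform(size=n)
    return -np.log(1 - u) / lam

x = rexp_inverse(100_000, lam=2.0)
# The sample mean should be close to 1/lam = 0.5
```
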
Example 2: Correlated Random Variables
and have maximum positive correlation; and have the distributions and . For negative correlation just take .
Example 3: Maxima
If closed forms are not available, but a good numerical solver is, then you can use that.
We will choose to generate from, so it must be easier than to generate from. For instance, for bounded we can take the uniform on .
Algorithm:
This is why it works:
We need more than one uniform to generate one random variable with the right distribution.
How many are needed? This will depend on .
The number of necessary trials before accepting will be geometric with .
Example 1:
Suppose is a bounded density on a compact support; take , and repeat the following:
This is usually very inefficient; it is much better to take a bounding function that `fits'.
Example 2:
Simulating a normal using an exponential as a bounding function:
we use the fact that the absolute value of a normal has density:
Algorithm:
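A Python sketch of this rejection scheme (NumPy assumed): draw x from the Exp(1) envelope, accept with probability f(x)/(M g(x)) = exp(-(x-1)^2/2), then attach a random sign to recover a full normal:

```python
import numpy as np

rng = np.random.default_rng(8)

def rnorm_reject(n):
    # Half-normal by rejection from an Exp(1) envelope,
    # then a random sign turns it into a standard normal.
    out = []
    while len(out) < n:
        x = rng.exponential()
        if rng.uniform() <= np.exp(-0.5 * (x - 1.0) ** 2):
            out.append(x if rng.uniform() < 0.5 else -x)
    return np.array(out)

z = rnorm_reject(50_000)
```

The acceptance rate works out to sqrt(pi/(2e)), about 0.76, which is the "bounding function that fits" point made above.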
Modified to short-cut cosines and sines:
Generate a point uniformly in the disk(1): .
Call ,
Return ,
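These three steps are the polar (Marsaglia) variant of the method; a hedged Python sketch (NumPy assumed) is:

```python
import numpy as np

rng = np.random.default_rng(9)

def rnorm_polar(n):
    # A uniform point (u, v) in the unit disk yields two independent
    # standard normals without evaluating sines or cosines.
    out = []
    while len(out) < n:
        u, v = rng.uniform(-1.0, 1.0, size=2)
        s = u * u + v * v
        if 0.0 < s < 1.0:                  # keep points inside the disk
            f = np.sqrt(-2.0 * np.log(s) / s)
            out.extend([u * f, v * f])
    return np.array(out[:n])

z = rnorm_polar(50_000)
```
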
Exercise for next time (Tuesday, April 22nd):
With matlab, do the flop count for both methods: