Next: Bootstrapping a Principal Component Up: Lectures Previous: The jackknife

Cross Validation

Separate Diagnostic Data Sets

After iterating through several exploratory methods, varying the projections and looking for the best `fit', there seems to be only one honest way of verifying whether one is overfitting noise, or whether there really is a latent variable or a good prediction available: to have another set of data of exactly the same type.

The best thing to do is to take a random sub-sample at the beginning of the study, without any particular stratification, and to put it aside for the confirmatory stage. Many scientists are sparing with their data and have only just enough to model, but nowadays the expense of an extra 25% or so should be accepted, especially when the consequences of the study are medical. This is what Tukey and Mallows call a careful separate diagnostic.
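The step of setting data aside can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `holdout_split`, the default fraction of 0.25, and the fixed seed are assumptions made for the example, not part of the original notes.

```python
import random

def holdout_split(rows, fraction=0.25, seed=0):
    """Set aside a simple random sub-sample (no stratification).

    Returns (exploratory, confirmatory). The confirmatory part should
    not be looked at until the exploratory modelling is finished.
    """
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_holdout = round(fraction * len(rows))
    holdout_idx = set(indices[:n_holdout])
    exploratory = [r for i, r in enumerate(rows) if i not in holdout_idx]
    confirmatory = [r for i, r in enumerate(rows) if i in holdout_idx]
    return exploratory, confirmatory

# Hypothetical data: 100 observations identified by their index.
data = list(range(100))
exploratory, confirmatory = holdout_split(data)
```

The split is made once, before any exploration, so that nothing learned from the exploratory part can leak into the choice of the confirmatory part.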

Cross-Validation when there is a response variable

When the above prescription is not followed and one of the variables has the status of a variable to be explained, it is possible (at some computational expense, but who cares?) to redo the analysis leaving out part of the data and comparing with the reference set.

For instance, in Discriminant Analysis
For each observation, redo the analysis without that observation and check whether it is then classified correctly. The proportion of misclassified observations gives a nearly unbiased estimate of the misclassification rate. Cross validation can thus be used when one variable has the particular status of being explained.
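The leave-one-out procedure described above can be sketched as follows. To keep the example self-contained, a nearest-class-mean rule stands in for a full discriminant analysis; that substitution, the function names, and the toy data are all assumptions made for illustration.

```python
def nearest_mean_classify(train_x, train_y, point):
    """Assign `point` to the class whose mean vector is closest (Euclidean).

    A deliberately simple stand-in for a discriminant rule.
    """
    sums, counts = {}, {}
    for x, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        dist = sum((acc[j] / counts[label] - point[j]) ** 2
                   for j in range(len(point)))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_error_rate(xs, ys, classify=nearest_mean_classify):
    """Leave-one-out estimate of the proportion of misclassified points."""
    errors = 0
    for i in range(len(xs)):
        # Refit on everything except observation i, then classify it.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if classify(train_x, train_y, xs[i]) != ys[i]:
            errors += 1
    return errors / len(xs)

# Hypothetical two-class data in the plane.
xs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [3.0, 3.1], [2.9, 3.3], [3.2, 2.8]]
ys = ["a", "a", "a", "b", "b", "b"]
rate = loo_error_rate(xs, ys)
```

Passing the classifier as an argument means any other rule can be cross-validated the same way, at the cost of one refit per observation.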

And in regression
We want to estimate the prediction error:

\begin{displaymath}PE=E_F(y-\hat{y})^2\end{displaymath}

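This can be done by cross validation, writing $\hat{y}_{(i)}$ for the prediction of $y_i$ obtained from the fit computed without the $i$th observation:

\begin{displaymath}PRESS=\frac{1}{n}\sum_{i=1}^n
(\hat{y}_{(i)}-y_i)^2\end{displaymath}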
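As a rough illustration of the PRESS computation, the sketch below refits a least-squares line with each observation left out in turn and averages the squared prediction errors, keeping the 1/n factor used in the formula above. The restriction to a single predictor, the function names `fit_line` and `press`, and the data values are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def press(xs, ys):
    """Average squared leave-one-out prediction error."""
    total = 0.0
    for i in range(len(xs)):
        # Fit without observation i, then predict y_i from x_i.
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        prediction = intercept + slope * xs[i]
        total += (prediction - ys[i]) ** 2
    return total / len(xs)

# Hypothetical data scattered around the line y = 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 1.9, 4.2, 5.8, 8.1, 9.9]
press_value = press(xs, ys)
```

For linear least squares the n refits can be avoided, since each leave-one-out residual equals the ordinary residual divided by one minus the corresponding leverage; the explicit loop is kept here because it mirrors the definition.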

Cross validation is not limited to these cases, however: it has also been used at the diagnostic stage in principal components, and in classification and regression trees, where it helps choose the size of an `optimal tree'.


Susan Holmes 2004-05-19