Up: Introduction to the Bootstrap
Previous: Introduction to the Bootstrap
MWT: Sequoia 200 11-12.
There will be three homeworks to hand in (20%),
three labs to hand in (20%),
and two bigger projects, midterm and final (60%).
The main component of the projects will
be matlab or R/Splus functions that
perform certain analyses and produce graphics.
These functions should be emailed to me,
and hardcopies sent to the TAs.
The term project should allow you to see the bootstrap
applied to your
field of interest, which also means that you can do
a theoretical study if that is your primary interest.
- First part: due Monday, May 3rd, to be handed in in class.
- Completed project: due Friday, June 4, 2004 at 12:00 pm
You are encouraged to talk to the instructor or teaching assistants
about ideas for a project before you decide on the subject. I will
put an appointment calendar on my door; you should sign up for a time
to come and talk with me about your project.
Typically the project can be one of the following types, or a combination
of elements from each:
- A case study, if you have some original data and a statistical
problem that you want to solve with the bootstrap.
- A comparative study, if you would like to compare the performance of
the bootstrap with other methods in different situations.
- Implementation of a new computational procedure. For instance,
you could try to write a Gray code program with a clever update
for statistics other than the correlation coefficient, or use a fancy
variance-reducing technique for the Monte Carlo step, or
improve on the empirical distribution as the estimator.
- A theoretical study of how fast the bootstrap
estimate converges, and how to improve it.
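As a point of reference for the project types above, here is a minimal sketch of the plain Monte Carlo bootstrap for the standard error of a correlation coefficient. It is written in Python purely for illustration (the projects themselves are to be done in matlab or R/Splus), and the function names `pearson_r` and `bootstrap_se`, the simulated data, and the choice of 1000 replicates are all assumptions made for this example rather than anything prescribed by the course.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    # A resample in which x or y is constant would give a zero
    # denominator; with continuous data this is vanishingly rare.
    return sxy / math.sqrt(sxx * syy)

def bootstrap_se(xs, ys, stat, n_boot=1000, seed=0):
    """Monte Carlo bootstrap estimate of the standard error of stat."""
    rng = random.Random(seed)
    n = len(xs)
    reps = []
    for _ in range(n_boot):
        # Resample the (x, y) pairs with replacement, keeping pairs intact.
        idx = [rng.randrange(n) for _ in range(n)]
        reps.append(stat([xs[i] for i in idx], [ys[i] for i in idx]))
    centre = sum(reps) / n_boot
    return math.sqrt(sum((r - centre) ** 2 for r in reps) / (n_boot - 1))

# Hypothetical data: 30 simulated pairs with positive dependence.
data_rng = random.Random(1)
x = [data_rng.gauss(0, 1) for _ in range(30)]
y = [xi + data_rng.gauss(0, 1) for xi in x]

r_hat = pearson_r(x, y)
se_hat = bootstrap_se(x, y, pearson_r)
print("r =", round(r_hat, 3), " bootstrap SE =", round(se_hat, 3))
```

A comparative or computational project would typically replace the inner loop (for example with a smarter enumeration or a variance-reduction scheme) and compare the result against this baseline.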
Some projects will involve considerably more effort than others, and thus
have greater potential to earn an outstanding grade. While
the complete project
counts for 60% of your final grade, the biggest payoff of a more
challenging project is in the opportunity it provides you to solidify
and extend your understanding of the material in the course and to
obtain practical experience in applying it to your own research
concerns. The first part will count for 20% and the second for 40%.
You may want to do a project using data you have from another course
(whether from an experiment or through access to a data set somebody
else has collected). This may be a good way to apply statistics to
something you have thought about. If you do a project of this sort,
you must make very clear which part of the project is done specifically
for your statistics course and which part is just a review or copy of
work you have done for another course.
The first part of the project should be about 5 pages long
(not counting the bibliography, which should be very complete).
Length is not an asset if it is not associated
with increased content! The first part should include:
- A simple and clear exposition of the question you are
- A placement of the problem in the wider context of contemporary
statistics, with a review of available methods for such data
and a few words on their advantages and disadvantages.
- A proposed solution to the problem using either the bootstrap
or another resampling procedure, with its comparative merits
as opposed to other methods.
- A flow chart of the various tasks to be undertaken:
programming, testing the program on simulated data, and
testing the program on real data are all reasonable steps.
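To illustrate what "testing the program on simulated data" can look like, here is a small sketch in Python (for illustration only; the course software is matlab or R/Splus). It checks a Monte Carlo bootstrap standard error of the mean against the closed-form value that the ideal bootstrap gives for this statistic. The function names, the sample size of 50, the 2000 replicates and the seed are all assumptions chosen for the example.

```python
import math
import random

def bootstrap_se_mean(data, n_boot, rng):
    """Monte Carlo bootstrap standard error of the sample mean."""
    n = len(data)
    means = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    centre = sum(means) / n_boot
    return math.sqrt(sum((m - centre) ** 2 for m in means) / (n_boot - 1))

def plugin_se_mean(data):
    """Closed-form bootstrap SE of the mean: sqrt(sum((x - xbar)^2)) / n."""
    n = len(data)
    xbar = sum(data) / n
    return math.sqrt(sum((v - xbar) ** 2 for v in data)) / n

# Simulated data with known behaviour, so the answer can be checked.
rng = random.Random(42)
simulated = [rng.gauss(10.0, 2.0) for _ in range(50)]

se_boot = bootstrap_se_mean(simulated, 2000, rng)
se_exact = plugin_se_mean(simulated)
rel_err = abs(se_boot - se_exact) / se_exact
print("Monte Carlo:", se_boot, " closed form:", se_exact, " rel. error:", rel_err)
```

The two numbers should agree up to Monte Carlo error; a large discrepancy on a case like this one usually points to a bug in the resampling code rather than to a property of the data.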
Computer output should be incorporated in the usual way, i.e. put
tables in the text or at the end but do not hand in a pile of unedited
computer output. Tables and figures should be numbered and
captioned. No uncommented output will be considered.
The quality of presentation will be taken into account
in the final grade (incorporation of good quality
graphics, careful text-processing, no superfluous output).
You should put the text of your computer programs in an appendix.
- A theoretical part: explanation of the
method studied and its properties.
- A computational part: an algorithm for
implementing the method in matlab or S-plus.
This should also be emailed to the TAs so it can be tested.
Make sure your code is readable, so that we can do a little
troubleshooting if necessary.
- A data-analysis part: actual data are to be
submitted to the method studied, or tables should show comparisons,
or theoretical results should be outlined.
Analysis of a data set with your algorithm: Perform a
statistical analysis of some data set from an experiment, survey, or
secondary data source using Matlab or Splus. You should pay critical attention to issues
concerning how the data were collected as well as to the statistical
analysis. (Depending on the nature of the data and your own
relationship to it, you may want to give more or less emphasis to
explanation of the data collection.) You should make sure that your
data set has enough complexity (more than just a couple of variables, and
a decent number of observations) to support an interesting analysis.
Some ideas according to your area of expertise:
You should consult some of the bootstrap books I have put on reserve
at the maths and computer science library.
- Education, Psychology, Social Sciences:
Methods such as regression analysis,
multivariate analyses and clustering can be bootstrapped usefully.
- Biology: In the analysis of DNA, distances and phylogenies
are bootstrapped a lot.
- Econometrics: Time series data need special treatment because
of the underlying dependence.
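One common way of giving time series this special treatment is to resample blocks of consecutive observations instead of single observations, so that short-range dependence is preserved inside each block. The sketch below shows a moving block resampling step in Python (for illustration only, and not necessarily the variant discussed in the course); the function name `moving_block_resample` and the block length of 5 are assumptions made for the example.

```python
import random

def moving_block_resample(series, block_len, rng):
    """Return one moving-block bootstrap resample of the same length.

    Blocks of block_len consecutive observations are drawn with
    replacement from all overlapping blocks and concatenated; the
    result is truncated to the original length.
    """
    n = len(series)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(series)")
    n_starts = n - block_len + 1  # number of admissible block starts
    out = []
    while len(out) < n:
        start = rng.randrange(n_starts)
        out.extend(series[start:start + block_len])
    return out[:n]

# Toy series 0, 1, ..., 19, so the blocks are easy to see in the output.
series = list(range(20))
resampled = moving_block_resample(series, 5, random.Random(0))
print(resampled)
```

With `block_len = 1` this reduces to the ordinary bootstrap for independent observations; the choice of block length is itself a question a project could study.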
Some books in other areas that include data sets are the following:
- Human Development Report, published annually for the United
Nations Development Program. There are a number of other statistical
reports from the UN and other international agencies like the International
- Statistical Abstract of the United States. Full of all sorts
of statistical tables.
- On the Net (see below): for instance, the `Chance' project
of Laurie Snell is very interesting.
Also, articles in books and journals sometimes contain the original
data set and you may have an idea for a different analysis than the one
which the author did. You should distinguish carefully between what you did and
what was in the original article.
There is an increasing
amount of data available on the Internet. As with other Internet
materials, there is some gold out there and a lot of pure junk. If you
would like to browse around for data on a topic you are interested in,
you can start from the Statistics Department home page
http://www-stat.stanford.edu/links, and look under
Journal, books, etc...
There are special databases on the net for each area:
Genbank for genetic data, for instance; there are also sports
There are many journals which include articles
with statistical analyses at an accessible level; in some cases the
original data sets are also included.
Psychology, biology and medicine are areas in which many
articles will include at least some statistics. Talk to instructors in
your field about what journals make use of statistical methods.
- Data: a Collection of Problems from Many Fields for the
Student and Research Worker, by D.F. Andrews and A.M. Herzberg
- Case Studies in Biometry, edited by Lange, Ryan, Billard,
Brillinger, Conquest and Greenhouse
- Population Studies
- Chance (a popularly-oriented statistics magazine)
- Ecology (particularly Volume 74, No. 6, a special issue on
- Journal of Experimental Zoology
- New England Journal of Medicine
- Public Opinion Quarterly
- Journal of Applied Psychology
- Proceedings of the National Academy of Sciences, section Evolution
Instructor: Susan Holmes.
Office hours: Wed at 2.30 and
by email appointment to email@example.com.
TAs: Brit Katzen and Jie-Hua Chen.
TAs' office hours:
Brit Katzen (Sequoia 229): Wed. 2:15 - 3:45
Jie-Hua Chen (Sequoia 141): Thur. 4 - 5
The course web site will contain a bulletin board,
homeworks, course summary,
project description list,
links to useful sites (in particular Splus and matlab tutorials),
software information, etc.
Weekly consultation of the web site will be necessary
and expected of all students.