Description:
Introduction to fundamental ideas and techniques of
statistical modeling, with an emphasis on conceptual
understanding and on the analysis of real data
sets. Assignments will draw on data analysis problems in
various science and engineering fields, and will involve
some programming.
Prerequisite:
Ma 2 or another introductory course in probability and
statistics; working knowledge of linear algebra at the
level of Ma 2. Programming experience is particularly
useful.
Syllabus:
 Simple linear regression: least squares estimation,
analysis of residuals
 Multiple linear regression: parameter estimation,
inference about model parameters
 Analysis of variance, comparison of models, model
selection
 Assessing goodness-of-fit, outliers, influential
observations
 Collinearity and rank deficiency, singular value
decomposition, regularization
 Choosing models and fitting parameters:
cross-validation, L-curve
 Principal component analysis, linear discriminant
analysis
 Generalized linear models: models, estimation and examples
 Resampling methods and the bootstrap
Textbooks: (on reserve at
the library)
 Montgomery, D. C., Peck, E. A., and
G. G. Vining, Introduction to Linear Regression
Analysis, 4th Edition,
Wiley (2006) [required]
 Manly, B. F. J., Multivariate
Statistical Methods: A Primer, 3rd Edition, Chapman
and Hall (2004)
Great reference for reviewing some elements of linear
algebra, and for linear discriminant analysis, principal
components analysis and canonical correlation analysis
 Efron, B. and R. J. Tibshirani, An
Introduction to the Bootstrap, Chapman and Hall
(1993)
Excellent introduction to the bootstrap and its many
applications. Also provides fresh insights into many topics in
statistics
 Johnson, R. A., and D. W. Wichern, Applied
Multivariate Statistical Analysis, 5th Edition,
Prentice Hall (2002)
Covers many of the topics we will study in class. The book is more
theoretically oriented than our textbook, and should provide a
complement for students wishing to go deeper in the theory
 Venables, W. N., and
B. D. Ripley, Modern Applied
Statistics with S, 4th Edition, Springer (2002)
This reference explains how to use R (S is the same as R except for
very few commands). Also reviews a lot of statistical methodology and
introduces some nice data sets
Handouts:
All handouts will be stored in a binder in 217 Firestone and/or
posted online.
Teaching assistants and office hours:
 Kelly Littlepage (klittlepage@caltech.edu): Tuesdays
1–2, 304 Firestone
 Peter Stobbe (stobbe@acm.caltech.edu): Wednesdays
10–11:30, SFL Study Group 2–3
 Yaniv Plan (plan@acm.caltech.edu): Mondays 2–3:30, 212
Firestone
Introduction to statistical
computing with R:
Wednesdays 5–6, Firestone 306
(except on 10/08, when it will be in Moore 070)
Grading:
 Homework assignments: 60%
 Homework assignments will generally be distributed on Thursdays
and are due in class the following Thursday.
 Late homework will NOT be accepted for grading
(documented medical emergencies excepted).
 There will be about 6 or 7 assignments; the
lowest score will be dropped from the final grade.
 Final exam (take-home): 40%.
 Using sources without citing them in homework sets or
in the final exam will result in a failing grade for the course.
