Stats 300C
Theory of Statistics
Spring 2018

Emmanuel Candes
113 Sequoia Hall

Office Hours: Monday 2:00-3:00,
or by appointment


Lectures: Monday, Wednesday and Friday
10:30-11:20 a.m.
Sequoia Hall 200






Description: The main goal of this course is to expose students to modern ideas in statistical theory. Classical theory is concerned with the behavior of statistical estimates when the number of variables is fixed and the sample size increases. Our emphasis, by contrast, is on statistical inference in high-dimensional settings, where there may be as many variables as observations, or more. This focus is motivated by ever-newer technologies that now produce extremely large datasets, often with huge numbers of measurements on each of a comparatively small number of experimental units.

Prerequisites: Stats 300A and 300B. Knowledge of probability theory at the level of Stats 310A and 310B.

Topics:
  • Testing problems in high dimensions: sparse alternatives (needle in a haystack) and nonsparse alternatives, Bonferroni's method, Fisher's test, ANOVA, higher criticism.
  • Multiple testing problems: familywise error rate (FWER), procedures for controlling FWER, false discovery rate (FDR), procedures for controlling FDR, empirical Bayes view of FDR, local FDR.
  • Conditional testing, controlled variable selection, knockoffs.
  • Topics in selective inference: false coverage rate, post-selection inference, selection after the LASSO.
  • James-Stein estimation, Stein's unbiased risk estimate.
  • Model selection in high dimensions: thresholding rules, Mallows' Cp and the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), Risk Inflation Criterion (RIC).
  • Oracles and oracle inequalities.
  • Computationally tractable methods for variable selection: the LASSO, the Dantzig selector.
  • Theory of high-dimensional regression: approximate message passing algorithms.


We will not follow a textbook, but students may find the following references useful for background reading.

  1. Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction by B. Efron, IMS Monographs.
  2. Gaussian Estimation: Sequence and Wavelet Models by I. Johnstone (available online).

    The books below provide background for some of the probabilistic results we shall use.

  3. Large Deviations Techniques and Applications, second edition, by A. Dembo and O. Zeitouni, Springer, Applications of Mathematics, vol. 38.
  4. Empirical Processes with Applications to Statistics by G. Shorack and J. Wellner, Classics in Applied Mathematics.
  5. Random Fields and Geometry by R. Adler and J. Taylor, Springer Monographs in Mathematics, Springer, New York.

Handouts: I will post lecture notes online; see the corresponding section of the course page.

Course assistants and office hours:

  • Michael Celentano. Office hours: Tuesday 12:30-2:30 p.m., 105 Sequoia Hall.
  • Qian Zhao. Office hours: Thursday 6:00-8:00 p.m., 105 Sequoia Hall.

Grading (tentative):

  • Homework assignments: 45%
    • Homework assignments will generally be distributed on Wednesdays and are due in class the following Wednesday.
    • Late homework will NOT be accepted for grading (except for documented medical emergencies).
    • There will be about 6 assignments; the lowest score will be dropped when computing the final grade.
    • You are encouraged to discuss the problem sets with others, but everyone must turn in their own write-up.

  • Scribing of lectures: 5%
    • Most lectures have already been scribed, but we shall revise some of the notes here and there.

  • Final project: 50%.
    • Most likely a take-home exam.

Course policies:

  • Using sources (people, books, the internet, and so on) in homework sets without citing them will result in a failing grade for the course.