Stat 263/363: Experimental Design


For Stanford people: the canvas page will have the HWs and more references. Outline and syllabus

Overview

Correlation is not causality. You've probably heard that before in any number of regression classes. If you want to infer causality from data, then the best way is to use randomized experiments. Maybe it is the only way to be sure.

In experimental design we look at how to choose the data that we will gather. In addition to being able to make causal conclusions, we also look at how to maximize the statistical efficiency of the generated data set.

Experimental design as a subject is about 100 years old. The methods in this course date back to agricultural field trials. Since then the ideas have seen use in medicine, manufacturing, quality control, computer-aided design and electronic commerce. Each new field takes the previous methods and then starts adapting them. Possibly the first clinical trial was that of James Lind in 1747 showing that citrus is effective against scurvy. (It was not immediately adopted.)

There will be some problem sets, a midterm on Thursday October 22 and a project. The project will involve designing, carrying out and analyzing a real experiment. This can be from your everyday life: cooking, hobbies, exercise routines, etc. There are ordinarily about 4 problem sets. As I announced in class, due to the pandemic it seems better to have a larger number of smaller problem sets.

This Mark Rober video (might serve an ad) describes an experiment to study which animals (snake vs turtle vs tarantula) are more likely to be run over by vehicles. The results are interesting. It is also funny. (I don't recognize the characters that appear near the end though.)


Goals

  1. Learn the main/classical methods of experimental design so that when it comes time to gather data you can work out the right choice.
  2. Gain exposure to the research frontier in DoE: A/B testing, computer experiments, design for high dimensional regression.
  3. Do a designed statistical experiment from conception to execution to analysis.

Topics

See page 2 of the course announcement. I'm expecting and hoping for two guest lectures to displace two of the post-midterm topics.

Classes

Online Tue & Thu 2:30 to 3:50
Lectures at PhD level, homework at MS level.

Grading basis

3 units and letter grade or CR/NC.

Instructor

Art Owen
Sequoia Hall 130
My userid is owen at the address stanford.edu
Office hour: Weds 1:30 - 2:30

TAs


Texts

Some links below. More may be in canvas.

Evaluation

HW 50%. Midterm 25%. Final project 25%.
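
As a small illustration of how these weights combine, here is a minimal sketch. It assumes each component is first expressed as a percentage out of 100; the function name `course_score` and the dictionary keys are hypothetical, and the mapping from this number to a letter grade is not specified here.

```python
# Weights taken from the Evaluation line above.
WEIGHTS = {"homework": 0.50, "midterm": 0.25, "project": 0.25}


def course_score(percentages):
    """Weighted course percentage (hypothetical helper).

    `percentages` maps each component name to a score out of 100.
    """
    return sum(WEIGHTS[name] * percentages[name] for name in WEIGHTS)


# Example: 90% on homework, 80% on the midterm, 100% on the project.
print(course_score({"homework": 90, "midterm": 80, "project": 100}))
```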

Supplementary readings

These articles should be readable for Stanford users.

More details

Be sure to give Axess a working email address:
I expect to send a small number of important emails about problem sets and the homework there. Most other announcements will be made in class. If you email me about the class, be sure to have stat 363 or stat 263 in your subject line. Otherwise, your email won't show up when I search for course related emails.
Late penalties apply:
We will count days late on each problem set. Each day late is penalized by 10% of the homework value. Homework more than 3 days late will ordinarily get 0. Upload to Gradescope within Canvas. For sickness, interviews and other events, up to 3 late days (4 in pandemic years) total are forgiven at the end of the quarter. (Work late enough to get zero does not get redeemed though.)
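
The per-problem-set penalty can be sketched as follows. This is one reading of the rule, not an official grading script: it assumes the 10% is taken from the full value of the problem set (not from the points earned), that a score cannot go below zero, and it ignores the end-of-quarter forgiveness, since how forgiven days are allocated across problem sets is not spelled out here. The function name `late_adjusted_score` is hypothetical.

```python
def late_adjusted_score(earned, max_points, days_late):
    """Score on one problem set after the late penalty (hypothetical helper).

    Assumptions: 10% of max_points is deducted per day late, more than
    3 days late gives 0, and end-of-quarter forgiveness is not applied.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > 3:
        return 0.0
    penalty = days_late * max_points / 10  # 10% of the homework value per day
    return max(0.0, earned - penalty)


# Example: 45 points earned on a 50-point problem set, 0 through 4 days late.
for days in range(5):
    print(days, late_adjusted_score(45, 50, days))
```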

Instructor scribed lecture notes

  1. Sep 15. Introduction. History of design. Potential outcomes.
  2. Sep 17. A/B testing. Applications to web companies.
  3. Sep 22. Bandits. Especially Thompson sampling.
  4. Sep 24. Pairing and blocking. Prior Stat 305A ANOVA notes. One-way analysis.
  5. Sep 29. ANOVA. Prior MC notes ANOVA. Includes functional ANOVA.
  6. Oct 01. \(2^k\) factorials. Motivations and notation.
  7. Oct 06. \(2^{k-p}_R\) fractional factorials. Aliasing and data analysis.
  8. Oct 08. ANCOVA and crossovers. Before-after comparisons.
  9. Oct 13. Split-plots and nesting. Also cluster randomized trials.

Previous lecture notes

These are scribed lecture notes from a prior version of the course. I was at Grid Science 2017 for the first week. I'm grateful to the TAs Minyong Lee and David Walsh who gave the first two lectures.
  1. Jan 09. Introduction, guest lecturer Minyong Lee, scribed by Zachary del Rosario.
  2. Jan 11. A/B testing, guest lecture by David Walsh
  3. Jan 18. Neyman-Rubin scribed by Julie Zhu
  4. Jan 23. Bandits scribed by Zachary del Rosario
  5. Jan 25. Fixed vs random effects, blocks, Latin squares scribed by Alexa Haushalter
  6. Jan 30. Wrapup of blocking, and start of factorials scribed by Senem Onen
  7. Feb 01. Full factorial design scribed by Anvita Gupta
  8. Feb 06. 2^k factorials and blocking factorials scribed by Hailey Kwon
  9. Feb 08. Fractional factorials scribed by Poorna Kumar
  10. Feb 13. Analysis of covariance scribed by Akhil Prakash (response surfaces got postponed to do a pre-midterm review)
  11. Feb 15. We had a midterm, mostly about blocking and factorials.
  12. Feb 22. Response surface designs scribed by Alexa Haushalter
  13. Feb 27. Taguchi methods scribed by Zachary del Rosario
  14. Mar 01. Effects: nested and crossed, fixed and random and mixed.
  15. Mar 06. Optimal design, Hadamard, supersaturated, compressed sensing scribed by Hailey Kwon
  16. Mar 08. Designs for computer experiments scribed by Malo Marrec. See also Halton sequences and their scrambling by Okten, Shah and Goncharov (2012)
  17. Mar 13. Analysis of computer experiments scribed by Poorna Kumar. Based on Dice Kriging by Roustant, Ginsbourger and Deville (2012) and Owen and Koehler (1996)
  18. Mar 15. Some project presentations and a final summary.