.. stats202 documentation master file, based on matplotlib sampledoc

Syllabus
========

*******************
Schedule & Location
*******************

MWF 1:15-2:05, NVIDIA Auditorium

****************
Instructor & TAs
****************

Instructor
----------

`Jonathan Taylor `_

* Office: Sequoia Hall #137
* Phone: 723-9230
* `Email `_
* Office hours: F 10:30-12:30

Email list
----------

The course has an email list that reaches all TAs as well as the professor:
`stats202-aut1213-staff@lists.stanford.edu `_.
*All questions should be directed to this email list, rather than to an
individual TA or the instructor.*

Teaching assistants
-------------------

* `Austen Head `_

  * Office: Sequoia Hall #208
  * `Email `_
  * Office hours: M 2:30-4:30

* `Murat Erdogdu `_

  * Office: Sequoia Hall #206
  * `Email `_
  * Office hours: TTh 9:45-10:45

* `Linxi Liu `_

  * Office: Sequoia Hall #227
  * `Email `_
  * Office hours: MW 10:00-11:00

Piazza
------

We've set up a `piazza `_ website to help with common questions, etc.

********
Textbook
********

* `Introduction to Data Mining `_, Tan, Steinbach & Kumar. *Required*

*******************
Optional references
*******************

* `Introduction to Statistics with R `_, Dalgaard.
* `Elements of Statistical Learning `_, Hastie, Tibshirani & Friedman
  (a more statistically advanced treatment of most of the topics).
* `Data analysis and graphics using R: an example-based approach `_,
  John H. Maindonald, John Braun.

*****************
Computer examples
*****************

We will be using `R `_ for most examples. You can read about it `here `_.
Another language / environment is the `scipy `_ environment.
I will be using R from the `ipython `_ notebook so as to record the
sessions in class. To run the notebooks, one can use the
`Enthought Python Distribution `_. The output will be posted on the
webpage, so it can be used without running the notebook.

*************
Prerequisites
*************

Some familiarity with linear algebra and statistical methods.

**********
Evaluation
**********

* homework (6 total, best 5 count); 60%
* midterm (W 10/31 in class); 15%
* final exam (according to Stanford calendar: W 12/12 @ 8:30AM); 25%
* `Kaggle competition `_ -- possibility for a 20% bonus on final exam

Homework
--------

* The worst of the 6 homework grades will be dropped.
* The homework assignments and solutions will be posted on the coursework
  web page, though perhaps with some delay after being posted on this webpage.
* Homework is to be submitted electronically as a PDF file in the dropbox on
  coursework `https://coursework.stanford.edu/portal/site/F12-STATS-202-01 `_.
* Each homework should be a single PDF following the pattern:
  "*LastName_FirstInitial_HW1.pdf*", "*LastName_FirstInitial_HW2.pdf*".
  For example, my 4th homework would be called "*Taylor_J_HW4.pdf*".
* Include your name in the PDF file.
* Include copies of your code in the PDF file.
* Only submit one PDF file for each homework.
* You may discuss homework problems with other students, but you have to
  prepare the written assignments yourself. This means you are not to copy
  other students' code or other parts of their homework.
* Homework will be timestamped based on when it is emailed to the staff
  mailing list. Each homework will have a specific deadline -- late homework
  will be penalized 10 points.
* SCPD students should also CC their submissions to
  `scpd-distribution@lists.stanford.edu `_

.. assignments::
   :assigned: 1, 2, 3, 4, 5, 6
   :solved:

Final exam
----------

* Following the Stanford `calendar `_: Wednesday, December 12 @ 8:30AM.
* If you cannot take the exam at that time and day, then you will have to
  take this class in a different quarter. Exceptions will only be made due to
  official university affairs, such as athletic commitments.
* Take a look at last year's final for `practice `_. This was assigned by
  Prof. Holmes, so it will differ slightly from what we have covered.
* Here is another `practice `_ exam, which was the final exam last year.

Midterm exam
------------

* Here is a `practice midterm exam `_ from last year. This was assigned by
  Prof. Holmes, so it will differ slightly from what we have covered.
* Here is a second `practice midterm `_, which was last year's midterm.

*****
Notes
*****

Week 1
------

* `Introduction `_
* `Example: U.S. congress voting records `_
* `Data types `_
* `All slides from Week 1 `_, `(2x2 version) `_

Examples
~~~~~~~~

.. toctree::
   :maxdepth: 1

   voting.rst
   diamonds.rst
   unemployment.rst
   helpR.rst

Week 2
------

.. * Preprocessing

* `Preprocessing `_

.. * Dimension reduction

* `Dimension reduction `_

.. * Distances

* `Distances `_
* `All slides from Week 2 `_, `(2x2 version) `_

Examples
~~~~~~~~

.. toctree::
   :maxdepth: 1

   discretization.rst
   timeseries.rst
   olympic.rst
   distances.rst

Week 3
------

* `Graphics in R `_
* `Multidimensional scaling `_
* `Multidimensional arrays `_
* `All slides from Week 3 `_, `(2x2 version) `_

Examples
~~~~~~~~

.. toctree::
   :maxdepth: 1

   visualization.rst
   mds.rst
   arrays.rst

Week 4
------

* `Decision trees `_
* `All slides from Week 4 `_, `(2x2 version) `_

Examples
~~~~~~~~

.. toctree::
   :maxdepth: 1

   trees.rst

Week 5
------

* `Discriminant analysis `_
* `Midterm review `_
* `All slides from Week 5 `_, `(2x2 version) `_

Examples
~~~~~~~~

.. toctree::
   :maxdepth: 1

   lda.rst

Week 6
------

* `Rule based, nearest neighbour, naive Bayes `_
* `All slides from Week 6 `_, `(2x2 version) `_

Examples
~~~~~~~~

.. toctree::
   :maxdepth: 1

   classifiers.rst

Week 7
------

* `Ensemble methods `_

.. * Graph data (not covered)
.. * `Graph data `_
.. * Correlation
.. * `Correlation `_
.. * Networks
.. * `Networks `_

* `All slides from Week 7 `_, `(2x2 version) `_

Examples
~~~~~~~~

.. toctree::
   :maxdepth: 1

   svms.rst
   ensemble.rst

Week 8
------

* `Clustering in general `_
* `K-means, K-medoid `_
* `All slides from Week 8 `_, `(2x2 version) `_

Examples
~~~~~~~~

.. toctree::
   :maxdepth: 1

   kmeans.rst

Week 9
------

* `Hierarchical clustering `_
* `Model based clustering `_
* `Outlier detection `_
* `All slides from Week 9 `_, `(2x2 version) `_

Examples
~~~~~~~~

.. toctree::
   :maxdepth: 1

   hierarchical.rst
   model_cluster.rst

Week 10
-------

* `LASSO `_
* `Post midterm review `_
* `All slides from Week 10 `_, `(2x2 version) `_

Examples
~~~~~~~~

.. toctree::
   :maxdepth: 1

   lasso.rst

***********
R resources
***********

* `An Introduction to R `_
* `R for Beginners `_
* `Using R for Introductory Statistics `_
* `Modern Applied Statistics with S `_
* `Practical ANOVA and Regression in R `_
* `simpleR `_
* `Introduction to R `_
* `R Reference Card `_
* `R Manuals `_
* `R Wiki `_