Statistics 202: Statistical Aspects of Data Mining (Fall 2005)

Instructor: Jerome H. Friedman

Place / Time: Gates B1 / MW 2:45 - 4:00pm.


Data Mining is used to discover patterns and relationships in data, with an emphasis on large observational databases. It sits at the common frontier of several fields, including Database Management, Statistics, Artificial Intelligence, Machine Learning, Pattern Recognition, and Data Visualization. From a statistical perspective, it can be viewed as computer-automated analysis and exploration of (usually) large, complex data sets. Data Mining is having a major impact in business, industry, and science. This course covers some of the principal methods used for Data Mining, with the goal of placing them in a common perspective and providing a unifying overview.


What is DM? Myths: what it can and can't do. Description vs. prediction. Knowledge discovery "process".
Overview:
What is data: types of measurements. What are "patterns" in data? Statistical inference. Description vs. prediction. Types of data. Types of procedures.

Prerequisites: Familiarity with the basic concepts of probability, calculus, linear algebra, and optimization. Statistics 116 is useful but not required.