Books



An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)
by G. James, D. Witten, T. Hastie and R. Tibshirani


The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer Series in Statistics)
by T. Hastie, R. Tibshirani, J. H. Friedman


Generalized Additive Models
by Trevor J. Hastie, Rob J. Tibshirani


An Introduction to the Bootstrap
by Bradley Efron and Robert Tibshirani


The Science of Bradley Efron
Carl Morris and Robert Tibshirani (editors)


The research reported here was partially supported by grants from the National Science Foundation and the National Institutes of Health.

Papers, Technical Reports & Talks

For up-to-date papers in journals linked to PubMed, see:

2014


Jonathan Taylor, Richard Lockhart, Ryan Tibshirani and Robert Tibshirani. Post-selection adaptive inference for Least Angle Regression and the Lasso

2013


Robert Tibshirani. In praise of sparsity and convexity. To appear, 50th Anniversary volume for COPSS


Max Grazier G'Sell, Stefan Wager, Alexandra Chouldechova, Robert Tibshirani. False Discovery Rate Control for Sequential Selection Procedures, with Application to the Lasso (submitted)

Richard Lockhart, Jonathan Taylor, Ryan Tibshirani and Robert Tibshirani. A significance test for the lasso (to appear, Annals of Statistics)
Talk slides, PIMS-UBC Constance Van Eeden lecture April 2014
Talk slides, ENAR presidential invited address, March 2014
Talk slides, CMU, Oct. 21, 2013
Talk slides, SSC Gold medal address, May 2013
1st Movie for talk- null model
2nd Movie for talk- 2 signal variables
R package (beta version!)

"Big data: how to avoid a big mess"; Talk slides for presentation at Big data in biomedicine conference, Stanford, May 2013.

2012


Jacob Bien, Noah Simon and Robert Tibshirani. A lasso for hierarchical testing of interactions


Lu Tian, Ash Alizadeh, Andrew Gentles and Robert Tibshirani. A Simple Method for Detecting Interactions between a Treatment and a Large Number of Covariates (submitted)


Noah Simon and Robert Tibshirani. A Permutation Approach to Testing Interactions in Many Dimensions (submitted)


Jacob Bien, Jonathan Taylor, and Robert Tibshirani. A lasso for hierarchical interactions.
Ann. Statist. Volume 41, Number 3 (2013), 1111-1141.
Talk slides


The lasso: some novel algorithms and applications (Talk slides; ASA chapter 2012)

2011


Noah Simon and Robert Tibshirani. Comment on "Detecting Novel Associations in Large Data Sets" by Reshef et al. (Science Dec. 2011)

Noah Simon and Robert Tibshirani. Standardization and the Group Lasso Penalty. Statistica Sinica 22 (2012), 983-1001

Noah Simon and Robert Tibshirani. Regularization Paths for Cox's Proportional Hazards Model. Journal of Statistical Software (2011)

Jacob Bien and Robert Tibshirani. Sparse Estimation of a Covariance Matrix. Biometrika. 98(4). 807-820

Jacob Bien and Robert Tibshirani. Hierarchical Clustering with Prototypes via Minimax Linkage. Journal of the American Statistical Association. 106(495). 1075-1084

Jacob Bien and Robert Tibshirani. Prototype Selection for Interpretable Classification. Accepted for publication in Annals of Applied Statistics.

Jun Li and Robert Tibshirani. Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data. To appear, Statistical Methods in Medical Research.

Tibshirani, R. Regression shrinkage and selection via the lasso: a retrospective. JRSSB retrospective read paper, vol 73, part 3, pages 273-282.


The lasso: some novel algorithms and applications (Talk slides; Purdue, Feb. 2011)

2010

Tibshirani, Bien, Friedman, Hastie, Simon, Taylor and Tibshirani: Strong Rules for Discarding Predictors in Lasso-type Problems (Revised; To appear, JRSSB)
talk slides;
R scripts for papers

Jerome Friedman, Trevor Hastie and Robert Tibshirani: Applications of the lasso and grouped lasso to the estimation of sparse graphical models

Jerome Friedman, Trevor Hastie and Robert Tibshirani: A note on the group lasso and a sparse group lasso

Witten DM and R Tibshirani (2010) Supervised multidimensional scaling for visualization, classification, and bipartite ranking. Computational Statistics and Data Analysis: To Appear.

Witten DM and R Tibshirani (2010) A framework for feature selection in clustering. Journal of the American Statistical Association 105(490): 713-726.

Witten DM and R Tibshirani (2010). Survival analysis with high-dimensional covariates. Statistical Methods in Medical Research 19(1): 29-51.

G. I. Allen and R. Tibshirani, "Inference with Transposable Data: Modeling the Effects of Row and Column Correlations", (Submitted), 2010. [pdf]

G. I. Allen and R. Tibshirani, "Transposable regularized covariance models with an application to missing data imputation", Annals of Applied Statistics, 4:2, 764-790, 2010. [pdf]


2009


Lu Tian and Robert Tibshirani (2009). Adaptive index models for marker-based risk stratification. Biostatistics, July 27, 2010.


Robert Tibshirani (2009). Univariate shrinkage in the Cox model for high dimensional data. Statistical Applications in Genetics and Molecular Biology, vol 8.

Witten DM and R Tibshirani (2009). Extensions of sparse canonical correlation analysis, with applications to genomic data. SAGMB 2009. Winner of 2009 WNAR Student Paper Competition.

Witten DM, Hastie, T. and R Tibshirani (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10(3): 515-534. [r library]

Witten DM and R Tibshirani (2009). Covariance-regularized regression and classification for high-dimensional problems. Journal of the Royal Statistical Society, Series B 71(3): 615-636.
Note that this version of the manuscript contains a clarification to Section 3.2: if scout is performed using an alternative covariance estimator, then that estimator should be positive definite. [r library]

2008


Ryan Tibshirani and Rob Tibshirani (2008). A Bias Correction for the Minimum Error Rate in Cross-validation. Published in Annals of Applied Statistics


Witten DM, R Tibshirani and T Hastie (2009). A penalized matrix decomposition, with applications to sparse canonical correlation analysis and principal components. Biostatistics 10(3): 515-534.

Jerome Friedman, Trevor Hastie and Robert Tibshirani: Regularization Paths for Generalized Linear Models via Coordinate Descent. We use coordinate descent to develop regularization paths for linear, logistic and multinomial regression models. Our algorithms use the lasso and "elastic net" penalties of Zou and Hastie (2005), and create the path for a grid of values of the penalty parameter lambda. The glmnet package for fitting lasso and elastic net models can be found on CRAN. Here is a MATLAB version.
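
A minimal sketch of the idea in the entry above, for the linear (squared-error) case only. It is an illustration, not the glmnet code: the function names soft_threshold and enet_path, the 1/(2n) scaling of the loss, the fixed number of sweeps, and the lack of an intercept or standardization are all assumptions made here for brevity. Each coordinate is updated in turn by soft-thresholding, and the solution at one value of lambda is used as the warm start for the next value on the grid.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def enet_path(X, y, lambdas, alpha=1.0, n_sweeps=200):
    """Coordinate descent over a grid of lambda values (hypothetical helper).

    Minimizes (1/2n) * ||y - X b||^2
              + lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2).
    alpha = 1 gives the lasso; 0 < alpha < 1 gives an elastic net penalty.
    X is a list of n rows, each with p entries. Returns one coefficient
    list per lambda, in the order the lambdas were given.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta; beta starts at zero
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    path = []
    for lam in lambdas:  # warm start: beta carries over between lambdas
        for _ in range(n_sweeps):
            for j in range(p):
                if col_ss[j] == 0.0:
                    continue  # an all-zero column never enters the model
                # Correlation of column j with the partial residual
                # (the residual with coordinate j's contribution added back).
                rho = sum(X[i][j] * resid[i] for i in range(n)) / n
                rho += col_ss[j] * beta[j]
                new_bj = soft_threshold(rho, lam * alpha)
                new_bj /= col_ss[j] + lam * (1.0 - alpha)
                delta = new_bj - beta[j]
                if delta != 0.0:
                    for i in range(n):
                        resid[i] -= X[i][j] * delta
                    beta[j] = new_bj
        path.append(list(beta))
    return path
```

Passing the lambda values in decreasing order lets each fit start from the previous, sparser solution. A production implementation would stop on a convergence criterion rather than a fixed sweep count.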

Witten DM and R Tibshirani (2008). Testing significance of features by lassoed principal components. Annals of Applied Statistics 2(3): 986-1012. [r library]

2007


Daniela Witten and Robert Tibshirani
A comparison of fold change and the t-statistic for microarray data analysis


Bradley Efron, Trevor Hastie, Robert Tibshirani, Discussion: The Dantzig selector: Statistical estimation when p is much larger than n, The Annals of Statistics. Volume 35, Number 6 (2007), 2358-2364.


Jerome Friedman, Trevor Hastie and Robert Tibshirani Sparse inverse covariance estimation with the graphical lasso. Biostatistics, December 12, 2007.
We develop an efficient algorithm for solving the L1-penalized likelihood approach to sparse inverse covariance estimation.

Hui Zou, Trevor Hastie and Robert Tibshirani
On degrees of freedom of the lasso (Annals of Statistics Volume 35, Number 5 (2007), 2173-2192. )
A technical paper that establishes that the number of non-zero coefficients in a lasso model is an unbiased estimate of the effective degrees of freedom.

Gen Nowak and Robert Tibshirani Complementary hierarchical clustering. Biostatistics, December 12, 2007.

Hoefling, H., and Tibshirani R.
A study of Pre-validation. Published in Annals of Applied Statistics.

Hoefling, H., Getz, G, and Tibshirani R.
Comment on "Significance of candidate cancer genes as assessed by the CaMP score" by Parmigiani et al. (pdf file)


Friedman, J., Hastie T, Tibshirani R.
Pathwise coordinate optimization (ps file)
(pdf) The Annals of Applied Statistics. Volume 1, Number 2 (2007), 302-332
We show how coordinate descent algorithms can efficiently solve a number of popular regularized optimization problems, creating an entire path of solutions. We generalize this approach to derive an efficient algorithm for the fused lasso, both one- and two-dimensional.


Rob Tibshirani and Pei Wang
Spatial smoothing and hot spot detection for CGH data using the Fused Lasso (pdf file)

To appear, Biostatistics.


Guo Y, Hastie T, Tibshirani R.
Regularized linear discriminant analysis and its application in microarrays. Biostatistics. 2007 Jan;8(1):86-100. Epub 2006 Apr 7.

A method, similar to shrunken centroids, for classification and discrimination of microarrays, using regularized discriminant analysis with gene selection.

2006



Bradley Efron and Rob Tibshirani Tech report. August 2006
On testing the significance of sets of genes (ps file)
(pdf)
Published in Annals of Applied Statistics

Hui Zou, Trevor Hastie, and Rob Tibshirani. Sparse Principal Component Analysis. (pdf)
We present a new approach to principal component analysis that allows us to use an L1 penalty to ensure sparseness of the loadings. Published in JCGS 2006 15(2): 262-286. Software is available in the R package elasticnet on CRAN.

Rob Tibshirani, Larry Wasserman. Tech report. July 2006
Correlation-sharing for detection of differential gene expression (ps file)
(pdf)

Debashis Paul, Eric Bair, Trevor Hastie, Rob Tibshirani. Tech report. April 2006
``Pre-conditioning'' for feature selection and regression in high-dimensional problems (revised dec 2006) (ps file)
(pdf)
To appear, Annals of Statistics.

Trevor Hastie, Jon Taylor, Rob Tibshirani, Guenther Walther. Tech report. March 2006
Forward Stagewise Regression and the Monotone Lasso (ps file)
(pdf)
To appear, Electronic Journal of Statistics.

Rob Tibshirani and Trevor Hastie. Tech report. Feb. 2006.
Margin trees for high-dimensional classification (ps file) (pdf file)
R linux package
R Windows package
Published in J. Mach. Learn. Res.
A tree-structured representation for a multiclass SVM classifier.

Rob Tibshirani and Trevor Hastie. Tech report. Jan. 2006. published in Biostatistics May 2006.
Outlier sums for differential gene expression analysis (published version- pdf)
(technical report- ps)

Biostatistics January 2007; 8: 2 - 8.

2005



Rob Tibshirani. Tech report. October 2005.
A simple method for assessing sample sizes in microarray experiments (ps file)
(pdf file)
BMC Bioinformatics 2006, 7:106

Jon Taylor and Rob Tibshirani. Tech report. July 2005.
A tail strength measure for assessing the overall significance in a dataset (ps file)
(pdf file)
Published in Biostatistics April 2006; 7: 167 - 181.



Hugh Chipman and Rob Tibshirani. Tech report. 2005.
"Hybrid Hierarchical Clustering with Applications to Microarray Data" (pdf file)
Biostatistics, Nov. 21, 2005.
R package


Eric Bair, Trevor Hastie, Debashis Paul, Rob Tibshirani. Tech report. 2004.
Prediction by supervised principal components. (ps file)
( pdf file)
Published in J. Amer. Statist. Assoc. (2006), vol. 101, p 119-
IMS 2007 medallion lecture slides (pdf)


Superpc R package

Jon Taylor, Rob Tibshirani and Brad Efron. Tech report. June 2004.
The ``Miss rate'' for the analysis of gene expression data. ( pdf file)
Published in Biostatistics 2005 6(1):111-117.

Trevor Hastie and Robert Tibshirani. Efficient quadratic regularization for expression arrays (Published in Biostatistics)

I. S. Lossos, Debra Czerwinski, Ash Alizadeh, Mark Wechser, Rob Tibshirani, David Botstein and Ronald Levy.
N. Engl. J. Med. 2004;350,1828-37
Prediction of Survival in Diffuse Large-B-Cell Lymphoma Based on the Expression of Six Genes. (paper- pdf file)
(commentary- pdf file)

2003

Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, Scott Soltys, Albert Koong, Quynh-Thu Le. Sample classification from protein mass spectroscopy, by ``peak probability contrasts'' (ps file)
(pdf file) Published in Bioinformatics. talk slides (pdf file)

Robert Tibshirani, Michael Saunders, Saharon Rosset, and Ji Zhu. Sparsity and smoothness via the fused lasso (ps file)
(pdf file) Published in J. Royal. Statist. Soc. B.

Eric Bair and Robert Tibshirani. Semi-supervised methods to predict patient survival from gene expression data. (pdf file)
Published in PLOS biology.

John Storey and Robert Tibshirani. Statistical significance for genome-wide experiments. (pdf file)
Published in PNAS.

Robert Tibshirani and Eric Bair. Improved detection of differential gene expression through the singular value decomposition (ps file)
(pdf file)

2002

Robert Tibshirani and Bradley Efron. Pre-validation and inference in microarrays. Published in Statistical Applications in Genetics and Molecular Biology. Vol 1, No. 1, 2002.

Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, and Gilbert Chu. Class prediction by nearest shrunken centroids, with applications to DNA microarrays (pdf file). This is a more statistical version of the PNAS paper below. Published in Statistical Science, 2003.

Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, and Gilbert Chu. "Diagnosis of multiple cancer types by shrunken centroids of gene expression" (PNAS website). PNAS 2002 99:6567-6572 (May 14).

Brad Efron, Trevor Hastie, Iain Johnstone, Rob Tibshirani. Least angle regression (ps file).
pdf version. Tech. report, Mar 2002. My co-authors' LARS software for Splus, and a Splus helpfile. The software computes the entire LAR, Lasso or Stagewise path in the same order of computations as a single least-squares fit. The Lasso page

2001

Therese Sorlie, Perou, C., Robert Tibshirani, Turid Aas, Stephanie Geisler, Hilde Johnsen, Trevor Hastie, Michael B. Eisen, Matt van de Rijn, Stefanie S. Jeffrey, Thor Thorsen, Hanne Quist, John C. Matese, Patrick O. Brown, David Botstein, Per Eystein Lonning, and Anne-Lise Borresen-Dale. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. PNAS 98: 10869-10874.
"pdf version".

Bradley Efron and Robert Tibshirani
"Microarrays, Empirical Bayes Methods, and False Discovery Rates" (ps file)

"pdf version".
Published in Genet Epidemiol. 2002 Jun;23(1):70-86

Robert Tibshirani, Guenther Walther, David Botstein and Pat Brown
"Cluster validation by prediction strength" (ps file)

"pdf version".
Published in Journal of Computational & Graphical Statistics, Volume 14, Number 3, September 2005, pp. 511-528(18)

Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, Michael Eisen, Gavin Sherlock, Pat Brown, and David Botstein
"Exploratory screening of genes and clusters from microarray experiments" (ps file)

"pdf version".


2000

Virginia Tusher, Robert Tibshirani and Gilbert Chu
"Significance analysis of microarrays applied to the ionizing radiation response." (ps file)

"pdf version".
PNAS 98: 5116-5124 (April 24) "Raw data"


"Statistical challenges in the analysis of DNA microarray data" (ps file)

"pdf version".
Lecture delivered to National Academy of Sciences, November 2000.

Bradley Efron, John Storey, Robert Tibshirani, Virginia Tusher.
"Empirical Bayes Analysis of a Microarray Experiment" (ps file) .

"pdf version".
Tech report, October 2000. "Raw data"

Trevor Hastie, Robert Tibshirani, David Botstein and Pat Brown,
"Supervised Harvesting of Expression Trees" (ps file) .

"pdf version".
Tech. report. August 2000. Published in Genome Biology, 2001 (www.genomebiology.com)

Donald Redelmeier and Robert Tibshirani, "Are those other drivers really going faster?". Chance 2000;13:8-14.

Robert Tibshirani, Guenther Walther and Trevor Hastie.
"Estimating the number of clusters in a dataset via the Gap statistic" (ps file).

"pdf version".
Tech. report. March 2000. Published in JRSSB 2000.

Alizadeh, A. and 23 others. "Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling" Nature 403, 503-511 (2000)

Trevor Hastie, Robert Tibshirani, Michael Eisen, Pat Brown, Doug Ross, Uwe Scherf, John Weinstein, Ash Alizadeh, Louis Staudt, David Botstein
"Gene Shaving: a New Class of Clustering Methods for Expression Arrays".
Tech. report. Jan 2000.

1999

Tibshirani, R., Hastie, T., Eisen, M., Ross, D., Botstein, D. and Brown, P. "Clustering methods for the analysis of DNA microarray data". (compressed postscript 4.8mb) Tech. report Oct. 1999.

Donald Redelmeier and Robert Tibshirani, "Why cars in the next lane seem to go faster". Nature 1999;401:35-36.

Tibshirani, R. "Learning from Data: Statistical Advances and Challenges". Keynote address: Splus Users conference, Oct 21, 1999.

Tibshirani, R., Lazzeroni, L., Hastie, T., Olshen, A. and Cox, D.R. "A global pairwise approach to radiation hybrid mapping". Tech. report Jan. 1999. Using data of co-occurrence of hybridized markers after shattering, inference is made of the marker sequence in the chromosome.

1998

Friedman, J., Hastie, T. and Tibshirani, R. "Additive Logistic Regression: a Statistical View of Boosting". Tech. report July 23, 1998. We show that boosting fits an additive logistic regression model by stagewise optimization of a criterion very similar to the log-likelihood, and present likelihood based alternatives. We also propose a multi-logit boosting procedure which appears to have advantages over other methods proposed so far.

Tibshirani, R. "Learning from Data: Statistical Advances and Challenges". Plenary lecture, Center for automated learning and discovery, CMU, June 13, 1998.

Hastie, T. and Tibshirani, R. "Bayesian backfitting", "Abstract", March 1998. The Gibbs sampler looks and feels like the backfitting algorithm for fitting additive models. Indeed, a simple modification to backfitting turns it into a Gibbs sampler for spitting out samples from the "posterior" distribution for an additive fit.

1997

Tibshirani, R. and Knight, K. "The covariance inflation criterion for model selection" (Talk Slides), January 1998.

Tibshirani, R. and Knight, K. "The covariance inflation criterion for model selection" , November, 1997.

Tibshirani, R. "Some thoughts from half a career in statistics" , June 1997.

Rao, J.S. and Tibshirani, R. "The out-of-bootstrap method for model averaging and selection" Technical Report, May 1997.

Ennis, Hinton, Naylor, Revow and Tibshirani "A comparison of statistical learning methods on the GUSTO database"

Efron, B. and Tibshirani, R. "The Problem of Regions" Technical Report, Feb 1997.

1996

Hastie, T. and Tibshirani, R. "Classification by pairwise coupling" Technical Report, Nov. 1996. We solve a multiclass classification problem by combining all the pairwise rules. This paper builds on ideas proposed by J. Friedman.
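
The entry above says only that the pairwise rules are combined. As a rough illustration of what such a combination can look like, the sketch below uses a Bradley-Terry style iteration to turn pairwise probability estimates into a single vector of class probabilities. It is not taken from this page: the function name couple_pairwise, the equal weighting of all pairs, and the fixed iteration count are assumptions. The input r[i][j] is an estimate of the probability of class i given that the true class is i or j, strictly between 0 and 1, with r[j][i] = 1 - r[i][j].

```python
def couple_pairwise(r, n_iter=500):
    """Combine pairwise estimates r[i][j] ~ P(class i | class i or j).

    Looks for class probabilities p such that p[i] / (p[i] + p[j]) is
    close to r[i][j] for every pair, by rescaling each p[i] by the ratio
    of observed to fitted pairwise probabilities. Diagonal entries of r
    are ignored. All pairs are weighted equally (an assumption).
    """
    k = len(r)
    p = [1.0 / k] * k
    for _ in range(n_iter):
        for i in range(k):
            observed = sum(r[i][j] for j in range(k) if j != i)
            fitted = sum(p[i] / (p[i] + p[j]) for j in range(k) if j != i)
            if fitted > 0.0:
                p[i] *= observed / fitted
        total = sum(p)
        p = [v / total for v in p]  # keep p on the probability simplex
    return p


# Usage: pairwise estimates that are exactly consistent with one
# underlying probability vector.
truth = [0.5, 0.3, 0.2]
r = [[truth[i] / (truth[i] + truth[j]) for j in range(3)] for i in range(3)]
p_hat = couple_pairwise(r)
```

The predicted class is then the index of the largest entry of p_hat. With noisy pairwise estimates no vector p matches every pair exactly, and the iteration settles on a compromise.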

Tibshirani, R. "Who is the fastest man in the world?". Tech report, Sept. 1996. Revised January, 1997. To appear, American Statistician.

Tibshirani, R. "Two applications of the bootstrap". Gordon Ashton Memorial Lecture, Guelph, Sep 17 1996.

Redelmeier, D. and Tibshirani, R. "Cellular telephones and automobile collisions: some variations on matched case-control analysis". Technical Report (Jan 1997).

Hastie, T., Ikeda, D. and Tibshirani, R. "Computer-aided diagnosis of mammographic masses" Technical Report (June 1996). This is a big file (5mb) because of the images. Here is a much smaller version without the images. It has been accepted for publication in J. Comp. and Graph. Statistics.

Tibshirani, R. "Bias, variance and prediction error for classification rules" Technical Report (April 1996), revised November 1996.

1995

Tibshirani, R. and Knight, K. "Model search and inference by bootstrap ``bumping'' " Technical Report (Nov. 1995). It has been accepted for publication in J. Comp. and Graph. Statistics.

Efron, B. and Tibshirani, R. "Cross-Validation and the Bootstrap: Estimating the Error Rate of a Prediction Rule (text only)" Technical Report (May. 1995).

1994

Hastie, T. J. and Tibshirani, R. "Discriminant Adaptive Nearest Neighbor Classification." Technical Report (Dec. 1994).

Tibshirani, R. "A comparison of some error estimates for neural network models" Technical Report, to appear Neural Computation 1995.

Tibshirani, R., and Hinton, G.E. "Coaching variables for regression and classification" Technical Report (July 1994).

Tibshirani, R. "Regression shrinkage and selection via the lasso" Technical Report (June 1994).

Hastie, T. J. and Tibshirani, R. "Discriminant Analysis by Gaussian Mixtures." To appear in JRSSB (Dec 1994). For longer technical report of Feb 1994 click here.

Tibshirani, R. "A proposal for variable selection in the Cox model" Technical Report (June. 1994).

Hastie, T. J., Buja, A., and Tibshirani, R. "Penalized Discriminant Analysis." Technical Report (May 1994).

Hastie, T. J. and Tibshirani, R. "Handwritten Digit Recognition via Deformable Prototypes." AT&T Bell Laboratories Technical Report, 1994.

1993

Hastie, T. J., Tibshirani, R. and Buja, A. "Flexible Discriminant Analysis by Optimal Scoring." Technical Report (Dec. 1993), appeared in JASA, December 1994. AT&T Bell Labs Technical Report (Dec. 1993).

1992

Tibshirani, R. "Principal curves revisited." Appeared in Statistics and Computing 1992.

1991

1990

1989

1988

1987

1986

Hastie, Trevor, and Tibshirani, R. "Generalized Additive Models (with discussion)" Statistical Science Vol 1, No 3, pages 297-318

1985

1984

Tibshirani, R. My PhD thesis on local likelihood estimation.

Software

bump

S routine for bumping. Shar file. (Rob Tibshirani, tibs@utstat.toronto.edu).

fda

S routines for Flexible Discriminant Analysis. These tools are enhancements on the discr and gdiscr functions, and allow linear, polynomial, and nonparametric versions of discriminant analysis. There are easy to use predict methods, as well as a postscript version of a paper describing the techniques. Shar file. (Trevor Hastie and Rob Tibshirani, trevor@playfair.stanford.edu).

gamfit

FORTRAN program for fitting generalized additive models. (Trevor Hastie and Rob Tibshirani, trevor@playfair.stanford.edu).

lasso
S functions for the lasso. (Rob Tibshirani, tibs@utstat.toronto.edu).

avas
S functions for avas (additivity and variance stabilization) (Rob Tibshirani, tibs@utstat.toronto.edu).

bootstrap
S functions for bootstrap, jackknife and cross-validation (Rob Tibshirani, tibs@utstat.toronto.edu).

varcoef
S functions for varying coefficient models (Rafal Kustra rafal@utstat.toronto.edu and Rob Tibshirani, tibs@utstat.toronto.edu).