Statistical Learning with Sparsity: The Lasso and Generalizations (2015) (Chapman & Hall/CRC Monographs on Statistics & Applied Probability) by T. Hastie, R. Tibshirani and M. Wainwright. Book Homepage
An Introduction to Statistical Learning: with Applications in R (2013) (Springer Texts in Statistics) by G. James, D. Witten, T. Hastie and R. Tibshirani. Book Homepage pdf (9.4Mb, 6th corrected printing)
The Science of Bradley Efron (2008) Carl Morris and Robert Tibshirani (editors)
The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer Series in Statistics) (2001 & 2009) by T. Hastie, R. Tibshirani, J. H. Friedman. Book Homepage Springer page
Generalized Additive Models (1990) by Trevor J. Hastie, Rob J. Tibshirani
An Introduction to the Bootstrap (1993) by Bradley Efron and Robert Tibshirani

Only a selection of my methodology papers is listed below. See: Full list of my papers on arXiv

Animation explaining the idea

Robert Tibshirani. Keynote lecture, SDSS 2020

Robert Tibshirani. Keynote lecture, Columbia Data Science in Health Summit

Kenneth Tay, Jerry Friedman and Robert Tibshirani. Principal component-guided sparse regression.

My short convocation speech, accepting my U Waterloo honorary degree

Leying Guan and Robert Tibshirani. Post model-fitting exploration via a "Next-Door" analysis.

Keli Liu, Jelena Markovic, Robert Tibshirani. More powerful post-selection inference, with application to the Lasso.

Stephen Bates and Robert Tibshirani. Log-ratio Lasso: Scalable, Sparse Estimation for Log-ratio Models. To appear, Biometrika.

Robert Tibshirani and Jerry Friedman. A Pliable Lasso.

Robert Tibshirani. Introductory lecture on statistical learning; JSM2017

Sam Gross and Robert Tibshirani Data Shared Lasso: A novel tool to discover uplift. Computational Statistics & Data Analysis Volume 101 Issue C, September 2016 Pages 226-235

Jonathan Taylor and Robert Tibshirani. Post-selection inference for L1-penalized likelihood models.

Robert Tibshirani. Some Recent Advances in Post-selection inference. Breiman Invited lecture, NIPS 2015

Robert Tibshirani. Post-selection inference with an application to internal inference; Seattle Biostatistics conference 2015

Jonathan Taylor and Robert Tibshirani. Statistical learning and selective inference PNAS 2015.

Robert Tibshirani. Keynote address to Canadian actuarial society 2015

Ryan Tibshirani, Alessandro Rinaldo, Robert Tibshirani, and Larry Wasserman. Uniform Asymptotic Inference and the Bootstrap After Model Selection.

Robert Tibshirani. Two novel applications of selective inference (talk slides)

Workshop on Statistical inference for large scale data (SFU), April 2015

Robert Tibshirani. Cancer detection via the lasso and customized training. (talk slides)

ML/Google distinguished lecture, CMU, Oct. 2014

Jonathan Taylor, Richard Lockhart, Ryan Tibshirani and Robert Tibshirani. Post-selection adaptive inference for Forward stepwise and Least Angle Regression: Talk slides

Jonathan Taylor, Richard Lockhart, Ryan Tibshirani and Robert Tibshirani. Post-selection adaptive inference for Least Angle Regression and the Lasso (revised version, Oct. 1, 2014)

Robert Tibshirani. In praise of sparsity and convexity. To appear, 50th Anniversary volume for COPSS

Max Grazier G'Sell, Stefan Wager, Alexandra Chouldechova, Robert Tibshirani. False Discovery Rate Control for Sequential Selection Procedures, with Application to the Lasso (submitted)

Richard Lockhart, Jonathan Taylor, Ryan Tibshirani and Robert Tibshirani. A significance test for the lasso Rejoinder (Annals of Statistics 2014)

Talk slides, PIMS-UBC Constance Van Eeden lecture April 2014

Talk slides, ENAR presidential invited address, March 2014

Talk slides, CMU, Oct. 21, 2013

Talk slides, SSC Gold medal address, May 2013

1st Movie for talk- null model

2nd Movie for talk- 2 signal variables

R package (beta version!)

"Big data: how to avoid a big mess"; Talk slides for presentation at Big data in biomedicine conference, Stanford, May 2013.

Jacob Bien, Noah Simon and Robert Tibshirani. A lasso for hierarchical testing of interactions

Lu Tian, Ash Alizadeh, Andrew Gentles and Robert Tibshirani. A Simple Method for Detecting Interactions between a Treatment and a Large Number of Covariates (submitted)

Noah Simon and Robert Tibshirani. A Permutation Approach to Testing Interactions in Many Dimensions (submitted)

Jacob Bien, Jonathan Taylor, and Robert Tibshirani. A lasso for hierarchical interactions.

Ann. Statist. Volume 41, Number 3 (2013), 1111-1141. Talk slides

The lasso: some novel algorithms and applications (Talk slides; ASA chapter 2012)

Noah Simon and Robert Tibshirani. Comment on "Detecting Novel Associations in Large Data Sets" by Reshef et al. (Science Dec. 2011)

Noah Simon and Robert Tibshirani. Standardization and the Group Lasso Penalty Statistica Sinica 22 (2012), 983-1001

Noah Simon and Robert Tibshirani. Regularization Paths for Cox's Proportional Hazards Model Journal of Statistical Software (2011)

Jacob Bien and Robert Tibshirani. Sparse Estimation of a Covariance Matrix. Biometrika. 98(4). 807-820

Jacob Bien and Robert Tibshirani. Hierarchical Clustering with Prototypes via Minimax Linkage. Journal of the American Statistical Association. 106(495). 1075-1084

Jacob Bien and Robert Tibshirani. Prototype Selection for Interpretable Classification. Accepted for publication in Annals of Applied Statistics.

Jun Li and Robert Tibshirani. Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data. To appear, Statistical Methods in Medical Research.

Tibshirani, R. Regression shrinkage and selection via the lasso: a retrospective. JRSSB retrospective read paper, vol 73, part 3, page 273-282.

The lasso: some novel algorithms and applications (Talk slides; Purdue feb 2011)

talk slides; R scripts for papers

Jerome Friedman, Trevor Hastie and Robert Tibshirani: Applications of the lasso and grouped lasso to the estimation of sparse graphical models

Jerome Friedman, Trevor Hastie and Robert Tibshirani: A note on the group lasso and a sparse group lasso

Witten DM and R Tibshirani (2010) Supervised multidimensional scaling for visualization, classification, and bipartite ranking.

Witten DM and R Tibshirani (2010) A framework for feature selection in clustering. Journal of the American Statistical Association

Witten DM and R Tibshirani (2010). Survival analysis with high-dimensional covariates. Statistical Methods in Medical Research

G. I. Allen and R. Tibshirani, "Inference with Transposable Data: Modeling the Effects of Row and Column Correlations", (Submitted), 2010. [pdf]

G. I. Allen and R. Tibshirani, "Transposable regularized covariance models with an application to missing data imputation",

Lu Tian and Robert Tibshirani (2009). Adaptive index models for marker-based risk stratification. Biostatistics, July 27, 2010.

Robert Tibshirani (2009). Univariate shrinkage in the Cox model for high dimensional data. Statistical Applications in Genetics and Molecular Biology, vol 8.

Witten DM and R Tibshirani (2009). Extensions of sparse canonical correlation analysis, with applications to genomic data. SAGMB 2009.

Witten DM, Hastie, T. and R Tibshirani (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics

Witten DM and R Tibshirani (2009). Covariance-regularized regression and classification for high-dimensional problems. Journal of the Royal Statistical Society, Series B

Note that this version of the manuscript contains a clarification to Section 3.2: if scout is performed using an alternative covariance estimator, then that estimator should be positive definite. [r library]

Ryan Tibshirani and Rob Tibshirani (2008). A Bias Correction for the Minimum Error Rate in Cross-validation. Published in Annals of Applied Statistics

Witten DM and R Tibshirani and T Hastie (2009). A penalized matrix decomposition, with applications to sparse canonical correlation analysis and principal components Biostatistics 10,3,p515-534

Jerome Friedman, Trevor Hastie and Robert Tibshirani: Regularization Paths for Generalized Linear Models via Coordinate Descent. We use coordinate descent to develop regularization paths for linear, logistic and multinomial regression models. Our algorithms use the lasso and "elastic net" penalties of Zou and Hastie (2005), and create the path for a grid of values of the penalty parameter lambda. The glmnet package for fitting lasso and elastic net models can be found on CRAN. Here is a MATLAB version.
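As a rough illustration of the idea in the abstract above, here is a minimal sketch of coordinate descent along a grid of lambda values. It is not the glmnet code: it handles only the linear model with the lasso penalty, fits no intercept, does no standardization, and the function names `soft_threshold` and `lasso_path` are made up for this example. Each solution is used as a warm start for the next, smaller, value of lambda.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_path(X, y, lambdas, n_sweeps=200, tol=1e-10):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    X is a list of n rows with p entries each (hypothetical layout).
    Returns one coefficient list per lambda, ordered from the largest
    lambda to the smallest.
    """
    n, p = len(X), len(X[0])
    # mean squared value of each column, used to scale the update
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    resid = list(y)  # current residual y - X beta
    path = []
    for lam in sorted(lambdas, reverse=True):  # warm starts down the grid
        for _ in range(n_sweeps):
            max_change = 0.0
            for j in range(p):
                if col_sq[j] == 0.0:
                    continue
                # inner product of column j with its partial residual
                rho = sum(X[i][j] * resid[i] for i in range(n)) / n
                rho += col_sq[j] * beta[j]
                new = soft_threshold(rho, lam) / col_sq[j]
                delta = new - beta[j]
                if delta != 0.0:
                    for i in range(n):
                        resid[i] -= X[i][j] * delta
                    beta[j] = new
                max_change = max(max_change, abs(delta))
            if max_change < tol:
                break
        path.append(list(beta))
    return path


# Toy data with two orthogonal columns (made up for illustration).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [3.0, 1.0, 3.0, 1.0]
path = lasso_path(X, y, [0.25, 1.0, 2.0])
```

With this toy design the coefficients are all zero at the largest lambda and move toward the least squares fit as lambda decreases.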

Witten DM and R Tibshirani (2008). Testing significance of features by lassoed principal components. Annals of Applied Statistics

Daniela Witten and Robert Tibshirani

A comparison of fold change and the t-statistic for microarray data analysis

Bradley Efron, Trevor Hastie, Robert Tibshirani, Discussion: The Dantzig selector: Statistical estimation when p is much larger than n, The Annals of Statistics. Volume 35, Number 6 (2007), 2358-2364.

Jerome Friedman, Trevor Hastie and Robert Tibshirani Sparse inverse covariance estimation with the graphical lasso. Biostatistics, December 12, 2007.

We develop an efficient algorithm for solving the L1-penalized likelihood approach to sparse inverse covariance estimation.

Hui Zou, Trevor Hastie and Robert Tibshirani

On degrees of freedom of the lasso (Annals of Statistics Volume 35, Number 5 (2007), 2173-2192. )

A technical paper that establishes that the number of non-zero coefficients in a lasso model is an unbiased estimate of the effective degrees of freedom.

Gen Nowak and Robert Tibshirani Complementary hierarchical clustering. Biostatistics, December 12, 2007.

Hoefling, H., and Tibshirani R.

A study of Pre-validation. Published in Annals of Applied Statistics.

Hoefling, H., Getz, G, and Tibshirani R.

Comment on "Significance of candidate cancer genes as assessed by the CaMP score" by Parmigiani et al. (pdf file)

Friedman, J., Hastie T, Tibshirani R.

Pathwise coordinate optimization (ps file)

(pdf) The Annals of Applied Statistics. Volume 1, Number 2 (2007), 302-332

We show how coordinate descent algorithms can efficiently solve a number of popular regularized optimization problems, creating an entire path of solutions. We generalize this approach to derive an efficient algorithm for the fused lasso, both one- and two-dimensional. Annals of Applied Statistics (2007), 1(2), 302-332.

Rob Tibshirani and Pei Wang

Spatial smoothing and hot spot detection for CGH data using the Fused Lasso (pdf file)

To appear, Biostatistics.

Guo Y, Hastie T, Tibshirani R.

Regularized linear discriminant analysis and its application in microarrays. Biostatistics. 2007 Jan;8(1):86-100. Epub 2006 Apr 7.

A method, similar to shrunken centroids, for classification and discrimination of microarrays, using regularized discriminant analysis with gene selection.

Bradley Efron and Rob Tibshirani Tech report. August 2006

On testing the significance of sets of genes (ps file)

(pdf)

Published in Annals of Applied Statistics

Hui Zou, Trevor Hastie, and Rob Tibshirani. Sparse Principal Component Analysis. (pdf)

We present a new approach to principal component analysis that allows us to use an L1 penalty to ensure sparseness of the loadings. Published in JCGS 2006 15(2): 262-286. Software is available in the R package elasticnet, available from CRAN.

Rob Tibshirani, Larry Wasserman. Tech report. July 2006

Correlation-sharing for detection of differential gene expression (ps file)

(pdf)

Debashis Paul, Eric Bair, Trevor Hastie, Rob Tibshirani. Tech report. April 2006

"Pre-conditioning" for feature selection and regression in high-dimensional problems (revised Dec. 2006) (ps file)

(pdf)

To appear, Annals of Statistics.

Trevor Hastie, Jon Taylor, Rob Tibshirani, Guenther Walther. Tech report. March 2006

Forward Stagewise Regression and the Monotone Lasso (ps file)

(pdf)

To appear, Electronic Journal of Statistics.

Rob Tibshirani and Trevor Hastie. Tech report. Feb. 2006.

Margin trees for high-dimensional classification (ps file) (pdf file)

R linux package

R Windows package

Published in J. Mach. Learn. Res.

A tree-structured representation for a multiclass SVM classifier.

Rob Tibshirani and Trevor Hastie. Tech report. Jan. 2006. published in Biostatistics May 2006.

Outlier sums for differential gene expression analysis (published version- pdf)

(technical report- ps)

Biostatistics January 2007; 8: 2 - 8.

Rob Tibshirani. Tech report. October 2005.

A simple method for assessing sample sizes in microarray experiments (ps file)

(pdf file)

BMC Bioinformatics 2006, 7:106

Jon Taylor and Rob Tibshirani. Tech report. July 2005.

Biostatistics April 2006; 7: 167 - 181.

A tail strength measure for assessing the overall significance in a dataset (ps file)

(pdf file)

Published in Biostatistics.

Hugh Chipman and Rob Tibshirani. Tech report. 2005.

"Hybrid Hierarchical Clustering with Applications to Microarray Data" (pdf file)

Biostatistics, Nov. 21, 2005.

R package

Eric Bair, Trevor Hastie, Debashis Paul, Rob Tibshirani. Tech report. 2004.

Prediction by supervised principal components. (ps file)

(pdf file)

Published in J. Amer. Statist. Assoc. (2006), vol. 101, p 119-

IMS 2007 medallion lecture slides (pdf)

Superpc R package

Jon Taylor, Rob Tibshirani and Brad Efron. Tech report. June 2004.

The "Miss rate" for the analysis of gene expression data. (pdf file)

Published in Biostatistics 2005 6(1):111-117.

Trevor Hastie and Robert Tibshirani. Efficient quadratic regularization for expression arrays (Published in Biostatistics)

I. S. Lossos, Debra Czerwinski, Ash Alizadeh, Mark Wechser, Rob Tibshirani, David Botstein and Ronald Levy.

N. Engl. J. Med. 2004;350,1828-37

Prediction of Survival in Diffuse Large-B-Cell Lymphoma Based on the Expression of Six Genes. (paper- pdf file)

(commentary- pdf file)

(pdf file) Published in Bioinformatics. talk slides (pdf file)

Robert Tibshirani, Michael Saunders, Saharon Rosset, and Ji Zhu. Sparsity and smoothness via the fused lasso (ps file)

(pdf file) Published in J. Royal. Statist. Soc. B.

Eric Bair and Robert Tibshirani. Semi-supervised methods to predict patient survival from gene expression data. (pdf file)

Published in PLOS biology.

John Storey and Robert Tibshirani. Statistical significance for genome-wide experiments. (pdf file)

Published in PNAS.

Robert Tibshirani and Eric Bair. Improved detection of differential gene expression through the singular value decomposition (ps file)

(pdf file)

Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, and Gilbert Chu. Class prediction by nearest shrunken centroids, with applications to DNA microarrays (pdf file). This is a more statistical version of the PNAS paper below. Published in Statistical Science, 2003.

Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, and Gilbert Chu. "Diagnosis of multiple cancer types by shrunken centroids of gene expression" (PNAS website). PNAS 2002 99:6567-6572 (May 14).

Brad Efron, Trevor Hastie, Iain Johnstone, Rob Tibshirani. Least angle regression (ps file).

pdf version. Tech. report, Mar 2002. My co-authors' LARS software for Splus, and an Splus helpfile. The software computes the entire LAR, Lasso or Stagewise path in the same order of computations as a single least-squares fit. The Lasso page

"pdf version".

Bradley Efron and Robert Tibshirani

"Microarrays, Empirical Bayes Methods, and False Discovery Rates" (ps file)

"pdf version". Published in Genet Epidemiol. 2002 Jun;23(1):70-86

Robert Tibshirani, Guenther Walther, David Botstein and Pat Brown

"Cluster validation by prediction strength" (ps file)

"pdf version". Published in Journal of Computational & Graphical Statistics, Volume 14, Number 3, September 2005, pp. 511-528(18)

Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, Michael Eisen, Gavin Sherlock, Pat Brown, and David Botstein

"Exploratory screening of genes and clusters from microarray experiments" (ps file)

"pdf version".

"Significance analysis of microarrays applied to the ionizing radiation response." (ps file)

"pdf version". PNAS 98: 5116-5124 (April 24) "Raw data"

"Statistical challenges in the analysis of DNA microarray data" (ps file)

"pdf version". Lecture delivered to National Academy of Sciences, November 2000.

Bradley Efron, John Storey, Robert Tibshirani, Virginia Tusher.

"Empirical Bayes Analysis of a Microarray Experiment" (ps file).

"pdf version". Tech report, October 2000. "Raw data"

Trevor Hastie, Robert Tibshirani, David Botstein and Pat Brown,

"Supervised Harvesting of Expression Trees" (ps file).

"pdf version". Tech. report. August 2000. Published in Genome Biology, 2001 (www.genomebiology.com)

Donald Redelmeier and Robert Tibshirani, "Are those other drivers really going faster?". Chance 2000;13:8-14.

Robert Tibshirani, Guenther Walther and Trevor Hastie.

"Estimating the number of clusters in a dataset via the Gap statistic" (ps file).

"pdf version". Tech. report. March 2000. Published in JRSSB 2000.

Alizadeh, A. and 23 others. "Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling" Nature 403, 503-511 (2000)

Trevor Hastie, Robert Tibshirani, Michael Eisen, Pat Brown,
Doug Ross, Uwe Scherf, John Weinstein, Ash Alizadeh, Louis Staudt, David Botstein

"Gene Shaving: a New Class of Clustering Methods for Expression Arrays".
Tech. report. Jan 2000.

Donald Redelmeier and Robert Tibshirani, "Why cars in the next lane seem to go faster". Nature 1999;401:35-36.

Tibshirani, R. "Learning from Data: Statistical Advances and Challenges". Keynote address: Splus Users conference, Oct 21, 1999

Tibshirani, R., Lazzeroni, L., Hastie, T., Olshen, A. and Cox, D.R. "A global pairwise approach to radiation hybrid mapping". Tech. report, Jan. 1999. Using data on the co-occurrence of hybridized markers after shattering, inference is made about the marker sequence in the chromosome.

Friedman, J., Hastie, T. and Tibshirani, R. "Additive Logistic Regression: a Statistical View of Boosting". Tech. report, July 23, 1998. We show that boosting fits an additive logistic regression model by stagewise optimization of a criterion very similar to the log-likelihood, and present likelihood based alternatives. We also propose a multi-logit boosting procedure which appears to have advantages over other methods proposed so far.

Tibshirani, R. "Learning from Data: Statistical Advances and Challenges". Plenary lecture, Center for automated learning and discovery, CMU, June 13, 1998.

Hastie, T. and Tibshirani, R. "Bayesian backfitting", "Abstract", March 1998. The Gibbs sampler looks and feels like the backfitting algorithm for fitting additive models. Indeed, a simple modification to backfitting turns it into a Gibbs sampler for spitting out samples from the "posterior" distribution for an additive fit.

Tibshirani, R. and Knight, K. "The covariance inflation criterion for model selection", November 1997.

Tibshirani, R. "Some thoughts from half a career in statistics", June 1997.

Rao, J.S. and Tibshirani, R. "The out-of-bootstrap method for model averaging and selection". Technical Report, May 1997.

Ennis, Hinton, Naylor, Revow and Tibshirani "A comparison of statistical learning methods on the GUSTO database"

Efron, B. and Tibshirani, R. "The Problem of Regions" Technical Report, Feb 1997.

Hastie, T. and Tibshirani, R. "Classification by pairwise coupling" Technical Report, Nov. 1996. We solve a multiclass classification problem by combining all the pairwise rules. This paper builds on ideas proposed by J. Friedman.
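To give a feel for what "combining all the pairwise rules" means, here is a minimal sketch that uses plain majority voting over the two-class rules. Voting is a simpler stand-in chosen for illustration; the coupling procedure in the paper is not reproduced here. The function `combine_pairwise`, the toy centres and the nearest-centre rules are all made up for this example.

```python
from itertools import combinations


def combine_pairwise(pairwise_rules, classes, x):
    """Predict a class for x by majority vote over all pairwise rules.

    pairwise_rules maps each pair (a, b) to a function that returns
    either a or b for an input x (hypothetical interface).
    """
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        votes[pairwise_rules[(a, b)](x)] += 1
    # ties are broken by the order of `classes`
    return max(classes, key=lambda c: votes[c])


# Toy example: three classes on a line, each pairwise rule picks the
# class whose centre is nearer to x.
centres = {"a": 0.0, "b": 5.0, "c": 10.0}
classes = list(centres)


def make_rule(a, b):
    return lambda x: a if abs(x - centres[a]) <= abs(x - centres[b]) else b


rules = {(a, b): make_rule(a, b) for a, b in combinations(classes, 2)}
prediction = combine_pairwise(rules, classes, 6.0)
```

With K classes this fits K(K-1)/2 two-class rules, each of which only has to separate two groups at a time.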

Tibshirani, R. "Who is the fastest man in the world?". Tech report, Sept. 1996. Revised January 1997. American Statistician.

Tibshirani, R. "Two applications of the bootstrap". Gordon Ashton Memorial Lecture, Guelph, Sep 17 1996.

Redelmeier, D. and Tibshirani, R. "Cellular telephones and automobile collisions: some variations on matched case-control analysis". Technical Report (Jan 1997).

Hastie, T., Ikeda, D. and Tibshirani, R. "Computer-aided diagnosis of mammographic masses" Technical Report (June 1996). This is a big file (5mb) because of the images. Here is a much smaller version without the images. It has been accepted for publication in J. Comp. and Graph. Statistics.

Tibshirani, R. "Bias, variance and prediction error for classification rules" Technical Report (April 1996), revised November 1996.

Tibshirani, R. and Knight, K. "Model search and inference by bootstrap 'bumping'". Technical Report (Nov. 1995). It has been accepted for publication in J. Comp. and Graph. Statistics.

Efron, B. and Tibshirani, R. "Cross-Validation and the Bootstrap: Estimating the Error Rate of a Prediction Rule (text only)". Technical Report (May 1995).

Hastie, T. J. and Tibshirani, R. "Discriminant Adaptive Nearest Neighbor Classification." Technical Report (Dec. 1994).

Tibshirani, R. "A comparison of some error estimates for neural network models" Technical Report, to appear Neural Computation 1995.

Tibshirani, R., and Hinton, G.E. "Coaching variables for regression and classification". Technical Report (July 1994).

Tibshirani, R. "Regression shrinkage and selection via the lasso". Technical Report (June 1994).

Hastie, T. J. and Tibshirani, R. "Discriminant Analysis by Gaussian Mixtures." To appear in JRSSB (Dec 1994). For longer technical report of Feb 1994 click here.

Tibshirani, R. "A proposal for variable selection in the Cox model". Technical Report (June 1994).

Hastie, T. J., Buja, A., and Tibshirani, R. "Penalized Discriminant Analysis." Technical Report (May 1994).

Hastie, T.J and Tibshirani, R. "Handwritten Digit Recognition via Deformable Prototypes." AT&T Bell Laboratories Technical Report, 1994.

Hastie, T. J., Tibshirani, R. and Buja, A. "Flexible Discriminant Analysis by Optimal Scoring." Technical Report (Dec. 1993), appeared in JASA, December 1994. AT&T Bell Labs Technical Report (Dec. 1993).

Tibshirani, R. "Principal curves revisited." Appeared in Statistics and Computing 1992.

Hastie, Trevor, and Tibshirani, R.
"Generalized Additive Models"
Statistical Science, Vol 1, No 3, pages 297-310

Tibshirani, R. My PhD thesis on local likelihood estimation.

Tibshirani, R. A Plain man's guide to the proportional hazards model

- S routine for bumping. Shar file.
(Rob Tibshirani, tibs@utstat.toronto.edu).
- S routines for Flexible Discriminant Analysis. These tools are
enhancements on the discr and gdiscr functions, and allow
linear, polynomial, and nonparametric versions of
discriminant analysis. There are easy to use predict
methods, as well as a postscript version of a paper
describing the techniques. Shar file. (Trevor Hastie and
Rob Tibshirani, trevor@playfair.stanford.edu).
- FORTRAN program for fitting generalized additive models.
(Trevor Hastie and Rob Tibshirani, trevor@playfair.stanford.edu).
- lasso
- S functions for the lasso. (Rob Tibshirani,
tibs@utstat.toronto.edu).
- avas
- S functions for avas (additivity and variance stabilization)
(Rob Tibshirani,
tibs@utstat.toronto.edu).
- bootstrap
- S functions for bootstrap, jackknife and cross-validation
(Rob Tibshirani,
tibs@utstat.toronto.edu).
- varcoef
- S functions for varying coefficient models
(Rafal Kustra rafal@utstat.toronto.edu and Rob Tibshirani,
tibs@utstat.toronto.edu).