In [36]:
%%capture
%load_ext rpy2.ipython
%matplotlib inline
from IPython.display import HTML
from matplotlib import pyplot
import os, numpy as np
np.random.seed(0)
import pandas, statsmodels.api as sm
from selection.algorithms.lasso import lasso, data_carving
import itable
HTML("""
<style type="text/css">
    .emphred {color: #cc2222; font-size: 100%; font-weight: bold; font-family: arial,helvetica,sans-serif}
    .emphblue {color: #2222cc; font-size: 100%; font-weight: bold; font-family: arial,helvetica,sans-serif}
</style>
""")

# Run with commit a9eec2db226088bfd89e3d61f9fbc20d933ae83a
# from http://github.com/jonathan-taylor/selective-inference
Out[36]:

Selective inference in regression

Jonathan Taylor (Stanford)

Inference for Large Scale Data, April 20, 2015

http://statweb.stanford.edu/~jtaylo/talks/sfu2015, (notebook)

Outline

  • Selective inference

  • Running example: model selection with the LASSO (arxiv.org/1311.6238)

  • A general framework for selective inference (arxiv.org/1410.2597)

  • Further examples of selective inference.

Acknowledgements

This is joint work with many:

  • Yunjin Choi
  • Will Fithian
  • Jason Lee
  • Richard Lockhart
  • Joshua Loftus
  • Stephen Reid
  • Dennis Sun
  • Yuekai Sun
  • Xiaoying Tian
  • Rob Tibshirani
  • Ryan Tibshirani
  • Others in progress...

Selective inference

  • Arguably, in modern science there is often no hypothesis specified before collecting data.
    • Screening in *omics
    • Peak / bump hunting in neuroimaging
    • Model selection in regression
  • Frequentist inference requires specifying hypotheses before collecting data.

  • We describe a version of selective inference which allows for valid inference after some exploration.

Tukey and Exploratory Data Analysis (EDA)

Tukey held that too much emphasis in statistics was placed on statistical hypothesis testing (confirmatory data analysis); more emphasis needed to be placed on using data to suggest hypotheses to test.

Tukey and Exploratory Data Analysis (EDA)

  • Reading further:

... confusing the two types of analyses and employing them on the same set of data can lead to systematic bias owing to the issues inherent in testing hypotheses suggested by the data.

Selective inference

  • Today, I will focus on testing hypotheses suggested by the data.

  • The answer is parametric; interesting asymptotic questions remain.

Running example

  • In vitro measurement of resistance of sample of HIV viruses to NRTI drug 3TC.

  • 633 cases, and 91 different mutations occurring more than 10 times in the sample.

  • Source: HIVDB

  • Goal: to build an interpretable predictive model of resistance based on mutation pattern.

In [37]:
# Read the HIVDB NRTI data (from disk if already downloaded, otherwise from the web)
if not os.path.exists("NRTI_DATA.txt"):
    NRTI = pandas.read_table("http://hivdb.stanford.edu/pages/published_analysis/genophenoPNAS2006/DATA/NRTI_DATA.txt", na_values="NA")
else:
    NRTI = pandas.read_table("NRTI_DATA.txt")

# Build an indicator column for every site / amino-acid pair observed
# more than 10 times in the sample
NRTI_specific = []
NRTI_muts = []
for i in range(1, 241):
    d = NRTI['P%d' % i]
    for mut in np.unique(d):
        if mut not in ['-', '.'] and len(mut) == 1:
            test = np.equal(d, mut)
            if test.sum() > 10:
                NRTI_specific.append(np.array(test))
                NRTI_muts.append("P%d%s" % (i, mut))

NRTI_specific = pandas.DataFrame.from_records(np.array(NRTI_specific).T, columns=NRTI_muts)
In [38]:
# Next, standardize the data, keeping only those where Y is not missing

X_NRTI = np.array(NRTI_specific, np.float)
Y = NRTI['3TC'] # shorthand
keep = ~np.isnan(Y).astype(np.bool)
X_NRTI = X_NRTI[np.nonzero(keep)]; Y=Y[keep]
Y = np.array(np.log(Y), np.float); Y -= Y.mean()
X_NRTI -= X_NRTI.mean(0)[None, :]; X_NRTI /= X_NRTI.std(0)[None,:]
X = X_NRTI # shorthand
In [39]:
# Design matrix
# Columns are site / amino acid pairs
X.shape
Out[39]:
(633, 91)
In [40]:
# Variable names
NRTI_muts[:10], len(NRTI_muts)
Out[40]:
(['P6D',
  'P20R',
  'P21I',
  'P35I',
  'P35M',
  'P35T',
  'P39A',
  'P41L',
  'P43E',
  'P43N'],
 91)
In [41]:
%%capture
# Fit OLS model and plot coefficients

n, p = X.shape
ols_fit = sm.OLS(Y, X).fit()
sigma_3TC = np.linalg.norm(ols_fit.resid) / np.sqrt(n-p-1)
OLS_3TC = ols_fit.params

fig_3TC = pyplot.figure(figsize=(24,12))
ax_3TC = fig_3TC.gca()
ax_3TC.bar(np.arange(1, len(NRTI_muts)+1)-0.5, OLS_3TC, label='OLS', color='red', alpha=0.7)                                                          
ax_3TC.set_xticks(range(1, (len(NRTI_muts)+1)))
ax_3TC.set_xticklabels(NRTI_muts, rotation='vertical', fontsize=18) 
ax_3TC.set_xlim([-1, len(NRTI_muts)+1])
ax_3TC.set_title(r'OLS coefficients for 3TC resistance, $\hat{\sigma}$=%0.1e' % sigma_3TC, fontsize=50)
ax_3TC.set_ylabel('Parameter estimates', fontsize=30)
ax_3TC.legend(fontsize=30)
;
Out[41]:
''
In [42]:
fig_3TC
Out[42]:

Model selection with the LASSO

  • Many coefficients seem small.

  • Use the LASSO to select variables: \[ \hat{\beta}_{\lambda} = \text{argmin}_{\beta \in \mathbb{R}^p} \frac{1}{2} \|y-X\beta\|^2_2 + \lambda \|\beta\|_1. \]

  • Theoretically motivated choice of \(\lambda\) (Negahban et al., 2012) \[ \lambda = \kappa \cdot \mathbb{E}( \|X^T\epsilon\|_{\infty}), \qquad \epsilon \sim N(0, \sigma^2 I). \]

  • Used \(\kappa=1\) below, \(\sigma^2\) the usual estimate from full model.

arxiv.org/1311.6238

In [43]:
# Choose a LASSO parameter
lambda_theoretical = []
for i in range(10000):
    lambda_theoretical.append(np.fabs(np.dot(X.T, np.random.standard_normal(n))).max())
lambda_theoretical = np.round((1 * np.mean(lambda_theoretical) * sigma_3TC))

Variables chosen for 3TC

In [44]:
lambda_theoretical
Out[44]:
43.0
In [45]:
from selection.algorithms.lasso import lasso
lasso_3TC = lasso(Y, X, lambda_theoretical, sigma=sigma_3TC)
lasso_3TC.fit()
active_3TC = [NRTI_muts[i] for i in lasso_3TC.active]
In [46]:
active_3TC
Out[46]:
['P62V',
 'P65R',
 'P67N',
 'P69i',
 'P75I',
 'P77L',
 'P83K',
 'P90I',
 'P115F',
 'P151M',
 'P181C',
 'P184V',
 'P190A',
 'P215F',
 'P215Y',
 'P219R']
In [47]:
%%capture
ax_3TC.bar(np.arange(1, len(NRTI_muts)+1)-0.5, lasso_3TC.soln, label='LASSO', color='gray') 
ax_3TC.set_title(r'LASSO coefficients ($\lambda=%0.1f$)' % lambda_theoretical, fontsize=50)
ax_3TC.legend(fontsize=30, loc='upper left')
Out[47]:
<matplotlib.legend.Legend at 0x10e3aa6d0>
In [48]:
fig_3TC
Out[48]:

Inference after LASSO

  • The LASSO selected \(\hat{E} \subset\) NRTI_muts of size 16 at \(\lambda \approx 43\).

  • What to report?

  • Naive inference after selection is wrong.

  • The reference distribution of the selected model is biased because of cherry picking (a simulation sketch follows this list).

  • Why not fix it?
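Before fixing it, a quick illustration of the problem (a sketch of mine, not from the talk): under a global null, refitting OLS on LASSO-selected variables and reading off classical p-values gives far too many small p-values. The sample sizes, the penalty level, and the use of scikit-learn's Lasso here are illustrative assumptions; the talk itself uses the selection package.

import numpy as np, statsmodels.api as sm
from sklearn.linear_model import Lasso

rng = np.random.RandomState(1)
n_sim, p_sim = 100, 50
naive_pvals = []
for _ in range(200):
    Xs = rng.standard_normal((n_sim, p_sim))
    ys = rng.standard_normal(n_sim)        # beta = 0: every hypothesis we might test is null
    active = np.nonzero(Lasso(alpha=0.1, fit_intercept=False).fit(Xs, ys).coef_)[0]
    if len(active):
        naive_pvals.extend(sm.OLS(ys, Xs[:, active]).fit().pvalues)
# Valid p-values would be ~Unif(0,1); instead far more than 5% fall below 0.05.
print(np.mean(np.array(naive_pvals) < 0.05))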

In [49]:
%%capture
selected_OLS = sm.OLS(Y, X[:,lasso_3TC.active]).fit()
OLS_lower, OLS_upper = np.asarray(selected_OLS.conf_int()).T
selective_lower = lasso_3TC.intervals['lower'] # [i[0] for _, _, _, i in lasso_3TC.intervals]
selective_upper = lasso_3TC.intervals['upper'] # [i[1] for _, _, _, i in lasso_3TC.intervals]

selective_intervals = ([['Mutation', 'OLS Lower', 'OLS Upper', 'Selective Lower', 'Selective Upper']] + 
                       zip(active_3TC, OLS_lower, OLS_upper, selective_lower, selective_upper))
In [50]:
%%capture
fig_select = pyplot.figure(figsize=(24,12))
ax_select = fig_select.gca()
ax_select.bar(np.arange(1, len(active_3TC)+1)-0.5, selected_OLS.params, color='grey', alpha=0.5)                                                          
ax_select.set_xticks(range(1, (len(active_3TC)+1)))
ax_select.set_xticklabels(active_3TC, rotation='vertical', fontsize=18) 
ax_select.set_xlim([0, len(active_3TC)+1])
ax_select.set_title(r'Selective', fontsize=50)
ax_select.set_ylabel('Parameter values', fontsize=30)
ax_select.errorbar(np.arange(1, (len(active_3TC)+1)), selected_OLS.params, 
                yerr=[selected_OLS.params - OLS_lower, 
                      OLS_upper - selected_OLS.params],
                capsize=10,
                capthick=5,
                fmt=None,
                label='OLS (no coverage guarantees)',
                elinewidth=3,
                ecolor='k')
ax_select.legend(fontsize=30, loc='upper left', numpoints=1)
ax_select.plot([0, len(active_3TC)+1], [0,0], 'k--')
Out[50]:
[<matplotlib.lines.Line2D at 0x10f8a2d10>]
In [51]:
fig_select
Out[51]:
In [52]:
%%capture
ax_select.errorbar(np.arange(1, (len(active_3TC)+1)), selected_OLS.params,
                yerr=[selected_OLS.params - selective_lower, 
                      selective_upper - selected_OLS.params],
                capsize=10,
                capthick=5,
                fmt=None,
                label='Selective',
                elinewidth=3,
                ecolor='r')
ax_select.legend(fontsize=30, loc='upper left', numpoints=1)
Out[52]:
<matplotlib.legend.Legend at 0x111edc3d0>
In [53]:
fig_select
Out[53]:

What are these intervals?

In [54]:
fig_select
Out[54]:

Intervals consistent with the data, having observed the active set.

In [55]:
fig_select
Out[55]:

Intervals seem to be long.

Setup for selective inference

  • Laid out formally in arxiv.org/1410.2597.

  • Data \(y \sim F\). (We have no model for \(F\) at this point!)

  • Set of questions \({\cal Q}\) we might ask about \(F\).

  • Use some exploratory technique to generate questions \[ \widehat{\cal Q}(y) \subseteq {\cal Q}. \]
    • Solve LASSO at some fixed \(\lambda\) and look at active set.
    • Choose a model by BIC and forward stepwise or best subset.
    • Marginal screening.
  • Test some or all of the hypotheses suggested by the point process \(\widehat{\cal Q}(y)\).

LASSO path

Instead of a fixed \(\lambda\), we might look at the LASSO path.

In [56]:
%%R -i X,Y
library(lars)
plot(lars(X, as.numeric(Y), type='lar'))

LASSO path

  • A sequential procedure might consider "event times" \(\lambda_j\).

  • Might take \[{\cal Q} = \{j : 1 \leq j \leq p\}.\]

  • The selection procedure is \[j^*(y)= \widehat{\cal Q}(y) = \text{argmax}_{1 \leq j \leq p} |X_j^Ty|.\]

  • Note \[ |X_{j^*(y)}^Ty| = \lambda_1. \]

  • Under \(H_0: \beta \equiv 0\) (and normalization) (covTest) \[ \lambda_1(\lambda_1 - \lambda_2) \overset{D}{\to} \text{Exp}(1). \]

  • In fact, under \(H_0:\beta \equiv 0\) (Kac-Rice) \[ \frac{1 - \Phi(\lambda_1)}{1 - \Phi(\lambda_2)} \overset{D}{=} \text{Unif}(0,1). \]

  • Sequential aspect makes the multi-step procedure more complicated than fixed \(\lambda\)...
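A quick numerical check of the second display (my sketch, not from the talk): for an orthonormal design the pivot \((1-\Phi(\lambda_1))/(1-\Phi(\lambda_2))\) reduces to a classical order-statistics fact and is exactly Unif(0,1) under the global null; the Kac-Rice result in the talk covers general designs.

import numpy as np
from scipy.stats import norm, kstest

rng = np.random.RandomState(2)
n_sim, p_sim = 200, 40
Xo = np.linalg.qr(rng.standard_normal((n_sim, p_sim)))[0]    # orthonormal columns
pivots = []
for _ in range(2000):
    corr = np.sort(np.fabs(Xo.T.dot(rng.standard_normal(n_sim))))[::-1]
    lam1, lam2 = corr[0], corr[1]                            # top two "knots" |X_j^T y|
    pivots.append(norm.sf(lam1) / norm.sf(lam2))
print(kstest(pivots, 'uniform'))                             # should not reject uniformity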

What are the questions?

Linear regression

  • Define \[{\cal Q} = \left\{(j,E): E \subset \{1, \dots, p\}, j \in E\right\}.\]

  • Indexes the set of OLS functionals, i.e. partial regression coefficients: \[ \beta_{j|E}(\mu) = e_j^TX_E^{\dagger}\mu, \qquad (j,E) \in {\cal Q}. \]
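A tiny numerical example of these functionals (my sketch): \(\beta_{j|E}(\mu)\) is just the \(j\)-th coordinate of the least-squares projection of \(\mu\) onto the columns in \(E\), whether or not the true mean lies in that span.

import numpy as np

rng = np.random.RandomState(3)
Xd = rng.standard_normal((50, 8))
mu = Xd[:, :3].dot(np.array([1., -2., 0.5]))   # mean vector built from columns 0, 1, 2
E = [0, 2, 5]                                  # a (misspecified) candidate model
beta_E = np.linalg.pinv(Xd[:, E]).dot(mu)      # X_E^dagger mu
print(dict(zip(E, np.round(beta_E, 3))))       # beta_{j|E}(mu) for each j in E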

Simultaneous vs. selective inference

  • Functionals \(\beta_{j|E}\) also appear in POSI.

  • Simultaneous inference over all questions \((j,E) \in {\cal Q}\).

  • Achieved by controlling FWER: find \(K_{\alpha}\) s.t. \[ \mathbb{P} \left(\sup_{(j,E) \in \cal Q} \frac{|\beta_{j|E}(\epsilon)|}{\sigma\|e_j^TX_E^{\dagger}\|_2} > K_{\alpha}\right) \leq \alpha \] where \(\epsilon \sim N(0, \sigma^2 I)\) (see the Monte Carlo sketch below).
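As a concrete illustration of the simultaneous approach (my sketch, for a tiny design with \(\sigma=1\); not part of the talk), \(K_{\alpha}\) can be estimated by Monte Carlo, maximizing the normalized functional over every \((j,E)\):

import numpy as np
from itertools import combinations

rng = np.random.RandomState(4)
n_small, p_small = 40, 5
X_small = rng.standard_normal((n_small, p_small))
contrasts = []                                   # unit-norm rows e_j^T X_E^dagger, over all (j, E)
for k in range(1, p_small + 1):
    for E in combinations(range(p_small), k):
        for row in np.linalg.pinv(X_small[:, list(E)]):
            contrasts.append(row / np.linalg.norm(row))
contrasts = np.array(contrasts)
# with sigma = 1, each normalized functional of eps is standard Gaussian
maxima = np.fabs(contrasts.dot(rng.standard_normal((n_small, 5000)))).max(0)
print(np.percentile(maxima, 95))                 # K_alpha at alpha = 0.05; compare to 1.96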

Selective inference uses the LASSO to select

\[ \widehat{\cal Q}(y) \subset {\cal Q}. \]

We control coverage and type I error on selected questions.

Selective inference for LASSO

  • Our inference is based on the distribution \[ {\mathbb{Q}}_{E,z_E}(\cdot) = {\mathbb{P}}\left( \ \cdot \ \big\vert (\hat{E}, z_{\hat{E}}) = (E, z_E) \right) \] where \[ \mathbb{P} = \mathbb{P}_{\mu} = N(\mu, \sigma^2 I). \]

  • We derive exact pivots for \(\beta_{j|E}(\mu)\) under \(\mathbb{Q}_{E,z_E}\).

  • Report intervals based on \(\mathbb{Q}_{\hat{E}, z_{\hat{E}}}\).

Selective inference for the LASSO

  • Selection event: \[ \begin{aligned} S(E,z_E) &= \{ (\hat{E}, \hat z_{\hat{E}}) = (E, z_E)\}\\ &= \left\{y: A(E, z_E)y \leq b(E,z_E) \right\} \end{aligned} \]

  • \(A(E,z_E)\) and \(b(E,z_E)\) come from the KKT conditions.

  • Active block \[ \text{diag}(z_E)\left(X_E^{\dagger}y - \lambda (X_E^TX_E)^{-1}z_E\right) \geq 0. \]

  • Inactive block \[ \left\|X_{-E}^T\left((I-X_EX_E^{\dagger})y + \lambda (X_E^T)^{\dagger} z_E \right) \right\|_{\infty} \leq \lambda. \]
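As a sanity check (my sketch, built directly from the two blocks above rather than from the package's internal representation, and reusing the objects already defined in this notebook), the observed response should satisfy these inequalities for the fitted 3TC LASSO:

import numpy as np

E = lasso_3TC.active
z_E = np.sign(lasso_3TC.soln[E])
X_E, X_mE = X[:, E], np.delete(X, E, axis=1)
pinv_E = np.linalg.pinv(X_E)
lam = lambda_theoretical

# active block: diag(z_E)(X_E^dagger y - lam (X_E^T X_E)^{-1} z_E) >= 0
active_block = z_E * (pinv_E.dot(Y) - lam * np.linalg.solve(X_E.T.dot(X_E), z_E))
# inactive block: ||X_{-E}^T((I - X_E X_E^dagger) y + lam (X_E^T)^dagger z_E)||_inf <= lam
resid = Y - X_E.dot(pinv_E.dot(Y))
inactive_block = np.fabs(X_mE.T.dot(resid + lam * pinv_E.T.dot(z_E)))

# both should hold, up to solver tolerance
print(active_block.min() >= 0, inactive_block.max() <= lam)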

Visualizing LASSO partition

(Credit Naftali Harris)

Variables 1 and 3 chosen with positive signs

Inference

Condition on sufficient statistic \(X_1^Ty\).

Inference

Allowing for effect of \(X_1\).

Reduction to univariate problem

  • Law of \(X_3^Ty\) restricted to slice is a 1-parameter exponential family \[ \frac{f_{\theta}(z)}{f_0(z)} \propto \exp(\theta z) \cdot 1_{[{\cal V}^-(z^{\perp}),{\cal V}^+(z^{\perp})]}(z) \] with \(\theta = \beta_{3|\{1,3\}}(\mu)/\sigma^2\).

  • Reference measure: \(f_0={\cal L}(\hat{\beta}_{3|\{1,3\}}(y))\).
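In code, the resulting pivot is just a ratio of Gaussian tail probabilities over the truncation interval. A minimal sketch (mine; null case with known scale, and hypothetical arguments vlo and vhi standing in for \({\cal V}^-\) and \({\cal V}^+\)):

import numpy as np
from scipy.stats import norm

def truncated_gaussian_pvalue(obs, sd, vlo, vhi):
    # P(Z >= obs | vlo <= Z <= vhi) for Z ~ N(0, sd^2): the null pivot
    num = norm.cdf(vhi / sd) - norm.cdf(obs / sd)
    den = norm.cdf(vhi / sd) - norm.cdf(vlo / sd)
    return num / den

# a value just above the lower truncation limit is unsurprising after selection:
print(truncated_gaussian_pvalue(2.1, 1., 2.0, np.inf))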

Saturated model

  • Nuisance parameter is \(P_{\eta}^{\perp}\mu\).

Saturated model

  • Each \(\eta=\eta(E,z_E)\) determines a truncated univariate Gaussian.
In [57]:
fig_select
Out[57]:

Selective hypothesis tests

  • Under \(\mathbb{Q}_{E,z_E}\), can construct tests \(\phi_{(j|E)}\) of \[ H_{0,(j|E)} : \beta_{j|E}(\mu) = 0. \]

  • Tests satisfy selective type I error guarantee \[ \mathbb{Q}_{E,z_E}(\phi_{j|E}) \overset{H_{0,(j|E)}}{\leq} \alpha. \]

  • Conditional control implies marginal control.

  • We report results \(\phi_{(j| \hat{E})}(y), j \in \hat{E}\).
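To spell out why conditional control implies marginal control (my restatement of the standard argument, not a new result): averaging the conditional guarantee over the selection outcomes for which the null is true gives \[ \mathbb{E}_{\mu}\left[\phi_{(j|\hat{E})}(y)\,1\{H_{0,(j|\hat{E})} \text{ true}\}\right] = \sum_{(E,z_E)} \mathbb{P}_{\mu}\left((\hat{E}, z_{\hat{E}})=(E,z_E)\right)\, \mathbb{Q}_{E,z_E}(\phi_{j|E})\, 1\{H_{0,(j|E)} \text{ true}\} \leq \alpha, \] since each conditional term is at most \(\alpha\) and the selection probabilities sum to at most one.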

In [58]:
selective_pvalues = pandas.DataFrame({'Mutation':active_3TC, 'Naive OLS':selected_OLS.pvalues, 
                                      'Selective':[p for _, p in lasso_3TC.active_pvalues]})
pvalue_table = itable.PrettyTable(selective_pvalues, tstyle=itable.TableStyle(theme="theme1"), center=True)
signif_select = selective_pvalues['Selective'] < 0.05
signif_OLS = selective_pvalues['Naive OLS'] < 0.05
pvalue_table.set_cell_style(cols=[1], rows=np.nonzero(signif_OLS)[0], color='red', font_weight='bold', format_function=lambda p: "%0.3f" % p)
pvalue_table.set_cell_style(cols=[1], rows=np.nonzero(~signif_OLS)[0], format_function=lambda p: "%0.3f" % p)
pvalue_table.set_cell_style(cols=[2], rows=np.nonzero(signif_select)[0], color='red', font_weight='bold', format_function=lambda p: "%0.3f" % p)
pvalue_table.set_cell_style(cols=[2], rows=np.nonzero(~signif_select)[0], format_function=lambda p: "%0.3f" % p)
In [59]:
pvalue_table
Out[59]:
Mutation   Naive OLS   Selective
P62V       0.137       0.369
P65R       0.000       0.000
P67N       0.000       0.000
P69i       0.000       0.000
P75I       0.484       0.553
P77L       0.270       0.469
P83K       0.003       0.051
P90I       0.000       0.014
P115F      0.014       0.168
P151M      0.008       0.081
P181C      0.000       0.002
P184V      0.000       0.000
P190A      0.063       0.309
P215F      0.000       0.016
P215Y      0.000       0.000
P219R      0.001       0.099

What does it all mean?

Unconditional viewpoint

  • Intervals cover random variables (prediction intervals).

  • Hypotheses seem to be events! But they're not...

What does it all mean?

Conditional viewpoint

  • Conditionally on the selection, \(\hat{E}\) is constant: we form confidence intervals for fixed parameters.

  • Hypotheses concern fixed parameters of \(\mathbb{Q}_{E,z_E}\).

Who's afraid of random hypotheses?

Exploratory and confirmatory study

  • Measure pilot data \[ S_1 = y_1 | X_1 \]

  • Build a model: \(\hat{E}(y_1)\).

  • Measure a confirmatory sample \[ S_2 = y_2 | X_2 \]

  • Form usual \(t\)-statistics based on \[ \hat{\beta}_2 = X_{2,\hat{E}(y_1)}^{\dagger}y_2. \]

Who's afraid of random hypotheses?

Exploratory and confirmatory study

  • We all know that this is valid....

  • BUT, the probability space is really the joint law \((y_1, y_2)|(X_1, X_2)\).

Null hypotheses in the confirmatory stage are clearly random.

Data splitting

  • Use some portion of the data to form a model \(\hat{E}(y_1)\).

  • Perform usual inference for \[ X_{2, \hat{E}(y_1)}^{\dagger}\mu_2. \]

  • Less data for selection, and less data for inference.
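A quick sketch of data splitting on the 3TC data (mine, reusing the objects defined earlier in this notebook; the \(\sqrt{1/2}\) rescaling of \(\lambda\) for the half-sized selection sample is an ad hoc assumption):

import numpy as np, statsmodels.api as sm

split = np.zeros(n, np.bool)
split[:n // 2] = True
np.random.shuffle(split)                         # random half used only for selection
lasso_half = lasso(Y[split], X[split], lambda_theoretical / np.sqrt(2), sigma=sigma_3TC)
lasso_half.fit()
held_out_OLS = sm.OLS(Y[~split], X[~split][:, lasso_half.active]).fit()
print([NRTI_muts[i] for i in lasso_half.active])
print(held_out_OLS.pvalues)   # classical p-values: valid because selection used the other half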

Selective inference (can) use all data for exploration and confirmation

Might use only part of the data for exploration.

In [60]:
%%capture
active_carve, selective_pval_carve, selective_interval_carve, split_pval, split_interval = [], [], [], [], []
results = data_carving(Y, X, lam_frac=1., sigma=sigma_3TC, burnin=5000, ndraw=20000, split_frac=0.9,
                       splitting=True)[0]
for result in results:
    active_carve.append(NRTI_muts[result[0]])
    selective_pval_carve.append(result[1])
    selective_interval_carve.append(result[2])
    split_pval.append(result[3])
    split_interval.append(result[4])

selective_interval_carve = np.array(selective_interval_carve).T
split_interval = np.array(split_interval).T
split_OLS = np.mean(split_interval, 0)

fig_carve = pyplot.figure(figsize=(24,12))
ax_carve = fig_carve.gca()
ax_carve.bar(np.arange(1, len(active_carve)+1)-0.5, split_OLS, color='grey', alpha=0.5)                                                          
ax_carve.set_xticks(range(1, (len(active_carve)+1)))
ax_carve.set_xticklabels([v[1:] for v in active_carve], rotation='vertical', fontsize=18) 
ax_carve.set_xlim([0, len(active_carve)+1])
ax_carve.set_title(r'Intervals of selected model using 90% for selection', fontsize=50)
ax_carve.set_ylabel('Parameter values', fontsize=30)
ax_carve.errorbar(np.arange(1, (len(active_carve)+1)), split_OLS, 
                yerr=[split_OLS - split_interval[0],
                      split_interval[1] - split_OLS],
                capsize=10,
                capthick=5,
                fmt=None,
                label='Data splitting',
                elinewidth=3,
                ecolor='k')
ax_carve.legend(fontsize=30, loc='upper left', numpoints=1)
ax_carve.plot([0, len(active_carve)+1], [0,0], 'k--')
Out[60]:
[<matplotlib.lines.Line2D at 0x111c08d10>]
In [61]:
fig_carve
Out[61]:
In [62]:
%%capture
ax_carve.errorbar(np.arange(1, (len(active_carve)+1)), split_OLS,
                yerr=[split_OLS - selective_interval_carve[0],
                      selective_interval_carve[1] - split_OLS],
                capsize=10,
                capthick=5,
                fmt=None,
                label='Data carving',
                elinewidth=3,
                ecolor='r')
ax_carve.legend(fontsize=30, loc='upper left', numpoints=1)
Out[62]:
<matplotlib.legend.Legend at 0x10fd17490>
In [63]:
fig_carve
Out[63]:

Data carving intervals are shorter than data splitting intervals.

In [64]:
fig_carve
Out[64]:

Inference cannot be reduced to a simple univariate problem.

In [65]:
carve_pvalues = pandas.DataFrame({'Mutation':active_carve, 
                                  'Data splitting':split_pval, 
                                  'Data carving':selective_pval_carve})
carve_pvalues = carve_pvalues.reindex_axis(['Mutation', 'Data splitting', 'Data carving'], axis=1)
carve_pvalue_table = itable.PrettyTable(carve_pvalues, tstyle=itable.TableStyle(theme="theme1"), center=True)
signif_carve = carve_pvalues['Data carving'] < 0.05
signif_split = carve_pvalues['Data splitting'] < 0.05
carve_pvalue_table.set_cell_style(cols=[1], rows=np.nonzero(signif_split)[0], color='red', font_weight='bold', format_function=lambda p: "%0.3f" % p)
carve_pvalue_table.set_cell_style(cols=[1], rows=np.nonzero(~signif_split)[0], format_function=lambda p: "%0.3f" % p)
carve_pvalue_table.set_cell_style(cols=[2], rows=np.nonzero(signif_carve)[0], color='red', font_weight='bold', format_function=lambda p: "%0.3f" % p)
carve_pvalue_table.set_cell_style(cols=[2], rows=np.nonzero(~signif_carve)[0], format_function=lambda p: "%0.3f" % p)
In [66]:
carve_pvalue_table
Out[66]:
Mutation   Data splitting   Data carving
P41L       0.155            0.048
P62V       0.555            0.204
P65R       0.261            0.000
P67N       0.173            0.000
P69i       0.479            0.000
P77L       0.025            0.231
P83K       0.876            0.013
P115F      0.068            0.066
P116Y      0.220            0.204
P181C      0.000            0.000
P184V      0.000            -0.000
P215F      0.000            0.009
P215Y      0.284            0.000
P219R      0.090            0.035

Data carving

Holding out more data, data carving still beats data splitting.

arxiv.org/1410.2597

A general framework for selective inference

  • Nothing was directly tied to the LASSO, as long as we can describe the selection event, i.e. the event on which the selected model became interesting.

  • Nothing was directly tied to the model \(N(\mu, \sigma^2 I)\) either.

  • Can carry out inference with known (or unknown) \(\sigma\) for the selected model: \[ \mathbb{Q}_{\beta_E;(E,z_E)}(\cdot) = \mathbb{P}_{\beta_E}\left( \ \cdot \ \big\vert (\hat{E}, z_{\hat{E}}) = (E, z_E) \right) \] where \[ \mathbb{P}_{\beta_E} \overset{D}{=} N(X_E\beta_E, \sigma^2 I). \]

  • Typically requires Monte Carlo inference.

arxiv.org/1410.2597

Data carving with known variance

  • Split the data \(y=(y_1,y_2)\), \(X=(X_1,X_2)\).

  • Run LASSO on \((y_1,X_1,\lambda)\).

  • Selection event: affine constraints on \(y_1\).

Data carving with known variance

  • Unconditional distribution: \[ \frac{d\mathbb{P}_{\beta_E}}{d\mathbb{P}_0}(y) \propto \exp\left( \frac{1}{\sigma^2}\beta_E^TX_E^Ty \right), \qquad \mathbb{P}_0 = N(0, \sigma^2 I). \]

  • Selective distribution: \[ \frac{d\mathbb{Q}_{\beta_E;(E,z_E)}}{d\mathbb{P}_{\beta_E}}(y) \propto 1_{S(E,z_E)}(y). \]

  • Inference for \(\beta_{j|E}\): condition \(\mathbb{Q}_{\beta_E;(E,z_E)}\) on \(X_{E\setminus\{j\}}^Ty\).

  • Monte Carlo sampling from multivariate Gaussian subject to affine constraints.
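The sampler can be as simple as hit-and-run over the polyhedron. A minimal sketch (mine, not the package's sampler, which is more careful about mixing); z0 must strictly satisfy A z0 < b:

import numpy as np
from scipy.stats import truncnorm

def hit_and_run(A, b, z0, mu, sigma, ndraw=1000, seed=0):
    # samples from N(mu, sigma^2 I) restricted to {z : A z <= b}
    rng = np.random.RandomState(seed)
    z, samples = np.asarray(z0, dtype=float).copy(), []
    for _ in range(ndraw):
        d = rng.standard_normal(z.shape)
        d /= np.linalg.norm(d)                   # random direction
        Ad, slack = A.dot(d), b - A.dot(z)       # feasible t solves A(z + t d) <= b
        lo = max([s / a for s, a in zip(slack, Ad) if a < 0] + [-np.inf])
        hi = min([s / a for s, a in zip(slack, Ad) if a > 0] + [np.inf])
        m = d.dot(mu - z)                        # 1d Gaussian mean along the direction
        t = truncnorm.rvs((lo - m) / sigma, (hi - m) / sigma, loc=m, scale=sigma,
                          random_state=rng)
        z = z + t * d
        samples.append(z.copy())
    return np.array(samples)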

In [67]:
fig_carve
Out[67]:
In [68]:
carve_pvalue_table
Out[68]:
Mutation   Data splitting   Data carving
P41L       0.155            0.048
P62V       0.555            0.204
P65R       0.261            0.000
P67N       0.173            0.000
P69i       0.479            0.000
P77L       0.025            0.231
P83K       0.876            0.013
P115F      0.068            0.066
P116Y      0.220            0.204
P181C      0.000            0.000
P184V      0.000            -0.000
P215F      0.000            0.009
P215Y      0.284            0.000
P219R      0.090            0.035

Other examples in the literature

  • Selective intervals (Benjamini & Yekutieli; Weinstein, Fithian and Benjamini; others)

  • Drop-the-losers binomial designs (Sampson & Sill)

  • \(p\)-values for maxima of random fields (Schwartzman & Chen)

  • Effect size estimation (Benjamini & Rosenblatt; Zhong & Prentice)

Other examples we have looked at

Least Angle Regression

arxiv.org/1401.3889

Least Angle Regression

  • Asymptotic analysis of first step of LAR / LASSO / FS was considered in covTest.

  • Selective framework provides exact test of global null \[ \exp(-\lambda_2(\lambda_1-\lambda_2)) \approx \frac{1 - \Phi(\lambda_1)}{1 - \Phi(\lambda_2)} \overset{H_0:\beta\equiv 0}{\sim} \text{Unif}(0,1) \]

  • LAR sequence up to \(k\) steps (tracking signs on entering) can be expressed as a set of affine inequalities (including AIC stopping).

  • Exact extension of covTest beyond first step.

Categorical variables

  • The LAR approach does not generally allow grouped (i.e. categorical) variables.

  • Extension of covTest to groups.

  • Coming soon: inference for complete FS path for grouped variables.

arxiv.org/1405.3920

Square-root LASSO

\[ \hat{\beta}_{\gamma} = \text{argmin}_{\beta \in \mathbb{R}^p} \|y-X\beta\|_2 + \gamma \|\beta\|_1. \]

  • Tuning parameter is free of \(\sigma\).

  • Selection event is no longer convex in general, though selective inference still possible.

  • Also coming soon.
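To see that the tuning parameter is \(\sigma\)-free, the analogue of the earlier calibration uses the self-normalized score \(X^T\epsilon/\|\epsilon\|_2\), whose law does not involve \(\sigma\). A small sketch (mine) on the 3TC design, reusing X and n from above:

gamma_vals = []
for _ in range(2000):
    eps = np.random.standard_normal(n)
    gamma_vals.append(np.fabs(np.dot(X.T, eps)).max() / np.linalg.norm(eps))
print(np.mean(gamma_vals))   # candidate gamma: no estimate of sigma required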

Estimation

  • Selective distribution used for hypothesis tests, intervals.

  • Selective pseudo MLE: \[ \int_{\mathbb{R}} X_j^Tz \; \mathbb{Q}_{\hat{\beta}_{j|E}(y);(E,z_E)}\left(dz \big\vert X_{E\setminus j}^Tz=X_{E\setminus j}^Ty \right) = X_j^Ty. \]

  • Mean estimation in orthogonal design after BH.

  • Provides estimate of \(\sigma^2\) in \(\sqrt{\text{LASSO}}\).

arxiv.org/1405.3340

Asymptotics

  • Inference so far is very parametric.

  • We have some partial results: arxiv.org/abs/1501.03588.

  • Ryan Tibshirani and others at CMU also have something coming.

  • What about GLMs? No explicit results yet.

Peak inference in neuroimaging & critical points

(Credit Wikipedia)

In some sense, this is where it all started...

Peak inference in neuroimaging

  • Let \((T(x))_{x \in B}\) be an image of test statistics (SPM).

  • Set \({\cal Q}=B\), and \(\hat{\cal Q}(T)\) to be the set of local maxima / minima.

  • Report p-value for each critical point in \(\hat{\cal Q}(T)\).

Peak inference in neuroimaging

  • Long history in brain imaging (work of Worsley, Friston, Benjamini).

  • Goal was simultaneous inference.

  • Simultaneous tools can be converted to selective tools (arxiv.org/1308.3020).

  • Similar approach can be used for testing in PCA (arxiv.org/1410.8260).

  • Recent work of Schwartzman & Chen (arxiv.org/1405.1400).

  • Selective distributions: Slepian models / Palm distributions.

Thanks

  • NSF-DMS 1208857 and AFOSR-113039.

  • Many collaborators.

In [69]:
import os
def build():
    cmd = '''

ipython nbconvert --to slides --profile=stats   --reveal-prefix=http://statweb.stanford.edu/~jtaylo/reveal.js  %(title)s.ipynb;
mkdir www;
cp -r %(title)s.slides.html www/index.html;
cp fig_lasso*png www;
cp convex.svg www; 
cp %(title)s.ipynb www;
''' % {'title':'sfu'}
    print cmd
    os.system(cmd)
build()


ipython nbconvert --to slides --profile=stats   --reveal-prefix=http://statweb.stanford.edu/~jtaylo/reveal.js  sfu.ipynb;
mkdir www;
cp -r sfu.slides.html www/index.html;
cp fig_lasso*png www;
cp convex.svg www; 
cp sfu.ipynb www;