Reports
Most of these papers are based upon work supported by the
National Science Foundation under Grants:
IIS1837931,
DMS1521145,
DMS1407397,
DMS0906056,
DMS0604939,
DMS0306612,
and DMS0072445.
Any opinions, findings, and conclusions or recommendations
expressed in this material are those
of the author(s) and do not necessarily reflect the
views of the National Science Foundation.
Slides from talks are beside some of the articles.
More talks are here.
There is work on empirical likelihood,
Monte Carlo & quasiMonte Carlo, computer experiments,
transposable data, hypothesis testing and bioinformatics. You
may have to scroll down for some of those.
Papers by year of initial posting
Each article goes under the year when it was first
written, usually as a technical report. Revisions don't usually
make an article move up the list.
2019

Owen, A. B. and Zhou, Y.
The square root rule for adaptive importance sampling
PDF
 arXiv
Suppose you have uncorrelated unbiased estimates \(\hat\mu_k, 1\leqslant k\leqslant K\) with
variances that decay like \(k^{y}\) for \(0\leqslant y\leqslant 1\).
Not knowing \(y\), you weight them proportionally to \(k^x\) for \(0\leqslant x\leqslant 1\).
If you choose \(x=1/2\), your estimate will remain unbiased and will have variance at most
9/8 times that of the unknown best unbiased combination. Weighting by an estimated variance
would be prone to bias in rare event settings because the sample values are prone to being
correlated with the variance estimates.
This theorem comes from the appendix of an unpublished technical report from 1999.
2018

Owen, A. B.
Unreasonable effectiveness of Monte Carlo
PDF
This is a comment on a paper by
Briol, Oates, Girolami, Osborne and Sejdinovic
entitled "Probabilistic Integration: A Role in Statistical Computation?"
My take is that the Bayesian approach holds tremendous promise but is better
directed at more complicated problems than estimating an integral. For instance
most Bayesian computation is now done by frequentist MCMC methods.
Here is their rejoinder.
The paper, all discussions, and rejoinder will appear in Statistical Science.

Owen, A. B. and Varian, H.
Optimizing the tiebreaker regression discontinuity design
PDF
We consider settings where one could use either a
Randomized Controlled Trial (RCT) or a Regression
Discontinuity Design (RDD) to analyze the causal impact
a treatment that offers customers some sort of bonus
or incentive. The RDD is expected to have superior
immediate payoff on the customers under study.
The RCT is well known to have
greater statistical efficiency. We study a hybrid tiebreaker
design in which a fraction \(\Delta\in(0,1)\) of the customers
get a randomized treatment. We find statistical efficiency
increases monotonically with \(\Delta\) in a two regression
line model. The cost of experimentation decreases monotonically
in \(\Delta\) and we find an expression for optimizing the tradeoff.
This is work I did for Google, and it was not part of my Stanford duties.

Dobriban, E. and Owen, A. B.
Deterministic parallel analysis
PDF
It is really hard to pick the number k of factors in a factor analysis.
Parallel analysis is one of the most popular ways. People say it works well.
In Buja and Eyoboglu's version we take a data matrix X and apply uniform random permutations
to each column some number of times, computing the factor analysis each time. If the real
first eigenvalue is larger than say 95% of the simulated ones then we take \(k\geqslant 1\)
and look at the second eigenvalues. And so on. This paper is about how to do that without
actually running all those permutations which are expensive and introduce unneeded randomness into
the outcome. Instead we can use random matrix theory to determine where the upper edge of the spectrum
would be if there were no correlations among the data.

A. Ben Abdellah, P. L'Ecuyer, A. B. Owen, F. Puchhammer
Density estimation by randomized quasiMonte Carlo
arXiv
We have a deterministic function \(x = g(\boldsymbol{u})\) on the unit cube
\((0,1)^s\), or some other domain that
can be mapped to from the unit cube. We want to estimate the probability density function of \(x=g(\boldsymbol{u})\) with respect to \(\boldsymbol{u}\sim U(0,1)^s\). We do this by kernel density estimation and by histograms applied to randomized quasiMonte Carlo input points in the unit cube. Using RQMC reduces the variance leaving the bias unaffected. That makes it possible to use smaller bandwidths in the KDE. The gain can be mitigated by the reduced smoothness of the implied integrand when h is small. There can be large improvements in low dimensions.

Rosenman, E. and Baiocchi, M. and Owen, A. B.
Propensity score methods for merging observational and experimental datasets
PDF
Consider a large observational dataset (ODB) and a much smaller randomized controlled trial (RCT)
about the same treatement intervention. The RCT may support better causal conclusions, while the ODB has much more
data and may have better external validity (a more representative sample).
We consider merging the two, using the propensity score that the RCT data would have had,
had they been in the ODB. We develop two approaches, comparing them by simulations and by
the bias and variance they attain in a delta method approximation. Even a small RCT can greatly
improve the estimation of treatment effects over what the ODB alone can do.

Owen, A. B and Glynn, P. (editors)
Monte Carlo and QuasiMonte Carlo Methods
MCQMC, Stanford CA, August 2016
Springer web site for this book
The proceedings volume of MCQMC 2016
has been published by Springer. It has 26 contributions on Monte Carlo and quasiMonte Carlo.
It includes 3 tutorials on QMC, written by Fred Hickernell, Pierre L'Ecuyer and Frances Kuo.
2017

Owen, A. B., Chertkov, M. and Maximov, Y.
Importance sampling the union of rare events with an application to power systems analysis
arXiv
We consider importance sampling to estimate the probability \(\mu\) of a union of \(J\) rare events \(H_j\)
defined by a random variable \(\boldsymbol{x}\).
The sampler we study has been used in spatial statistics, genomics and combinatorics
going back at least to Frigessi and Vercellis (1985).
It works by sampling one event at random, then sampling \(\boldsymbol{x}\)
conditionally on that event happening and it constructs an unbiased estimate of \(\mu\)
by multiplying an inverse moment of the number of occuring events by the union bound.
We prove some variance bounds for this sampler. For a sample size of \(n\),
it has a variance no larger than \(\mu(\bar\mu\mu)/n\) where \(\bar\mu\)
is the union bound.
It also has a coefficient of variation no larger than
\(\sqrt{(J+J^{1}2)/(4n)}\) regardless of the overlap pattern among the events.
Our motivating problem comes from power system reliability, where the phase
differences between connected nodes have a joint Gaussian distribution and
the \(J\) rare events arise from unacceptably large phase differences.
In the grid reliability problems even some events defined by \(5772\) constraints
in \(326\) dimensions, with probability below \(10^{22}\),
are estimated with a coefficient of variation of about \(0.0024\) with only \(n=10{,}000\) sample values.

Owen, A. B.
Effective dimension of some weighted preSobolev spaces with dominating mixed partial derivatives
arXiv
PDF  revised PDF
Old title: Effective dimension of weighted Sobolev spaces: nonperiodic case
This paper considers two notions of effective dimension for
quadrature in weighted preSobolev spaces
with dominating mixed partial derivatives.
We begin by finding a ball in those spaces just barely
large enough to contain a function with unit variance.
If no function in that ball has more than \(\varepsilon\)
of its variance from ANOVA components involving
interactions of order \(s\) or more, then the space
has effective dimension at most \(s\) in the superposition sense.
A similar truncation sense notion replaces the cardinality
of the ANOVA component by the largest index it contains.
Some Poincar\'e type inequalities
are used to bound variance components by multiples
of these space's squared norm and those
in turn provide bounds on effective dimension.
Very low effective dimension in the superposition
sense holds for some spaces defined by product weights
in which quadrature is strongly tractable.
The superposition dimension is
\(O( \log(1/\varepsilon)/\log(\log(1/\varepsilon)))\)
just like the superposition dimension used in the multidimensional
decomposition method. Surprisingly,
even spaces where all subset weights are equal, regardless
of their cardinality or included indices, have low superposition
dimension in this sense.
This paper does not require periodicity of the integrands.

Owen, A. B.
On \(L^2\) norms of derivatives of orthogonal polynomials with respect to Sobolev inner products
PDF
For \(\lambda\ge0\),
let \(\langle f,g\rangle_\lambda := \int_{1}^1 f(x)g(x) d x
+\lambda\int_{1}^1 f'(x)g'(x) d x\) define an inner product
for differentiable functions on \([1,1]\).
For \(n\ge0\), let \(S_n=S_n^\lambda\) be the orthogonal polynomials of degree \(n\)
obtained by applying the GramSchmidt algorithm in this
inner product to monomials, normalized so that \(S_n(1)=1\).
Then the derivatives \(S_n'\) are given as explicit Legendre series,
it is proved that \(\arg\min_{n\ge1}\int_{1}^1 S_n'(x)^2 d x/\int_{1}^1S_n(x)^2 d x=1\),
and an expression is given for \(\int_{1}^1 S'_n(x)S'_{n'}(x) d x\).

Owen, A. B.
A randomized Halton algorithm in R
PDF
Randomized quasiMonte Carlo (RQMC) sampling can bring orders of magnitude reduction in variance compared to plain Monte Carlo (MC) sampling. The extent of the efficiency gain varies from problem to problem and can be hard to predict. This article presents an R function rhalton that produces scrambled versions of Halton sequences. On some problems it brings efficiency gains of several thousand fold. On other problems, the efficiency gain is minor. The code is designed to make it easy to determine whether a given integrand will benefit from RQMC sampling. An RQMC sample of n points in \([0,1]^d\) can be extended later to a larger n and/or d.

Gao, K. and Owen, A. B.
Estimation and inference for very large linear mixed effects models
PDF
 PDF (supplement)
 Slides
 Code
Linear mixed models with large imbalanced crossed random effects structures pose severe computational problems for maximum likelihood estimation and for Bayesian analysis. The costs can grow as fast as \(N^{3/2}\) when there are N observations. Such problems arise in any setting where the underlying factors satisfy a many to many relationship (instead of a nested one) and in electronic commerce applications, the N can be quite large. Methods that do not account for the correlation structure can greatly underestimate uncertainty. We propose a method of moments approach that takes account of the correlation structure and that can be computed at O(N) cost. The method of moments is very amenable to parallel computation and it does not require parametric distributional assumptions, tuning parameters or convergence diagnostics. For the regression coefficients, we give conditions for consistency and asymptotic normality as well as a consistent variance estimate. For the variance components, we give conditions for consistency and we use consistent estimates of a mildly conservative variance estimate. All of these computations can be done in O(N) work. We illustrate the algorithm with some data from Stitch Fix where the crossed random effects correspond to clients and items.
This one appeared in 2016, but has been very substantially revised.
Katelyn Gao's thesis
2016

Owen, A. B. and Launay, T.
Multibrand geographic experiments
PDF
It is about geographic experiments. The idea is to run B experiments on G regions at once
and then use a Stein shrinkage method from Xie, Kou and Brown (2012), or Bayesian statistics via Stan, to pool the results.
Pooling means you can work with smaller experiments than if you did them one at a time. B and G are both even.
In the design, each experiment is toggled up in G/2 regions and down in G/2 regions. Each region is
up for B/2 experiments and down for B/2 experiments. The Bayesian approach seemed to work a bit better.
The design is found using a Markov chain studied by Diaconis and Gangolli. For G>B the design is a
two level factorial for each experiment and is also a supersaturated design for the regions. It would be nice
if the GxB matrix had no duplicate rows or columns and no rows or columns that are exact opposites. The paper
provides some sufficient conditions for the existence of such designs (with constructions).
This is work I did for Google, and it was not part of my Stanford duties.

Owen, A. B.
Confidence intervals with control of the sign error in low power settings
PDF
When hypothesis tests of \(H_0 : \theta=0\) have low power, it is possible that their rejection can frequently be accompanied by an
estimate \(\hat\theta\) that has the wrong sign and significantly exaggerated magnitude. Such sign errors are less likely when the confidence interval for \(\theta\) is well separated from 0, as measured in units of the confidence interval's width. Sign errors can be controlled by declaring confidence in the sign of \(\theta\) only when \(H_0\) is also rejected at a smaller level \(\alpha_2 = 2\alpha_1\alpha_S\),
where \(\alpha_S \le 1/2\) is a user specified upper bound on the probability of a sign error given that \(H_0\) has been rejected.
This procedure is a very simple form of inference after model selection: the selected model is that \(\theta\ne0\) and the second inference
is then on sign(\(\theta\)).

Gao, K. and Owen, A. B.
Estimation and inference for very large linear mixed effects models
PDF
 PDF (supplement)
Very large crossed data sets are increasingly common especially in ecommerce. It is often appropriate to model them with crossed
random effects. Their size provides challenges for statistical analysis. For such large data sets, the computational costs of
estimation and inference should grow at most linearly with the sample size. Commonly used maximum likelihood and Bayesian
approaches do not scale at that rate. We propose a method of moments based algorithm that scales linearly and can be easily
parallelized at the expense of some loss of statistical efficiency. We apply the algorithm to some data from Stitch Fix where
the crossed random effects correspond to clients and items. The random effects analysis is able to account for the increased
variance due to intraclient and intraitem correlations in the data. Ignoring the correlation structure can lead to standard
error underestimates of over 10fold for that data. This algorithm is proven to give estimates that are asymptotically
consistent and normally distributed. We use a martingale CLT decomposition of Ustatistics to establish normality for
the variance components.
Katelyn Gao's thesis

Wang, J., Sabatti, C. and Owen, A. B.
Adaptive filtering multiple testing procedures
for partial conjunction hypotheses
arXiv
 PDF
The partial conjunction null hypothesis \(H_0^{r/n}\) allows up to
nr+1 related basic null hypotheses to hold. Rejecting it
allows us to conclude that at least r of the basic nulls
are false, providing a measure of reproducibility.
Motivated by genomic problems we consider a setting with a large number M of partial conjunction null
hypotheses to test, based on an \(n\times M\) matrix of pvalues. When r>1 the hypothesis \(H_0^{r/n}\) is composite.
Validity versus the case with r1 alternative hypotheses holding can lead to very conservative tests. We
develop a filtering approach for \(H_0^{r/n}\) based on the M pvalues for \(H_0^{(r1)/n}\). This filtering
approach has greater power than straightforward PC testing. We prove that it can be used to control the
familywise error rate, the per family error rate, and the false discovery rate among M PC tests. In
simulations we find that our filtering approach properly controls the FDR while achieving good power.

Owen, A. B. and Prieur, C.
On Shapley value for measuring importance of dependent inputs
Revised PDF
 arXiv
 HAL
This paper makes the case for using Shapley value to quantify the importance of random input variables to a function. Alternatives based on the ANOVA decomposition can run into conceptual and computational problems when the input variables are dependent. Our main goal here is to show that Shapley value removes the conceptual problems. We do this with some simple examples where Shapley value leads to intuitively reasonable nearly closed form values.
Errata: In Theorem 4.1 the condition that \(\Sigma\) has full rank should instead be
that \(\Sigma\) is positive definite. The matrix in the paragraph following Theorem 4.1 should be
\(\bigl(\begin{smallmatrix} 1 & \rho\\\rho & 1\end{smallmatrix}\bigr)\)
not
\(\bigl(\begin{smallmatrix} 0 & \rho\\\rho & 0\end{smallmatrix}\bigr)\).

Basu, K. and Owen, A. B.
QuasiMonte Carlo for an integrand with a singularity along a diagonal in the square PDF
A singularity along an arbitrary manifold works differently for QMC than one at an isolated point or along the boundary of the
unit cube. We look at a singularity along the diagonal of \([0,1]^2\) as a basic example. There are several ways to handle this, but the
best one appears to be to transform the integrand into one or more integrands with QMCfriendly singularities (along the boundary
of the unit cube, in this case). For a festschrift marking Ian Sloan's 80th birthday.

He, H. Y., Basu, K., Zhao, Q. and Owen, A. B.
Permutation p‐value approximation via generalized Stolarsky invariance
Annals of Statistics
arXiv 1603.02757
latest PDF  Supplement 
pipeGS R package on CRAN
data
It is really expensive to get a tiny p value via permutations.
For linear test statistics, the permutation p value is the fraction
of permuted data vectors in a given spherical cap.
Stolarsky's invariance principal from quasiMonte Carlo sampling
gives the root mean squared discrepancy between the observed and expected
proportions of points in spherical caps. We use and extend this principal
to develop approximate p values with small relative errors. Their accuracy is competitive
with saddlepoint approximations, but they come with an error estimate.
Hera He's thesis

Gao, K. and Owen, A. B.
Efficient moment calculations for variance components in large unbalanced crossed random effects models
PDF
 supplement
Large unbalanced crossed random effects models provide extreme challenges.
Both maximum likelihood and MCMC scale superlinearly in problem size. We
look at method of moments estimates that scale linearly, while retaining
competitive accuracy with the MLE on problems small enough to allow computation
of the MLE.
Katelyn Gao's thesis
2015

Basu, K. and Owen, A. B.
Transformations and HardyKrause variation
PDF
 arXiv
 SINUM
A transformation \(\tau\) may be used to map the unit
cube onto the simplex, disk, sphere
and related spaces. For use with QMC integration of f
we want \(f\circ\tau\) to be of bounded variation in the
sense of Hardy and Krause. We give such conditions using
a multivariable Faa di Bruno formula of Constantine and Savits.
A similar condition makes \(f\circ\tau\) smooth enough to
benefit from scrambled net quadrature. We also apply Faa di
Bruno to find conditions for importance sampling and RosenblattHlawkaMuck
sequential inversion to be suitable for QMC. Finally we give an importance sampled mapping
from the unit cube to the simplex that results in RQMC quadratures with
RMSE \(O(n^{3/2+\epsilon})\) for a class of functions generalizing polynomials.
Kinjal Basu's thesis

Owen, A. B.
Statistically efficient thinning of a Markov chain sampler
JCGS
revised PDF 
arXiv  R code
Thinning or subsampling an MCMC is well known to decrease efficiency
if we simply discard computed values. Suppose that it costs one unit of time to
advance the Markov chain, and \(\theta\) units to compute the quantity of
interest. If we thin to every k'th value, \(k\geqslant2\) then we don't have to compute the
quantity of interest at every iteration and so we can run the chain longer.
If the autocorrelations in the process are nonincreasing and nonnegative then
there is alway a \(\theta\) large enough to make thinning to every k'th value
more efficient than not thinning.
Very commonly autocorrelations resemble those of an AR(1) process, \(\rho_\ell = \rho^{\ell}\).
This paper gives an algorithm for computing the optimal
subsampling factor k for the AR(1) case. When \(\rho\) is large the optimal thinning value k can
be large. When \(\theta\) is large the efficiency gain can be important. Taking k=1 (no thinning) is
optimal for \(\rho>0\) if and only if \(\theta \leqslant (1\rho)^2/(2\rho)\). If \(1<\rho<0\) then k=1
is optimal for any \(\theta\geqslant0\).

Wang, J, Zhao, Q., Hastie, T. and Owen, A. B.
Confounder adjustment in multiple hypothesis testing
arXiv
We look at the problem of large scale hypothesis testing when there are latent variables
that induce correlations among the tests, and more seriously, correlate with the primary
variable (e.g. treatment) under study. LEAPP handles this by assuming sparsity of effects
and RUV4 handles it by assuming some negative controls: tests known a priori to be null.
We provide theoretical gaurantees for versions of these algorithms and generalize them to
handle multiple primary variables. When the confounding variables are strong enough, then the
methods can perform as well asymptotically as an oracle that observes the latent variables.
Jingshu Wang's thesis 
Yunting Sun's thesis

Wang, J and Owen, A. B.
Admissibility in partial conjunction testing
arXiv
 PDF (to appear in JASA)
Admissibility of metaanalysis has been well understood since Allan Birnbaum's work in the 1950s. Any valid combined pvalue obeying a monotonicity constraint is optimal at some alternative and hence admissible. In an exponential family context, the admissible tests reduce to those with a convex acceptance region. The partial conjunction null hypothesis is that at most r  1 of n independent component hypotheses are nonnull with r = 1 corresponding to a usual metaanalysis. Benjamini and Heller (2008) provide a valid test for this null by ignoring the r  1 smallest pvalues and applying a valid metaanalysis pvalue to the remaining n  r + 1 pvalues. We provide sufficient conditions for the admissibility of their test among monotone tests. A generalization of their test also provides admissible monotone tests and we show that admissible monotone tests are necessarily of that generalized form. If one does not require monotonicity then their test is no longer admissible, but the dominating tests are too unreasonable to be used in practice.

Lee, M. and Owen, A. B.
Single nugget Kriging
PDF
We propose a method with better predictions at extreme values than the standard method of Kriging. We construct our predictor
in two ways: by penalizing the mean squared error through conditional bias and by penalizing the conditional likelihood at
the target function value. Our prediction exhibits robustness to model mismatch in the covariance parameters, a desirable
feature for computer simulations with a restricted number of data points. Applications on several functions show that our
predictor is robust to the nonGaussianity of the function.
Minyong Lee's thesis
 Dobriban, E, Fortney, K. Kim, S. K. and Owen, A. B.
Optimal multiple testing under a Gaussian prior on effect sizes
Biometrika
arXiv  PDF
We develop a new method for frequentist multiple testing with Bayesian prior information. Our procedure finds a new set of optimal pvalue weights called the Bayes weights. Prior information is relevant to many multiple testing problems. Existing methods assume fixed, known effect sizes available from previous studies. However, the case of uncertain information is more usual. For a Gaussian prior on effect sizes, we show that finding the optimal weights is a nonconvex problem. Despite the nonconvexity, we give an efficient algorithm that solves this problem nearly exactly. We show that our method can discover new loci in genomewide association studies. On several data sets it compares favorably to other methods. Open source code is available.

Lawrence, M., Huntley, M. A., Stawiski, E., Owen, A. B., Wu, T. D., Goldstein, L., Cao, Y., Degenhardt, J., Young, J.,
Guillory, J., Heldens, S., Jackson, A., Seshagiri, S., Gentleman, R.
Genomic variant calling: Flexible tools and a diagnostic data set
bioR\(\chi\)iv
 Supplement
This is about identifying low frequency genetic variants in tumors.
The supplement includes a use of the Cauchy link in binomial regression. A quasibinomial logistic regression
gave overdispersions like \(10^9\); Cauchy is much better.

Fortney, K., Dobriban, E., Garagnani, P., Pirazzini, C., Monti, D., Mari, D., Atzmon, G., Barzilai, N., Franceschi, C., Owen, A. B., Kim, S. K.
Genomewide scan informed by agerelated disease identifies loci for exceptional human longevity
We find some new SNPs associated with extreme longevity using frequentist
multiple testing based on Bayesian priors from the companion Biometrika paper.
PLoS  genetics
 Basu, K. and Owen, A. B.
Scrambled geometric net integration over general product spaces
Foundations
of Computational Mathematics 
arXiv 
PDF
We develop scrambled net quadrature over Cartesian products of triangles, disks,
spherical triangles and generalizations. For smooth functions on an sfold product
of ddimensional sets the variance is
\(O(n^{12/d}\log(n)^{s1})\).
Kinjal Basu's thesis
 Wang, J. and Owen, A. B.
Bicrossvalidation for factor analysis
published article
from Statistical Science
 slides
We use bicrossvalidation, holding out some rows and some columns
of a data matrix, to choose the number of factors for a factor analysis.
The criterion is recovery of the underlying signal matrix.
Jingshu Wang's thesis
 He, Z. and Owen, A. B.
Discussion on `Sequential QuasiMonteCarlo Sampling', by Gerber & Chopin
PDF
Text from copyright form:
This is the prepeer reviewed version of the discussion named above that was published
in its final form at
JRSSB.
(Technically this is a comment that was not peer reviewed, but the journal
did copy edit what we sent.)
2014
 He, H. Y. and Owen, A. B.
Optimal mixture weights in multiple importance sampling
PDF
We show that the mixture weights in multiple importance sampling
can be optimized via convex optimization.
 He, Z. and Owen, A. B.
Extensible grids:
uniform sampling on a spacefilling curve
PDF
We study QMC in \([0,1]^d\) by QMC and randomized QMC
on \([0,1]\) followed by applying Hilbert's spacefilling
curve. We find that the result has the same highdimensional
performance as sampling on a grid.
As bad as that is in high dimensions,
it is best possible for some hard problems,
and the proposed method does not require sample
sizes of the form \(n=m^d\). An RQMC version attains
MSE \( O(n^{12/d})\) for integration of Lipshitz continuous functions.
 Larson, J. L. and Owen, A. B.
Moment based gene set tests
BMC bioinformatics
npGSEA Bioconductor package
old PDF
Permutation tests are a popular way to test
whether a set of genes is associated with a
treatment, or reversing the hopedfor causal arrow,
a phenotype. But they are expensive. We develop
moment based alternatives that are much faster.
Both beta and Gaussian approximations are available
for linear statistics (similar to the JG score).
We also develop a scaled chisquare approximation
to a sum of squared regression coefficients. Such
a test was best overall among 261 gene set tests
investigated by Ackermann and Strimmer (2009). We
illustrate the method on three public data sets
related to Parkinson's disease, and find some enriched
sets not noticed in the original publications.
 Owen, A. B.
A constraint on extensible quadrature rules
Numerische Mathematik
preprint
Suppose that the best possible rate for a quadrature
problem is \(O(n^{\alpha})\) for \(\alpha>1\), for
a simple average of function values.
Suppose further that a rate optimal sequence uses
sample sizes \(n_k\). This paper extends an idea from
Sobol' (1998) to give a lower bound on \(\rho = n_{k+1}/n_k\).
The bound is between 1 and 2, so it always rules out arithmetic
sequences and never rules out sample size doubling.
This version (Jan 2015) fixes an error in the proof of Theorem 1.
(Neither statement nor conclusion had to change.)
 Basu, K. and Owen, A. B.
Low discrepancy constructions in the triangle
SIAM Journal on Numerical Analysis
 PDF
We give two explicit
constructions of point sets in the triangle
with vanishing discrepancy. One adapts the van der Corput
sequence to the triangle. It has discrepancy
at most 12/\(\sqrt{N}\). The other scales a regular grid
then rotates it through an angle with a badly approximable tangent
attaining discrepancy \(O(\log(N)/N)\). For smooth functions,
randomizing the van der Corput sequence gives RMSE
\(O(1/N)\).
Kinjal Basu's thesis
 Owen, A. B. and Roediger, P. A.
The sign of the logistic regression coefficient
American Statistician (teacher's corner)
arXiv 
PDF
This paper settles a conjecture that Paul Roediger
made (with D. M. Ray and B. T. Neyer)
in a comment on a paper by Jeff Wu and Y. Tang. In logistic
regression on a scalar \(x\), the MLE of the slope coefficient
\(\beta\) satisfies sign(\(\hat\beta\)) = sign(\(\bar x_1\bar x_0\)),
where \(\bar x_y\) is the sample mean of x for Y=y.
That this should usually be the case is intuitively obvious.
That one cannot wiggle out of it by tweaking the variances,
skewnesses and/or outliers in the two x groups is less obvious.
One might imagine it follows from the means being sufficient statistics,
but they are only conditionally sufficient. Besides it holds
also for Probit models and others whose inverse link is the
CDF of a logconcave density, and where there is no tiny
sufficient statistic conditional or otherwise. There is
a generalization to vector valued predictors.
2013
 Owen, A. B.
Sobol' indices and Shapley value
PDF
Sobol' indices are used to measure the importance of
input variables in black box functions. Shapley value
is used by economists to apportion the value of a team's
efforts among its individual members. This paper compares
the methods. Neither kind of Sobol' index yields the
Shapley value for variance explained. Compared
to Shapley value, Sobol's lower index ignores interactions
while Sobol's upper index overcounts them.
 Billman, D. and He, H. and Owen, A. B.
Grouping tasks and data display items
via the nonnegative matrix factorization
PDF
Our data are a list of 119 tasks that a pilot
must perform in flying a modern air liner,
210 input and output variables available to
the pilot, and a matrix indicating whether a given IO variable
is required for a given task. We use biclustering
of this data to aid in designing an interface.
 Owen, A. B. and Dick, J. and Chen. S.
Higher order Sobol' indices
original PDF  Published version:
Information and Inference 2014(3)5981
We generalize Sobol' indices from an \(L^2\) to \(L^p\)
(integer \(p\ge2\))
setting in order to emphasize those variables that
most affect extreme values of the function.
Our generalizations have integral representations that
allow Monte Carlo or quasiMonte Carlo estimation.
 Chen, A., Owen, A. B. and Shi, M.
Data enriched linear regression
arXiv 
revised Nov 2014
We have a small high quality data set of (X,Y) values
following a Gaussian linear regression, and a potentially much larger data
set following a possibly different Gaussian linear regression. We apply
a new form of Stein shrinkage to connect the two. It becomes inadmissible to
just use the small data set when there are p\(\ge\)5 predictors
and the error df \(\ge\)10. The new form of shrinkage outperforms
ordinary Stein shrinkage in simulations and we provide an explanation
by comparing them both to an oracle.
2012
2011

Ma, L, Wong, W. H. and Owen, A. B.
A sparse transmission disequilibrium test for
haplotypes based on BradleyTerry graphs
PDF

Owen, A. B. and Eckles, D.
Bootstrapping data arrays of arbitrary order
PDF
slides
McCullagh (2000) showed that
there is no bootstrap for the crossed random effects model.
But resampling each factor (rows and columns) independently
works well and is slightly conservative.
Here we replace resampling by reweighting and then extend
the result to data arrays of arbitrary order. Reweighting
is better suited to large data warehouses than resampling
is.

Gleich, D. G. and Owen, A. B.
Moment based estimation of stochastic
Kronecker graph parameters
PDF
It is hard to estimate the parameters
of Kronecker random graphs by maximum
likelihood. Here is a method of moments
strategy based on simple feature counts.
We find that moments are more robust and
give better fits than maximum likelihood.

Sun, Y. and Zhang, N. and Owen, A. B.
Multiple hypothesis testing, adjusting
for latent variables
PDF
R package on CRAN
R package tar file
We introduce LEAPP (latent effect adjustment after
primary projection), a method for taking account
of unmeasured latent variables when doing multiple
hypothesis testing. Simulations show good performance
compared to alternatives, EIGENSTRAT and SVA
(surrogate variable analysis). When applied to the
16 tissue AGEMAP data, LEAPP gives lists of agerelated
genes with much more reproducibility (over tissues)
than the other methods. We prove some results on
the LEAPP estimates.
Yunting Sun's thesis
2010

Owen, A. B.
Moment based estimation of stochastic
Kronecker graph parameters
Deprecated. See above article with David Gleich.
It is hard to estimate the parameters
of Kronecker random graphs by maximum
likelihood. Here is a method of moments
strategy based on simple feature counts.
For large data sets the error is dominated
by model lack of fit and so the extra efficiency
of likelihood over moments is less important
than the speed advantage of moments.

Chen, S. and Matsumoto, M. and Nishimura, T. and Owen, A. B.
New inputs and methods for Markov chain
quasiMonte Carlo
PDF
We present some new ways of generating small CUD
sequences. We introduce some embedded antithetic
and round trip variance reductions into MCMC
and prove that they preserve the CUD property.
In some simulations of GARCH and stochastic
volatility models, the new methods greatly
outperform standard IID sampling.
The original publication is available at
www.springerlink.com
(or will be so, once it has been completed).
Su Chen's thesis

Dyer, J. and Owen, A.B.
Visualizing bivariate long tailed data
revised PDF
slides from NIPS
Suppose that we observe two or more categorical
variables with a long tail, such as movies and
customers in ratings data. This paper looks
at a way to visualize the joint distribution of
such data. We use a copula plot based on the
observed ranks of the data. We prove under
a generative model that the observed ranks are
asymptotically close to some underlying true
ranks under a ZipfMandelbrotPoisson assumption.
Some ratings data show a
strong head to tail affinity: busy raters are
over represented at rarely rated items and conversely.
We present two simple generative models that
produce such an effect. One is a saturation
model and the other is bipartite preferential
attachment. We prove bounds on
the marginal distributions for these models.
Justin Dyer's thesis

Dyer, J. and Owen, A.B.
Correct ordering in the ZipfPoisson ensemble
PDF
Revised Jan 2012 PDF
Counted rank data arise commonly, such as most popular
baby names, English words and web sites.
This paper analyzes the reliability of such ordering.
We use a model
where \(X_i\) is Poisson with mean following a Zipf law.
We get estimates for the number n of leading
items \(i=1\dots n\) correctly ordered by their observed counts \(X_i\).
If grows at the rate \((AN/\log(N))^{1/(\alpha+2)}\)
where \(\alpha\) is the Zipf parameter and \(A = \alpha^2(\alpha+2)/4\).
For a ZipfPoisson model of the British National Corpus of 100,000,000 words,
we estimate that the 72 most frequent words are in
their correct order.
Justin Dyer's thesis

She, Y. and Owen, A.B.
Outlier detection using
nonconvex penalized regressions
PDF (orig) 
PDF (revised)
We put in a dummy variable for all n observations
in a regression but regularize their coefficients
via a thresholding rule. The result is robust regression
that empirically is very good at identifying outliers.
A key step is a clean model for which the outliers
become the signal and in which BIC is applicable.
Yiyuan She's thesis
2009

L'Ecuyer, P. and Owen, A.B. (eds)
Monte Carlo and QuasiMonte Carlo Methods 2008
Proceedings of MCQMC 2008, July 611 2008, Montreal Canada.
SpringerVerlag. ISBN 9783642041068
List of articles
 S.C. Emerson and Owen, A.B.
Calibration of the empirical likelihood method for a vector mean
Electronic
Journal of Statistics
This paper presents an approach for getting
outside of the 'convex hull problem' in empirical
likelihood.
Sarah Emerson's thesis
 Owen, A.B.
Recycling physical random variables
EJS
This paper shows how to get n(n1)/2 pairwise independent
random vectors out of just n fully independent ones.
Similar constructions are widely used. What is new here is a
statistical analysis of the consequences for Monte Carlo sampling:
the resulting means are degenerate Ustatistics with nonnormal
limits. Quite surprisingly, their asymptotic distributions come out
symmetric, based on recent results on the spectrum of
circulant matrices.
 Southworth, L.K., Owen, A.B. and Kim, S.K.
Aging mice show a decreasing correlation of gene expression
within genetic modules.
PLoS Genetics
PDF
As mice age, the correlations among sets of related
genes grow weaker.
 Chen, S., Dick, J. and Owen, A.B.
Consistency of Markov chain quasiMonte Carlo on continuous state spaces
PDF
Tribble has made over 1000fold efficiency improvements by
inserting QMC sampling into MCMC problems.
The earlier consistency results for use of QMC in MCMC
only worked for discrete state spaces. This paper
extends them to continuous problems like the ones in
Seth Tribble's
thesis
Su Chen's thesis
 Xu, Y., Dyer, J.S. and Owen, A.B.
Empirical stationary correlations for semisupervised learning on graphs
PDF
Talk
Many methods for semisupervised learning on graphs turn out to be forms
of kriging. They use a correlation structure derived from the graph, but
without taking account of correlations among the observed response values.
We incorporate the empirical correlations into the covariance. In two
example data sets we find greatly improved prediction.
Ya Xu's thesis
2008
 Owen, A.B.
Monte Carlo and QuasiMonte Carlo for Statistics
PDF 
Proceedings of MCQMC 2008 (Montreal)
This is a survey which sketches some topics in statistics that
use Monte Carlo and quasiMonte Carlo methods. The emphasis is
on problems with some open research issues.
 Owen, A.B.
Karl Pearson's metaanalysis revisited
Annals of Statistics  Slides
 Supplementary figures (web)
 Supplementary figures (pdf)
A test of Karl Pearson, thought to be inadmissible for over 50
years, is shown to be admissible. Furthermore it has good power
against alternatives in which all or most of the nonzero
parameters share the same sign. An earlier paper below,
used a big Monte Carlo simulation, where this one uses an FFT
to get exact power. This one also compares to standard tests not ordinarily
thought of as metaanalysis.
 Southworth, L.K., Kim. S.K. and Owen, A.B.
Properties of balanced permutations
Journal of Computational Biology,
April 2009, 16(4): 625638
Balanced permutations are a fascinating idea for microarray
analyses. But we find that they can give very misleading
p values.
2007
 Perry, P.O. and Owen, A.B.
A rotation test to verify latent structure
JMLR Feb 2010
We test the presence of a latent variable in correlated noise
by employing projection pursuit nonnormality measures.
Patrick Perry's thesis
 Zahn, Poosala, Owen, Ingram, Lustig, Carter, Weeratna, Taub, Gorospe, MazanMamczarz, Lakatta,
Boheler, Zu, Mattson, Falco, Ko, Schlessinger, Firman, Kummerfeld, Wood, Longo, Zonderman, Kim, Becker
"AGEMAP: a gene expression database for aging in mice"
PLOS Genetics
This is a preliminary online version of the article. There may be changes.
We look at patterns of aging in mice for 16 different tissues, as measured by gene expression.
 Owen, A.B. and Perry, P.O.
Bicrossvalidation of the SVD and the nonnegative matrix factorization
final PDF
from AOAS
 JSM 2009 slides
 talk at Cambridge University
 older slides
We look at how to pick the rank k when approximating a matrix
by a truncated SVD. We hold out a rectangular submatrix, fit an SVD
to a complementary submatrix, truncate it and predict.
The method extends to the nonnegative matrix factorization
among other models.
Patrick Perry's thesis
 Owen, A.B.
"Pearson's test in a large scale multiple metaanalysis"
PDF
The AGEMAP study generated an 8932 x 16 matrix of p values.
We apply metaanalysis to each row.
A method originally proposed by Pearson (1934) and thought
for over 50 years to be inadmissible performs best in a simulation. We also
show that Pearson's test really is admissible.
 Owen, A.B.
"The pigeonhole bootstrap"
Annals of Applied Statistics
 PDF
 Slides
McCullagh (2000) showed that
large crossed random effects data sets,
such as are now studied for recommender engines and
information retrieval are impossible to bootstrap.
This means that even for balanced homoscedastic
random effects models with no missing data, no bootstrap
correctly estimates the variance of a sample mean (let
alone a more complicated procedure).
But one of the methods he studied, that of
independently resampling rows and columns, comes pretty
close. This article shows the expected bootstrap variance in
that method tracks the desired variance, even
for severely unbalanced and heteroscedastic data sets.
2006
 Owen, A.B.
"A robust hybrid of lasso and ridge regression"
PDF 
Slides
A penalty that behaves like lasso for small coefficients
and like ridge for large coefficients is developed.
This penalty is a reversed Huber function.
The penalty is convex. Like the Huber function it requires
scaling. The scaling parameter can be incorporated into a
criterion that is jointly convex in it and the regression
coefficient vector.
 Owen, A.B.
"Infinitely imbalanced logistic regression"
JMLR
Many binary classfication problems are very unbalanced with
one category much more common than the other. This paper
shows what happens to logistic regression in the limit
where one category's sample size tends to infinity while the
other remains finite. For example one could use logistic
regression to separate a Gaussian measure from a finite data
set. Under mild conditions, the limiting coefficient vector
(apart from the intercept) is finite. It can be expressed in
terms of an exponential tilt and solved for by a convex optimization.
Journal of Machine Learning Research (v 8, pp 761773, 2007)
 Zahn, Sonu, Vogel, Crane, MazanMamczarz, Rabkin, Davis,
Becker, Owen, Kim
"Transcriptional profile of aging in human muscle reveals a common
aging signature"
PLOS Genetics
This paper relates human aging to various genes and gene groups.
It includes a new version of Gene Set Enrichment Analysis (GSEA)
geared to handle regressions and covariates.
The electron transport group of genes are found to be age related
in human kidney, muscle and brain, and in other species.
2005
 Tribble, S.D and Owen, A.B.
"Construction of weakly CUD sequences for MCMC sampling
Electronic Journal of Statistics
An earlier paper below showed that MCMC can be
driven by completely uniformly distributed (CUD)
or weakly CUD (WCUD) point sequences.
This paper shows that
a construction of Liao's that permutes QMC vectors
leads to WCUD point sequences. A theorem
of Niederreiter (1977) implies that certain lattice
constructions satisfy a triangular array version of CUD.
We find QMC methods for MCMC reduce variance by factors
ranging from 10 to several hundred in a 42 dimensional
Gibbs sampling probit example.
A proposal of Liao's for incorporating acceptance rejection
into QMCMCMC
Seth Tribble's thesis
 Owen, A.B.
"Local antithetic sampling with scrambled nets"
Annals of Statistics 2008 
@ arXiv 
@ Project Euclid
A local antithetic reflection strategy can improve
the variance rate of scrambled nets
from \(n^{3/2+\epsilon}\) to \(n^{3/22/d+\epsilon}\)
in dimension d.
The benefit is similar to that which Haber gets
when moving from stratified sampling to stratified
antithetic sampling.
The method also looks like a merger of scrambled nets
and monomial cubature.
 Owen, A.B.
"On the WarnockHalton quasistandard error"
Monte Carlo Methods and Applications
v12 n 1 pp 4754
DOI: 10.1515/156939606776886652
Warnock and Halton have proposed a
method of treating multiple QMC estimates as
replicates.
This paper reproduces an example where the method
seems to work, but cautions that the method
can fail arbitrarily badly.

Lin, Z & Owen, A.B. & Altman, R.
Science
Reply to letters about "Genetic research and human subject privacy".
2004
 Owen, A.B. and Tribble, S.D
"A quasiMonte Carlo Metropolis algorithm"
PNAS
This paper proves that QMC methods can be applied to
MetropolisHastings style MCMC.
The key idea is to use QMC points
that are "completely uniformly distributed".
This is like using the entire period of a small random
number generator.
Seth Tribble's thesis

Rodwell, Sonu, Zahn, Lund, Wilhelmy, Wang, Xiao, Mindrinos, Crane, Segal, Myers, Brooks, Davis, Higgins, Owen and Kim
"A transcriptional profile of aging in the human kidney"
Public Library of Science
We found genes that change expression with age, in the human kidney.
These genes do not tend to be the ones that serve to distinguish kidney
from other tissue types, consistent with a model that aging is similar
in different tissue types. The genes do not overlap with aging
related genes in other species that we looked at.
 Owen, A.B.
"Randomized QMC and point singularities"
PDF
Randomized QMC is shown to have a superior
rate of convergence to ordinary MC on some
functions with square integrable singularities
at unknown locations. Surprisingly that means
RQMC will generally beat importance sampling
asymptotically. Of course one might combine
them.
 Lin, Z & Owen, A.B. & Altman, R.
"Genetic research and human subject privacy"
Science, Vol 305, Issue 5681, 183, 9 July 2004

Owen, A. B.
"Variance of the number of false discoveries".
PDF
 R functions (beta)
 R function documentation (beta)
Given d hypothesis tests at level \(\alpha\) we expect \(d \alpha\)
false positives. When the tests are dependent, the variance
of the number of false positives can be \(O(d^2)\) higher than the
independence value of \(d \alpha(1\alpha)\). This paper
shows how to estimate such a variance taking account of
dependency in the tests.
The R functions are beta and very subject to change.
Feedback is welcomed.
 Owen, A. B. "Halton sequences avoid the origin".
SIAM Review
v 48 n 3 pp 487503
Gives rates of convergence for QMC on unbounded integrands,
using growth conditions on f and a singularity avoidance
pattern for x's. Halton sequences and randomized QMC avoid
the origin suitably.
 Owen, A. B. "Multidimensional variation for quasiMonte Carlo".
PDF
 BiBTeX
Survey, and some new results, on multidimensional variation
(Vitali and HardyKrause) for QuasiMonte Carlo.
2003
 Nomogram for predicting the likelihood
of delayed graft function in adult cadeveric renal transplant patients
Journal of the American Society of Nephrology

Liu, R. Owen, A.B.
"Estimating Mean Dimensionality of ANOVA decompositions"
JASA 2006
Relationships between Sobol's sensitivity indices
and moments of the dimension distribution are established.
The mean dimension is computed for some functions arising in
finance and extreme value theory.
The minimum of d independent uniform random variables is seen
to have strong low dimensional components.

Troyanskaya, O.G., Dolinski, K., Owen, A.B., Altman, R.B., Botstein, D.
"A Bayesian framework for combining heterogeneous data sources for
gene function prediction (in S. cerevisiae)"
PNAS
Online
Supplement
 Owen, A.B.
"QuasiMonte Carlo Sampling"
PDF
 BiBTeX
A Chapter on QMC for a
SIGGRAPH 2003 course. It
motivates QMC as a deterministic law of large
numbers. The algorithms are presented as
extensions of stratification methods, like
those already well known in graphics
(jittering, n rooks, multijittered sampling).
2002

Owen, A.B., Stuart, J., Mach, K. Villeneuve, A,M., Kim, S.
"A gene recommender algorithm to identify coexpressed
genes in C elegans"
Paper 
Software
This paper imitates algorithms from movie and book recommenders
to find new genes related to a group of old genes.
Given a query of genes with common function, we identify
experiments in which the query genes are strongly coexpressed.
Then we rank all the organisms genes according to the extent
to which they agree with the query group, in the selected
experiments. RNA interference knockouts confirmed two new
Retinoblastoma related genes in C elegans.

Owen, A.B.
"Variance and discrepancy with alternative scramblings"
PostScript 
PDF
There are many computationally efficient proposals for
scrambling digital nets. Generally they preserve mean
squared discrepancy. This paper shows that one
alternative can be detrimental to the sampling variance,
adversely affecting the rate of convergence.
Another scrambling improves the rate of convergence,
at least for d=1.

Owen, A.B.
"Necessity of low effective dimension"
PostScript 
PDF
This paper explores the extent to which low superposition
dimension is necessary for QMC to beat MC.

Jiang, T. and Owen, A.B.
"Quasiregression for visualization and interpretation
of black box functions"
PostScript 
PDF
Quasiregression is applied to the output of a support
vector machine and to a neural network.
The method allows one to peer into a black box and
identify important variables and interactions.
The most vexing issue is how to reconcile a decomposition derived for
independent variables with a function fit to highly dependent data.

Hickernell, F. and Lemieux, C. and Owen, A.B.
"Control variates for quasiMonte Carlo"
PostScript 
PDF
It is easy and natural to combine quasiMonte Carlo
with control variates. But the proper control
variate coefficients can change, as can the choice
of what constitutes a good control variate.
In MC a good control variate correlates with the
integrand. In QMC it is better to correlate with
some derivative or high frequency component of the
target integrand.
2001

Jiang, T. and Owen, A.B.
"Quasiregression with shrinkage"
PostScript 
PDF 
Software 
Slides
Quasiregression is a method of Monte Carlo approximation
useful for global sensitivity analysis.
This paper presents a new version,
incorporating shrinkage parameters of the type used
in wavelet approximation.
As an example application, a black box function from
machine learning is analyzed. That function is nearly
a superposition of functions of one and two variables
and the first variable acting alone accounts for more
than half of the variance.

Owen, A.B.
"The dimension distribution and quadrature test functions"
PostScript 
PDF
A "dimension distribution" is introduced through which various
measures of effective dimension of a function can be defined.
The idea is explored on some widely used quadrature test
functions. Some isotropic functions are shown to be of low
effective dimension, explaining the success of QMC methods on them.

Owen, A.B.
"Empirical Likelihood"
Book and Software home page

Lemieux, C. and Owen, A.B.
"QuasiRegression and the Relative Importance of
the ANOVA Components of a Function"
PostScript 
PDF
Every minute spent doing the research reported here was a minute
somebody was, with good reason, waiting for me to do something else. I thank them
all for their patience.
Sequoia Hall, 390 Serra Mall, Stanford CA 94305