Reports

## Reports and a few talks

Most of these papers are based upon work supported by the National Science Foundation under Grants: DMS-0906056, DMS-0604939, DMS-0306612, and DMS-0072445. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Each article goes under the year when it was first written, usually as a technical report. Revisions don't usually make an article move up the list.

Slides from talks are beside some of the articles. More talks are here.

There is work on empirical likelihood, Monte Carlo & quasi-Monte Carlo, and transposable data/bioinformatics. You may have to dig down for some of those.

## Papers by year

### 2014

• Owen, A. B. A constraint on extensible quadrature rules PDF
Suppose that the best possible rate for a quadrature problem is $$O(n^{-\alpha})$$ for $$\alpha>1$$, for a simple average of function values. Suppose further that a rate optimal sequence uses sample sizes $$n_k$$. This paper extends an idea from Sobol' (1998) to give a lower bound on $$\rho = n_{k+1}/n_k$$. The bound is between 1 and 2, so it always rules out arithmetic sequences and never rules out sample size doubling.
• Basu, K. and Owen, A. B. Low discrepancy constructions in the triangle PDF
We give two explicit constructions of point sets in the triangle with vanishing discrepancy. One adapts the van der Corput sequence to the triangle and has discrepancy at most $$12/\sqrt{N}$$. The other scales a regular grid and then rotates it through an angle with a badly approximable tangent, attaining discrepancy $$O(\log(N)/N)$$. For smooth functions, randomizing the van der Corput sequence gives RMSE $$O(1/N)$$.
• Owen, A. B. and Roediger, P. A. The sign of the logistic regression coefficient PDF
This paper settles a conjecture that Paul Roediger made (with D. M. Ray and B. T. Neyer) in a comment on a paper by Jeff Wu and Y. Tang. In logistic regression on a scalar $$x$$, the MLE of the slope coefficient $$\beta$$ satisfies $$\mathrm{sign}(\hat\beta) = \mathrm{sign}(\bar x_1-\bar x_0)$$, where $$\bar x_y$$ is the sample mean of $$x$$ for $$Y=y$$. That this should usually be the case is intuitively obvious. That one cannot wiggle out of it by tweaking the variances, skewnesses and/or outliers in the two $$x$$ groups is less obvious. One might imagine it follows from the means being sufficient statistics, but they are only conditionally sufficient. Besides, the result also holds for probit models and others whose inverse link is the CDF of a log-concave density, for which there is no tiny sufficient statistic, conditional or otherwise. There is a generalization to vector-valued predictors.
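A quick numerical check of the sign result, as a sketch. The solver (plain Newton's method on the two-parameter log-likelihood) and the toy data are our own choices, not taken from the paper; group 0 is deliberately skewed by one large value to show that the fitted slope still follows $$\bar x_1-\bar x_0$$.

```python
import math

def fit_logistic_1d(x, y, iters=50):
    """Newton's method for P(Y=1 | x) = 1 / (1 + exp(-(a + b*x))); returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0  # score vector and information matrix entries
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            ga += yi - p
            gb += (yi - p) * xi
            haa += w
            hab += w * xi
            hbb += w * xi * xi
        det = haa * hbb - hab * hab
        # Newton step: (a, b) += I^{-1} (ga, gb), with I the 2x2 information matrix
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b

# Made-up data: the groups overlap, so the MLE exists
x0 = [0.0, 1.0, 2.0, 3.0, 10.0]  # x values with Y = 0
x1 = [1.0, 2.0, 2.5, 3.0]        # x values with Y = 1
a, b = fit_logistic_1d(x0 + x1, [0] * len(x0) + [1] * len(x1))
mean_diff = sum(x1) / len(x1) - sum(x0) / len(x0)
print(b, mean_diff)  # the two numbers should share a sign
```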

### 2013

• Owen, A. B.
Sobol' indices and Shapley value PDF
Sobol' indices are used to measure the importance of input variables in black box functions. Shapley value is used by economists to apportion the value of a team's efforts among its individual members. This paper compares the methods. Neither kind of Sobol' index yields the Shapley value for variance explained. Compared to Shapley value, Sobol's lower index ignores interactions while Sobol's upper index overcounts them.
• Billman, D. and He, H. and Owen, A. B.
Grouping tasks and data display items via the non-negative matrix factorization PDF
Our data are a list of 119 tasks that a pilot must perform in flying a modern airliner, 210 input and output variables available to the pilot, and a matrix indicating whether a given IO variable is required for a given task. We use biclustering of this data to aid in designing an interface.
• Owen, A. B. and Dick, J. and Chen, S.
Higher order Sobol' indices original PDF | Published version: Information and Inference 2014(3)59-81
We generalize Sobol' indices from an $$L^2$$ to $$L^p$$ (integer $$p\ge2$$) setting in order to emphasize those variables that most affect extreme values of the function. Our generalizations have integral representations that allow Monte Carlo or quasi-Monte Carlo estimation.
• Chen, A., Owen, A. B. and Shi, M.
Data enriched linear regression arXiv | revised
We have a small high-quality data set of (X,Y) values following a Gaussian linear regression, and a potentially much larger data set following a possibly different Gaussian linear regression. We apply Stein shrinkage to connect the two. It becomes inadmissible to use only the small data set when there are $$p\ge5$$ predictors and the error df is $$\ge10$$.
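A small numerical illustration of the comparison in "Sobol' indices and Shapley value" above, as a sketch. We assume the ANOVA variance components $$\sigma^2_u$$ of some function are known (the numbers below are made up), take the value of a set $$u$$ of variables to be the closed index $$\tau^2_u$$ (variance of all components inside $$u$$), and compute Shapley values by brute force. For comparison we also compute $$\sum_{u\ni j}\sigma^2_u/|u|$$, which we understand to be the closed form derived in the paper, and the lower and upper Sobol' indices.

```python
from itertools import combinations
from math import factorial

def closed_index(u, sigma2):
    """tau^2_u: total variance of the ANOVA components whose variables all lie in u."""
    return sum(v for w, v in sigma2.items() if w <= u)

def shapley_values(d, sigma2):
    """Brute-force Shapley value of each variable for the value function tau^2_u."""
    phi = []
    for j in range(d):
        others = [k for k in range(d) if k != j]
        total = 0.0
        for size in range(d):
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for s in combinations(others, size):
                u = frozenset(s)
                total += weight * (closed_index(u | {j}, sigma2) - closed_index(u, sigma2))
        phi.append(total)
    return phi

# Hypothetical variance components, keyed by the set of variables involved
sigma2 = {
    frozenset({0}): 1.0,
    frozenset({1}): 0.5,
    frozenset({0, 1}): 0.3,
    frozenset({0, 1, 2}): 0.6,
}
d = 3
phi = shapley_values(d, sigma2)
closed_form = [sum(v / len(u) for u, v in sigma2.items() if j in u) for j in range(d)]
lower = [sigma2.get(frozenset({j}), 0.0) for j in range(d)]              # main effect only
upper = [sum(v for u, v in sigma2.items() if j in u) for j in range(d)]  # every term with j
print(lower, phi, upper)
```

Each interaction's variance is split equally among the variables in it, which is why the Shapley value sits between the lower index (interactions ignored) and the upper index (interactions counted in full for every member).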

### 2012

• Owen, A. B.
Self-concordance for empirical likelihood PDF
Empirical likelihood computations for the mean are typically made through a quadratic extension of the logarithm to the interval $$(-\infty,1/n]$$. The quadratic extension is not self-concordant, but a quartic extension is. Self-concordant functions have Hessians that do not change too rapidly, and convex optimization has strong guarantees under self-concordance.
• Hickernell, F. J., Jiang, L., Liu, Y. and Owen, A. B.
Guaranteed Conservative Fixed Width Confidence Intervals Via Monte Carlo Sampling PDF
We construct two-stage fixed-width confidence intervals for the mean. The first stage estimates the variance. The second stage estimates the mean. The confidence level holds so long as the random variables have kurtosis below a known/computable bound. We use new Berry-Esseen inequalities of Nefedova and Shevtsova.
• Owen, A. B.
Quasi-regression for heritability PDF
Quasi-regression is employed to estimate missing heritability. The method assumes complete linkage equilibrium, i.e., all SNPs are independent. It then estimates the proportion of heritability by direct moment calculations. With n subjects and d genes the mean squared error is $$O( 1/n + d^2/n^3 )$$.
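The quadratic extension of the logarithm mentioned in "Self-concordance for empirical likelihood" above can be written down explicitly. A minimal sketch, assuming the common form that matches $$\log$$ in value, slope and curvature at a threshold `eps` (for the mean, `eps` $$=1/n$$). This is the quadratic version the abstract contrasts with, not the self-concordant quartic one.

```python
import math

def log_star(z, eps):
    """Logarithm for z >= eps; below eps, the quadratic that matches log(z)
    in value, slope and curvature at z = eps, so it is defined for all real z."""
    if z >= eps:
        return math.log(z)
    r = z / eps
    return math.log(eps) - 1.5 + 2.0 * r - 0.5 * r * r

n = 10
eps = 1.0 / n
print(log_star(eps, eps), math.log(eps))  # equal at the join
print(log_star(-3.0, eps))               # finite even for negative arguments
```

Because the extension is concave and finite everywhere, an optimizer can step outside the region where all weights are positive without producing `nan`.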

• The three papers below grew out of MCQMC 2012 and the 2012 MASCOT NUM meeting.

• Owen, A. B.
Better estimation of small Sobol' sensitivity indices Revised Sept 2012
A new method for estimating Sobol' indices is proposed. The new method makes use of 3 independent input vectors rather than the usual 2. It attains much greater accuracy on problems where the target Sobol' index is small, even outperforming some oracles which adjust using the true but unknown mean of the function. When the target Sobol' index is quite large, the oracles do better than the new method. The new method attains a better rate of convergence than three others do in an asymptotic limit of effects growing small. Six other asymptotes are considered, and different methods prevail in those other limits, but usually the difference is only in the leading constant. © ACM, 2012. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM TOMACS, {VOL\#, ISS\#, (2012)} http://doi.acm.org/10.1145/nnnnnn.n nnnnn
Here is a further improvement.
• Owen, A. B.
Variance components and generalized Sobol' indices PDF | slides
This paper introduces generalized Sobol' indices, compares strategies for their estimation, and makes a systematic search for efficient estimators. Of particular interest are contrasts, sums of squares and indices of bilinear form which allow a reduced number of function evaluations compared to alternatives. The bilinear framework includes some efficient estimators from Saltelli (2002) and Mauntz (2002) as well as some new estimators for specific variance components and mean dimensions. This paper also provides a bias corrected version of the estimator of Janon et al. (2012), and extends the bias correction to generalized Sobol' indices. Some numerical comparisons are given.
• Owen, A. B.
Effective dimension for weighted function spaces PDF | revised
This paper introduces notions of effective dimension for weighted Sobolev spaces. The space has low effective dimension in the truncation sense if the smallest ball of functions containing a function of variance 1 contains no functions depending materially on high index variables. It has low effective dimension in the superposition sense if that ball has no functions depending materially on higher order interactions. For product weights it is possible to explicitly compute the effective dimension of a space.
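A sketch of a three-vector estimator in the spirit of "Better estimation of small Sobol' sensitivity indices" above. The specific form $$(f(x)-f(z_u{:}x_{-u}))(f(x_u{:}y_{-u})-f(y))$$ is our assumption of one such estimator: it is unbiased for the closed index $$\tau^2_u$$ and never involves the mean of $$f$$, so a large constant offset (100 in the toy function below) does not inflate its variance. Inputs are assumed independent and uniform on $$[0,1]^d$$.

```python
import random

def sobol_three_vector(f, d, u, n, seed=0):
    """Estimate tau^2_u (closed Sobol' index of the variables in u) for f on [0,1]^d
    with three independent uniform vectors x, y, z per sample."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = [rng.random() for _ in range(d)]
        y = [rng.random() for _ in range(d)]
        z = [rng.random() for _ in range(d)]
        zx = [z[j] if j in u else x[j] for j in range(d)]  # z on u, x elsewhere
        xy = [x[j] if j in u else y[j] for j in range(d)]  # x on u, y elsewhere
        total += (f(x) - f(zx)) * (f(xy) - f(y))
    return total / n

def f(v):
    # Large mean, small effect of variable 1: true tau^2 for u = {1} is 0.1**2 / 12
    return 100.0 + v[0] + 0.1 * v[1]

est = sobol_three_vector(f, d=2, u={1}, n=20000)
print(est, 0.01 / 12)
```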

### 2011

• Ma, L., Wong, W. H. and Owen, A. B.
A sparse transmission disequilibrium test for haplotypes based on Bradley-Terry graphs PDF
• Owen, A. B. and Eckles, D.
Bootstrapping data arrays of arbitrary order PDF | slides
McCullagh (2000) showed that there is no bootstrap for the crossed random effects model. But resampling each factor (rows and columns) independently works well and is slightly conservative. Here we replace resampling by reweighting and then extend the result to data arrays of arbitrary order. Reweighting is better suited to large data warehouses than resampling is.
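A minimal sketch of reweighting rows and columns independently for a two-factor array with missing cells. The exponential weights (mean 1, variance 1) are an illustrative choice made here; the paper discusses its own weight distributions and the extension to higher-order arrays, which this sketch does not reproduce.

```python
import random

def reweighted_bootstrap_var(Y, B=200, seed=0):
    """Bootstrap variance of the mean of a two-factor (rows x columns) array.
    Missing cells are None. Rows and columns get independent random weights."""
    rng = random.Random(seed)
    R, C = len(Y), len(Y[0])
    means = []
    for _ in range(B):
        # Exp(1) weights: mean 1, variance 1 (an illustrative choice)
        w_row = [rng.expovariate(1.0) for _ in range(R)]
        w_col = [rng.expovariate(1.0) for _ in range(C)]
        num = den = 0.0
        for i in range(R):
            for j in range(C):
                if Y[i][j] is not None:
                    w = w_row[i] * w_col[j]
                    num += w * Y[i][j]
                    den += w
        means.append(num / den)
    center = sum(means) / B
    return sum((m - center) ** 2 for m in means) / (B - 1)

# Hypothetical ratings-style array: strong row effects, some missing cells
Y = [[5.0, 4.0, None, 5.0],
     [1.0, None, 2.0, 1.0],
     [3.0, 3.0, 4.0, None]]
print(reweighted_bootstrap_var(Y))
```

Reweighting only needs one pass over the stored cells per replicate, with no copying of resampled rows or columns, which is the property that suits large data warehouses.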

• Gleich, D. F. and Owen, A. B.
Moment based estimation of stochastic Kronecker graph parameters PDF
It is hard to estimate the parameters of Kronecker random graphs by maximum likelihood. Here is a method of moments strategy based on simple feature counts. We find that moments are more robust and give better fits than maximum likelihood.

• Sun, Y. and Zhang, N. and Owen, A. B.
Multiple hypothesis testing, adjusting for latent variables PDF | R package on CRAN | R package tar file
We introduce LEAPP (latent effect adjustment after primary projection), a method for taking account of unmeasured latent variables when doing multiple hypothesis testing. Simulations show good performance compared to alternatives, EIGENSTRAT and SVA (surrogate variable analysis). When applied to the 16 tissue AGEMAP data, LEAPP gives lists of age-related genes with much more reproducibility (over tissues) than the other methods. We prove some results on the LEAPP estimates.

### 2010

• Owen, A. B.
Moment based estimation of stochastic Kronecker graph parameters Deprecated. See above article with David Gleich.
It is hard to estimate the parameters of Kronecker random graphs by maximum likelihood. Here is a method of moments strategy based on simple feature counts. For large data sets the error is dominated by model lack of fit and so the extra efficiency of likelihood over moments is less important than the speed advantage of moments.

• Chen, S. and Matsumoto, M. and Nishimura, T. and Owen, A. B.
New inputs and methods for Markov chain quasi-Monte Carlo PDF
We present some new ways of generating small CUD sequences. We introduce some embedded antithetic and round trip variance reductions into MCMC and prove that they preserve the CUD property. In some simulations of GARCH and stochastic volatility models, the new methods greatly outperform standard IID sampling. The original publication is available at www.springerlink.com (or will be so, once it has been completed).

• Dyer, J. and Owen, A.B.
Visualizing bivariate long tailed data revised PDF | slides from NIPS
Suppose that we observe two or more categorical variables with a long tail, such as movies and customers in ratings data. This paper looks at a way to visualize the joint distribution of such data. We use a copula plot based on the observed ranks of the data. We prove that, under a Zipf-Mandelbrot-Poisson generative model, the observed ranks are asymptotically close to the underlying true ranks. Some ratings data show a strong head-to-tail affinity: busy raters are overrepresented at rarely rated items and conversely. We present two simple generative models that produce such an effect. One is a saturation model and the other is bipartite preferential attachment. We prove bounds on the marginal distributions for these models.

• Dyer, J. and Owen, A.B.
Correct ordering in the Zipf-Poisson ensemble PDF Revised Jan 2012 PDF
Counted rank data arise commonly, such as the most popular baby names, English words and web sites. This paper analyzes the reliability of such orderings. We use a model where $$X_i$$ is Poisson with a mean following a Zipf law. We get estimates for the number $$n$$ of leading items $$i=1,\dots,n$$ that are correctly ordered by their observed counts $$X_i$$. This number grows at the rate $$(AN/\log(N))^{1/(\alpha+2)}$$, where $$\alpha$$ is the Zipf parameter and $$A = \alpha^2(\alpha+2)/4$$. For a Zipf-Poisson model of the British National Corpus of 100,000,000 words, we estimate that the 72 most frequent words are in their correct order.

• She, Y. and Owen, A.B.
Outlier detection using nonconvex penalized regressions PDF (orig) | PDF (revised)
We put in a dummy variable for all n observations in a regression but regularize their coefficients via a thresholding rule. The result is robust regression that empirically is very good at identifying outliers. A key step is a clean model for which the outliers become the signal and in which BIC is applicable.
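A simplified sketch of the idea for a straight-line fit: each observation gets its own shift $$\gamma_i$$, and we alternate a least-squares fit of the line to $$y-\gamma$$ with hard thresholding of the residuals. Hard thresholding at a fixed `lam` is our simplification; the paper's thresholding rules and BIC-based tuning are not reproduced here.

```python
def hard_threshold(r, lam):
    """Keep r if it exceeds lam in magnitude, otherwise set it to zero."""
    return r if abs(r) > lam else 0.0

def ls_line(x, y):
    """Least-squares intercept and slope for y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def mean_shift_outliers(x, y, lam, iters=100):
    """Fit y_i = a + b*x_i + gamma_i + noise by alternating a least-squares line
    fit with hard thresholding of the per-observation shifts gamma_i."""
    gamma = [0.0] * len(y)
    for _ in range(iters):
        a, b = ls_line(x, [yi - gi for yi, gi in zip(y, gamma)])
        gamma = [hard_threshold(yi - a - b * xi, lam) for xi, yi in zip(x, y)]
    return a, b, [i for i, g in enumerate(gamma) if g != 0.0]

x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi for xi in x]
y[5] += 20.0  # planted outlier
a, b, flagged = mean_shift_outliers(x, y, lam=5.0)
print(a, b, flagged)
```

Observations with a nonzero $$\gamma_i$$ are the flagged outliers, and once they are absorbed by their shifts the line is fit to the clean points.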

### 2009

• L'Ecuyer, P. and Owen, A.B. (eds)
Monte Carlo and Quasi-Monte Carlo Methods 2008
Proceedings of MCQMC 2008, July 6-11, 2008, Montreal, Canada.
Springer-Verlag. ISBN 978-3-642-04106-8 List of articles

• Emerson, S.C. and Owen, A.B.
Calibration of the empirical likelihood method for a vector mean Electronic Journal of Statistics
This paper presents an approach for getting outside of the 'convex hull problem' in empirical likelihood.

• Owen, A.B.
Recycling physical random variables EJS
This paper shows how to get n(n-1)/2 pairwise independent random vectors out of just n fully independent ones. Similar constructions are widely used. What is new here is a statistical analysis of the consequences for Monte Carlo sampling: the resulting means are degenerate U-statistics with non-normal limits. Quite surprisingly, their asymptotic distributions come out symmetric, based on recent results on the spectrum of circulant matrices.
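A minimal sketch of one standard way to get $$n(n-1)/2$$ pairwise independent uniforms from $$n$$ independent ones: $$(u_i+u_j) \bmod 1$$ for $$i<j$$. We assume this construction for illustration. Any two of the outputs are independent, but triples are not (knowing two sums constrains the third), which is why the resulting Monte Carlo means behave like degenerate U-statistics.

```python
import random

def recycled_uniforms(u):
    """Turn n independent U(0,1) values into n(n-1)/2 pairwise independent
    U(0,1) values via (u_i + u_j) mod 1 for i < j."""
    n = len(u)
    return [(u[i] + u[j]) % 1.0 for i in range(n) for j in range(i + 1, n)]

rng = random.Random(1)
u = [rng.random() for _ in range(100)]
v = recycled_uniforms(u)  # 4950 values from 100 draws
# Monte Carlo estimate of the integral of x^2 over [0, 1] (true value 1/3)
est = sum(t * t for t in v) / len(v)
print(len(v), est)
```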

• Southworth, L.K., Owen, A.B. and Kim, S.K.
Aging mice show a decreasing correlation of gene expression within genetic modules. PLoS Genetics PDF
As mice age, the correlations among sets of related genes grow weaker.

• Chen, S., Dick, J. and Owen, A.B.
Consistency of Markov chain quasi-Monte Carlo on continuous state spaces PDF
Tribble has made over 1000-fold efficiency improvements by inserting QMC sampling into MCMC problems. The earlier consistency results for the use of QMC in MCMC only worked for discrete state spaces. This paper extends them to continuous problems like the ones in Tribble's thesis.

• Xu, Y., Dyer, J.S. and Owen, A.B.
Empirical stationary correlations for semi-supervised learning on graphs PDF | Talk
Many methods for semi-supervised learning on graphs turn out to be forms of kriging. They use a correlation structure derived from the graph, but without taking account of correlations among the observed response values. We incorporate the empirical correlations into the covariance. In two example data sets we find greatly improved prediction.

### 2008

• Owen, A.B.
Monte Carlo and Quasi-Monte Carlo for Statistics PDF
This is a survey which sketches some topics in statistics that use Monte Carlo and quasi-Monte Carlo methods. The emphasis is on problems with some open research issues.

• Owen, A.B.
Karl Pearson's meta-analysis revisited
Annals of Statistics | Slides | Supplementary figures (web) | Supplementary figures (pdf)
A test of Karl Pearson, thought to be inadmissible for over 50 years, is shown to be admissible. Furthermore, it has good power against alternatives in which all or most of the non-zero parameters share the same sign. An earlier paper below used a large Monte Carlo simulation, whereas this one uses an FFT to get exact power. This one also compares against standard tests not ordinarily thought of as meta-analysis.
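A sketch of combining one-sided p-values in both tails, in the spirit of Pearson's test. We apply Fisher's combination $$-2\sum_i\log p_i\sim\chi^2_{2k}$$ to the p-values and to their complements, then take the smaller tail and double it. The doubling is a conservative calibration we assume for illustration, not necessarily the exact calibration in the paper. For even degrees of freedom the $$\chi^2$$ tail has the closed form $$e^{-q/2}\sum_{j<k}(q/2)^j/j!$$, so no statistics library is needed.

```python
import math

def chi2_even_sf(q, k):
    """P(chi^2 with 2k degrees of freedom > q), via the closed form for even df."""
    h = q / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= h / j
        total += term
    return math.exp(-h) * total

def fisher_combined(p):
    """Fisher's combination: -2 * sum(log p_i) ~ chi^2_{2k} under H0."""
    return chi2_even_sf(-2.0 * sum(math.log(pi) for pi in p), len(p))

def pearson_combined(p):
    """Two-tailed combination: Fisher on the one-sided p-values and on their
    complements, keeping the smaller tail with a Bonferroni factor of 2."""
    left = fisher_combined(p)
    right = fisher_combined([1.0 - pi for pi in p])
    return min(1.0, 2.0 * min(left, right))

same_sign = [0.01, 0.02, 0.03, 0.2, 0.1]  # hypothetical one-sided p-values
print(fisher_combined([0.3]))             # with one test, Fisher returns that p-value
print(pearson_combined(same_sign))
```

Because both tails are checked, evidence that consistently points in one direction (either direction) is rewarded, which matches the alternatives where the abstract says the test has good power.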

• Southworth, L.K., Kim, S.K. and Owen, A.B.
Properties of balanced permutations
Journal of Computational Biology, April 2009, 16(4): 625-638
Balanced permutations are a fascinating idea for microarray analyses. But we find that they can give very misleading p values.

### 2007

• Perry, P.O. and Owen, A.B.
A rotation test to verify latent structure JMLR Feb 2010
We test for the presence of a latent variable in correlated noise by employing projection pursuit non-normality measures.

• Zahn, Poosala, Owen, Ingram, Lustig, Carter, Weeratna, Taub, Gorospe, Mazan-Mamczarz, Lakatta, Boheler, Zu, Mattson, Falco, Ko, Schlessinger, Firman, Kummerfeld, Wood, Longo, Zonderman, Kim, Becker
"AGEMAP: a gene expression database for aging in mice" PLOS Genetics
This is a preliminary online version of the article. There may be changes.
We look at patterns of aging in mice for 16 different tissues, as measured by gene expression.

• Owen, A.B. and Perry, P.O.
Bi-cross-validation of the SVD and the non-negative matrix factorization final PDF from AOAS | JSM 2009 slides | older slides
We look at how to pick the rank k when approximating a matrix by a truncated SVD. We hold out a rectangular submatrix, fit an SVD to a complementary submatrix, truncate it and predict. The method extends to the non-negative matrix factorization among other models.
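A minimal sketch of a single bi-cross-validation hold-out, using NumPy. Writing the rows and columns as $$X=\begin{pmatrix}A&B\\C&D\end{pmatrix}$$ with $$A$$ held out, the rank-$$k$$ prediction is $$\hat A = B\,(\hat D^{(k)})^{+}C$$, where $$\hat D^{(k)}$$ is the truncated SVD of $$D$$. In practice one averages the error over many held-out blocks and picks the $$k$$ with the smallest average; that loop is omitted here.

```python
import numpy as np

def bcv_error(X, k, rows, cols):
    """Squared error predicting the held-out block X[rows, cols] from a rank-k
    SVD of the complementary block, via A_hat = B @ pinv(D_k) @ C."""
    r = np.zeros(X.shape[0], dtype=bool)
    r[rows] = True
    c = np.zeros(X.shape[1], dtype=bool)
    c[cols] = True
    A = X[np.ix_(r, c)]    # held-out block
    B = X[np.ix_(r, ~c)]   # same rows, other columns
    C = X[np.ix_(~r, c)]   # other rows, same columns
    D = X[np.ix_(~r, ~c)]  # complementary block that gets the SVD
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    # Pseudo-inverse of the rank-k truncated SVD of D
    Dk_pinv = Vt[:k].T @ np.diag(1.0 / s[:k]) @ U[:, :k].T
    A_hat = B @ Dk_pinv @ C
    return float(np.sum((A - A_hat) ** 2))

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 15))  # exactly rank 2
errs = [bcv_error(X, k, rows=np.arange(5), cols=np.arange(4)) for k in range(3)]
print(errs)  # the rank-2 fit reproduces the held-out block almost exactly
```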

• Owen, A.B. "Pearson's test in a large scale multiple meta-analysis" PDF
The AGEMAP study generated an 8932 x 16 matrix of p values. We apply meta-analysis to each row. A method originally proposed by Pearson (1934) and thought for over 50 years to be inadmissible performs best in a simulation. We also show that Pearson's test really is admissible.

• Owen, A.B. "The pigeonhole bootstrap" Annals of Applied Statistics | PDF | Slides
McCullagh (2000) showed that large crossed random effects data sets, such as those now studied for recommender engines and information retrieval, are impossible to bootstrap. This means that even for balanced homoscedastic random effects models with no missing data, no bootstrap correctly estimates the variance of a sample mean (let alone a more complicated procedure). But one of the methods he studied, that of independently resampling rows and columns, comes pretty close. This article shows that the expected bootstrap variance in that method tracks the desired variance, even for severely unbalanced and heteroscedastic data sets.

### 2006

• Owen, A.B. "A robust hybrid of lasso and ridge regression" PDF | Slides
A penalty that behaves like lasso for small coefficients and like ridge for large coefficients is developed. This penalty is a reversed Huber function. The penalty is convex. Like the Huber function it requires scaling. The scaling parameter can be incorporated into a criterion that is jointly convex in it and the regression coefficient vector.
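The reversed Huber ("berHu") penalty from "A robust hybrid of lasso and ridge regression" above is short enough to write out. A minimal sketch with a fixed scale `M`; the paper's joint convex criterion for choosing the scale is not shown.

```python
def berhu(z, M):
    """Reversed Huber ("berHu") penalty with scale M > 0: absolute value
    near zero (like the lasso), a quadratic (like ridge) for |z| > M."""
    a = abs(z)
    return a if a <= M else (z * z + M * M) / (2.0 * M)

for z in (-0.5, 1.0, 3.0):
    print(z, berhu(z, M=1.0))
```

At $$|z|=M$$ both pieces equal $$M$$ and both have slope 1, so the penalty is continuous with a continuous derivative away from zero, and convex.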

• Owen, A.B. "Infinitely imbalanced logistic regression" JMLR
Many binary classification problems are very unbalanced, with one category much more common than the other. This paper shows what happens to logistic regression in the limit where one category's sample size tends to infinity while the other remains finite. For example, one could use logistic regression to separate a Gaussian measure from a finite data set. Under mild conditions, the limiting coefficient vector (apart from the intercept) is finite. It can be expressed in terms of an exponential tilt and solved for by a convex optimization.
Journal of Machine Learning Research (v 8, pp 761-773, 2007)
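As we read the result, the limit has a simple form in one worked case. Let $$F_0$$ be the distribution of the abundant class and $$\bar x$$ the average of the finite class. The limiting slope vector $$\beta$$ makes the exponential tilt of $$F_0$$ have mean $$\bar x$$. When $$F_0$$ is Gaussian the tilt stays Gaussian with a shifted mean, which gives a closed form:

$$\int e^{x^{\mathsf T}\beta}\,(x-\bar x)\,dF_0(x)=0,\qquad F_0=N(\mu_0,\Sigma)\;\Longrightarrow\;\mu_0+\Sigma\beta=\bar x\;\Longrightarrow\;\beta=\Sigma^{-1}(\bar x-\mu_0).$$

The middle step uses the fact that multiplying the $$N(\mu_0,\Sigma)$$ density by $$e^{x^{\mathsf T}\beta}$$ and renormalizing gives the $$N(\mu_0+\Sigma\beta,\Sigma)$$ density.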

• Zahn, Sonu, Vogel, Crane, Mazan-Mamczarz, Rabkin, Davis, Becker, Owen, Kim
"Transcriptional profile of aging in human muscle reveals a common aging signature" PLOS Genetics
This paper relates human aging to various genes and gene groups. It includes a new version of Gene Set Enrichment Analysis (GSEA) geared to handle regressions and covariates. The electron transport group of genes is found to be age-related in human kidney, muscle and brain, and in other species.

### 2005

• Tribble, S.D. and Owen, A.B.
"Constructions of weakly CUD sequences for MCMC" Electronic Journal of Statistics
(2008) volume 2, pages 634-660
An earlier paper below showed that MCMC can be driven by completely uniformly distributed (CUD) or weakly CUD (WCUD) point sequences. This paper shows that a construction of Liao's that permutes QMC vectors leads to WCUD point sequences. A theorem of Niederreiter (1977) implies that certain lattice constructions satisfy a triangular array version of CUD. We find QMC methods for MCMC reduce variance by factors ranging from 10 to several hundred in a 42 dimensional Gibbs sampling probit example. A proposal of Liao's for incorporating acceptance rejection into QMC-MCMC

• Owen, A.B.
"Local antithetic sampling with scrambled nets" Annals of Statistics 2008 | @ arXiv | @ Project Euclid
A local antithetic reflection strategy can improve the variance rate of scrambled nets from $$n^{-3/2+\epsilon}$$ to $$n^{-3/2-2/d+\epsilon}$$ in dimension d. The benefit is similar to that which Haber gets when moving from stratified sampling to stratified antithetic sampling. The method also looks like a merger of scrambled nets and monomial cubature.

• Owen, A.B.
"On the Warnock-Halton quasi-standard error" Monte Carlo Methods and Applications
v12 n 1 pp 47--54 DOI: 10.1515/156939606776886652
Warnock and Halton have proposed a method of treating multiple QMC estimates as replicates. This paper reproduces an example where the method seems to work, but cautions that the method can fail arbitrarily badly.

• Lin, Z & Owen, A.B. & Altman, R. Science Reply to letters about "Genetic research and human subject privacy".

### 2004

• Owen, A.B. and Tribble, S.D.
"A quasi-Monte Carlo Metropolis algorithm" PNAS
This paper proves that QMC methods can be applied to Metropolis-Hastings style MCMC. The key idea is to use QMC points that are "completely uniformly distributed". This is like using the entire period of a small random number generator.

• Rodwell, Sonu, Zahn, Lund, Wilhelmy, Wang, Xiao, Mindrinos, Crane, Segal, Myers, Brooks, Davis, Higgins, Owen, and Kim
"A transcriptional profile of aging in the human kidney" Public Library of Science
We found genes that change expression with age, in the human kidney. These genes do not tend to be the ones that serve to distinguish kidney from other tissue types, consistent with a model that aging is similar in different tissue types. The genes do not overlap with aging related genes in other species that we looked at.

• Owen, A.B.
"Randomized QMC and point singularities" PDF
Randomized QMC is shown to have a superior rate of convergence to ordinary MC on some functions with square integrable singularities at unknown locations. Surprisingly that means RQMC will generally beat importance sampling asymptotically. Of course one might combine them.

• Lin, Z & Owen, A.B. & Altman, R.
"Genetic research and human subject privacy" Science, Vol 305, Issue 5681, 183, 9 July 2004

• Owen, A. B.
"Variance of the number of false discoveries". PDF | R functions (beta) | R function documentation (beta)
Given $$d$$ hypothesis tests at level $$\alpha$$, we expect $$d\alpha$$ false positives. When the tests are dependent, the variance of the number of false positives can be as large as $$O(d^2)$$, far above the independence value of $$d\alpha(1-\alpha)$$. This paper shows how to estimate such a variance, taking account of dependency in the tests. The R functions are beta and very subject to change. Feedback is welcome.
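Where the $$O(d^2)$$ comes from: let $$R_i$$ be the event that test $$i$$ rejects, with $$\Pr(R_i)=\alpha$$ for each of the $$d$$ true null hypotheses, and let $$V=\sum_{i=1}^d 1\{R_i\}$$ count the false positives. Expanding the variance of a sum gives

$$\operatorname{Var}(V)=\sum_{i=1}^{d}\sum_{j=1}^{d}\operatorname{Cov}\bigl(1\{R_i\},1\{R_j\}\bigr)=d\alpha(1-\alpha)+\sum_{i\neq j}\bigl(\Pr(R_i\cap R_j)-\alpha^2\bigr).$$

The diagonal terms give the independence value. The off-diagonal sum has $$d(d-1)$$ terms, so when pairs of tests tend to reject together, with $$\Pr(R_i\cap R_j)-\alpha^2$$ bounded away from zero, the variance is of order $$d^2$$.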

• Owen, A. B. "Halton sequences avoid the origin". SIAM Review v 48 n 3 pp 487--503
Gives rates of convergence for QMC on unbounded integrands, using growth conditions on f and a singularity avoidance pattern for x's. Halton sequences and randomized QMC avoid the origin suitably.
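A minimal sketch of generating Halton points from radical inverses in coprime bases (2 and 3 here). We start at index $$i=1$$, since $$i=0$$ maps to the origin itself; every later point has all coordinates strictly positive.

```python
def radical_inverse(i, base):
    """Reflect the base-b digits of the integer i about the radix point."""
    x, scale = 0.0, 1.0 / base
    while i > 0:
        i, digit = divmod(i, base)
        x += digit * scale
        scale /= base
    return x

def halton(n, bases=(2, 3)):
    """First n Halton points, starting at index 1 so the origin is skipped."""
    return [[radical_inverse(i, b) for b in bases] for i in range(1, n + 1)]

print(halton(3))  # approximately [[0.5, 1/3], [0.25, 2/3], [0.75, 1/9]]
```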

• Owen, A. B. "Multidimensional variation for quasi-Monte Carlo". PDF | BiBTeX

Survey, and some new results, on multidimensional variation (Vitali and Hardy-Krause) for Quasi-Monte Carlo.

### 2003

• Delayed graft function in transplanted kidneys Article

• Be sure to see Tao Jiang's Thesis: Compressed PostScript | PDF

• Liu, R. and Owen, A.B. "Estimating Mean Dimensionality of ANOVA decompositions (revised)" JASA 2006. Relationships between Sobol's sensitivity indices and moments of the dimension distribution are established. The mean dimension is computed for some functions arising in finance and extreme value theory. The minimum of d independent uniform random variables is seen to have strong low dimensional components.
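A sketch of the dimension distribution and the mean dimension (superposition sense), assuming the ANOVA variance components $$\sigma^2_u$$ are known (the values below are made up). It also checks one relationship of the kind described above: the upper Sobol' indices $$\bar\tau^2_j=\sum_{u\ni j}\sigma^2_u$$ sum to $$\sum_u|u|\,\sigma^2_u$$, so the mean dimension equals $$\sum_j\bar\tau^2_j/\sigma^2$$.

```python
def dimension_distribution(sigma2):
    """P(|u| = k) for k = 1..d_max, from variance components keyed by frozenset u
    (the empty set, i.e. the mean, is not included)."""
    total = sum(sigma2.values())
    d_max = max(len(u) for u in sigma2)
    return {k: sum(v for u, v in sigma2.items() if len(u) == k) / total
            for k in range(1, d_max + 1)}

def mean_dimension(sigma2):
    """Mean of the dimension distribution: sum of |u| * sigma^2_u over sigma^2."""
    return sum(len(u) * v for u, v in sigma2.items()) / sum(sigma2.values())

def upper_sobol(j, sigma2):
    """Upper (total) Sobol' index of variable j: sum of sigma^2_u over u containing j."""
    return sum(v for u, v in sigma2.items() if j in u)

sigma2 = {
    frozenset({0}): 1.0,
    frozenset({1}): 0.5,
    frozenset({0, 1}): 0.3,
    frozenset({0, 1, 2}): 0.6,
}
print(dimension_distribution(sigma2), mean_dimension(sigma2))
```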

• Troyanskaya, O.G., Dolinski, K., Owen, A.B., Altman, R.B., Botstein, D.
"A Bayesian framework for combining heterogeneous data sources for gene function prediction (in S. cerevisiae)" Online Supplement

• Owen, A.B. "Quasi-Monte Carlo Sampling" PDF | BiBTeX
A chapter on QMC for a SIGGRAPH 2003 course. It motivates QMC as a deterministic law of large numbers. The algorithms are presented as extensions of stratification methods, like those already well known in graphics (jittering, n-rooks, multi-jittered sampling).
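Of the stratification methods named above, n-rooks sampling (Latin hypercube sampling) is the easiest to show. A minimal sketch: each coordinate gets an independent random permutation of the $$n$$ slabs $$[i/n,(i+1)/n)$$, and each point is jittered uniformly within its slab.

```python
import random

def latin_hypercube(n, d, seed=0):
    """n points in [0,1)^d with exactly one point in each of the n slabs
    [i/n, (i+1)/n) along every coordinate (n-rooks / Latin hypercube sampling)."""
    rng = random.Random(seed)
    cols = []
    for _ in range(d):
        perm = list(range(n))
        rng.shuffle(perm)
        # jitter uniformly within the assigned slab
        cols.append([(perm[i] + rng.random()) / n for i in range(n)])
    return [[cols[k][i] for k in range(d)] for i in range(n)]

pts = latin_hypercube(8, 3)
print(pts[0])
```

In two dimensions this is exactly the "n rooks on an n by n board, none attacking another" picture.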

### 2002

• Owen, A.B., Stuart, J., Mach, K., Villeneuve, A.M., Kim, S.
"A gene recommender algorithm to identify co-expressed genes in C elegans" Paper | Software

This paper imitates algorithms from movie and book recommenders to find new genes related to a group of old genes. Given a query of genes with common function, we identify experiments in which the query genes are strongly co-expressed. Then we rank all of the organism's genes according to the extent to which they agree with the query group in the selected experiments. RNA interference knockouts confirmed two new Retinoblastoma-related genes in C elegans.

• Owen, A.B. "Variance and discrepancy with alternative scramblings" PostScript | PDF

There are many computationally efficient proposals for scrambling digital nets. Generally they preserve mean squared discrepancy. This paper shows that one alternative can be detrimental to the sampling variance, adversely affecting the rate of convergence. Another scrambling improves the rate of convergence, at least for d=1.

• Owen, A.B. "Necessity of low effective dimension" PostScript | PDF

This paper explores the extent to which low superposition dimension is necessary for QMC to beat MC.

• Jiang, T. and Owen, A.B. "Quasi-regression for visualization and interpretation of black box functions" PostScript | PDF

Quasi-regression is applied to the output of a support vector machine and to a neural network. The method allows one to peer into a black box and identify important variables and interactions. The most vexing issue is how to reconcile a decomposition derived for independent variables with a function fit to highly dependent data.

• Hickernell, F. and Lemieux, C. and Owen, A.B. "Control variates for quasi-Monte Carlo" PostScript | PDF

It is easy and natural to combine quasi-Monte Carlo with control variates. But the proper control variate coefficients can change, as can the choice of what constitutes a good control variate. In MC a good control variate correlates with the integrand. In QMC it is better to correlate with some derivative or high frequency component of the target integrand.

### 2001

• Jiang, T. and Owen, A.B. "Quasi-regression with shrinkage" PostScript | PDF | Software | Slides

Quasi-regression is a method of Monte Carlo approximation useful for global sensitivity analysis. This paper presents a new version, incorporating shrinkage parameters of the type used in wavelet approximation. As an example application, a black box function from machine learning is analyzed. That function is nearly a superposition of functions of one and two variables and the first variable acting alone accounts for more than half of the variance.

• Owen, A.B. "The dimension distribution and quadrature test functions" PostScript | PDF

A "dimension distribution" is introduced through which various measures of effective dimension of a function can be defined. The idea is explored on some widely used quadrature test functions. Some isotropic functions are shown to be of low effective dimension, explaining the success of QMC methods on them.

@article{dimdist, author = {A. B. Owen}, title = {The dimension distribution and quadrature test functions}, journal = {Statistica Sinica}, volume = 13, number = 1, note = {In press}, year = 2003 }

• Owen, A.B. "Empirical Likelihood" Book | Software

• Lemieux, C. and Owen, A.B. "Quasi-Regression and the Relative Importance of the ANOVA Components of a Function" PostScript | PDF

### 1999

• Owen, A.B. and Zhou, Y. "Adaptive Importance Sampling by Mixtures of Products of Beta Distributions" PostScript | PDF

• An, J. and Owen, A.B. "Quasi-Regression". Computer experiments are commonly used in engineering design problems arising in semiconductor, aerospace and other fields, with accurate deterministic simulators. The purposes of computer experiments vary; they might be thought of as "function mining" in analogy to data mining. The goals include approximation, interpretation and visualization. Quasi-regression is a frequentist simulation-based tool for computer experiments. It is aimed at building approximations for those problems for which kriging (often called DACE) is computationally infeasible. PostScript | PDF
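A one-dimensional sketch of the quasi-regression idea: with inputs assumed uniform on $$[0,1]$$ and an orthonormal basis (shifted Legendre polynomials, our choice here), each coefficient is $$\beta_k=E[f(X)\varphi_k(X)]$$ and can be estimated by a plain sample average, with no matrix to invert. Real applications use tensor-product bases over many input variables.

```python
import math
import random

# Orthonormal shifted Legendre polynomials on [0, 1] (uniform inputs assumed)
BASIS = [
    lambda x: 1.0,
    lambda x: math.sqrt(3.0) * (2.0 * x - 1.0),
    lambda x: math.sqrt(5.0) * (6.0 * x * x - 6.0 * x + 1.0),
]

def quasi_regression(f, n, seed=0):
    """Estimate beta_k = E[f(X) phi_k(X)] by sample averages; because the basis
    is orthonormal, no regression matrix has to be inverted."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    return [sum(f(x) * phi(x) for x in xs) / n for phi in BASIS]

def f(x):
    return x * x  # exactly 1/3 + (sqrt(3)/6) phi_1 + (sqrt(5)/30) phi_2

beta = quasi_regression(f, n=100_000)
print(beta)  # close to [1/3, sqrt(3)/6, sqrt(5)/30]
```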

• Owen, A.B. "Assessing linearity in high dimensions" Revised. PostScript | PDF

• Owen, A.B. and Zhou, Y. "Safe and effective importance sampling", revised. PostScript | PDF

• Owen, A.B. "Tubular Neighbors for Regression and Classification" PostScript | PDF

### 1998

• Owen, A.B. and Zhou, Y. "Safe and effective importance sampling" Importance sampling can be nearly perfect with a well chosen sampling density. Or it can give an infinite variance even when the sampling density closely matches the integrand. This paper shows that by combining importance sampling and control variates it is possible to get a method that is never much worse than importance sampling and never much worse than ordinary Monte Carlo. It is also well known that the importance sampling variance can approach zero for a nonnegative integrand as the sampling density becomes more nearly proportional to the integrand. This paper shows that it is possible to approach zero variance for integrands taking both positive and negative signs, by using two or more importance samples. PostScript | PDF

• Owen, A.B. "Monte Carlo Extension of Quasi-Monte Carlo" Monte Carlo methods can be used to advantage in combination with quasi-Monte Carlo. This paper surveys some recent results: randomized QMC can give sample based error estimates, randomized QMC can also increase accuracy by a factor of roughly $$n^{-1/2}$$ over QMC for smooth integrands, and Latin supercube sampling can extend the reach of QMC into higher, even infinite, dimensions. PostScript | PDF To appear: 1998 Winter Simulation Conference Proceedings. (D. J. Medeiros, E.F. Watson, M. Manivannan, and J. Carson, Eds.), pages 571--577. I thank WSC'98 for allowing its republication.

### 1997

• Owen, A.B. "Scrambling Sobol and Niederreiter-Xing Points". PostScript

• Owen, A.B. "How nearly linear is a function?". Many high dimensional numerical problems become tractable for nearly linear functions. The problem is, how can you tell if a given function is nearly linear? This paper presents a method based on quasi-regression for estimating the amount of linear structure in a function. Quasi-regression allows one to estimate regression coefficients without matrix inversion, at least in controlled settings, in the same way that quasi-interpolation allows fast approximate interpolation. In an example, the amount of linear structure in a 1,000,000 dimensional function is accurately estimated from only 90,000 observations. A more thorough experimental investigation finds a variant of quasi-regression that gives consistently good results on 64 functions (defined by varying 6 binary features) over 1,000 dimensions. PostScript

• Owen, A.B. "Latin supercube sampling for very high dimensional simulations". In very high dimensional simulations quasi-Monte Carlo (QMC) methods can begin to lose their effectiveness or break down completely. One response is to use QMC on a few key dimensions and something else on the other dimensions. Latin supercube sampling allows one to use QMC on more than one independent set of dimensions. The paper also considers Latin hypercube sampling for infinite dimensional problems. Motivating applications: particle transport simulations, computational finance, graphical rendering, queuing systems. PostScript

• Caflisch, R.E., Morokoff, W. and Owen, A.B. "Valuation of Mortgage Backed Securities using Brownian Bridges to Reduce Effective Dimension". We study a 360 dimensional integration problem motivated by collateralized mortgage obligations. The integral is tractable because it is of lower effective dimension (as defined in the paper) than 360. A Brownian bridge encoding of Brownian motion reduces the effective dimension further. As a result quasi-Monte Carlo sampling of the leading input dimensions is made more effective. PostScript

Every minute spent doing the research reported here was a minute somebody was, with good reason, waiting for me to do something else. I thank them all for their patience.
Sequoia Hall, 390 Serra Mall, Stanford CA 94305