Fast and powerful conditional randomization testing via distillation

Published: 2021-07-08
ISSN: 0006-3444
Container-title: Biometrika
Language: en

Author:

Liu Molei¹, Katsevich Eugene², Janson Lucas³, Ramdas Aaditya⁴

Affiliation:

1. Department of Biostatistics, Harvard Chan School of Public Health, 677 Huntington Avenue, Boston, Massachusetts 02115, U.S.A

2. Department of Statistics and Data Science, Wharton School of the University of Pennsylvania, 265 South 37th Street, Philadelphia, Pennsylvania 19104, U.S.A

3. Department of Statistics, Harvard University, One Oxford Street, Cambridge, Massachusetts 02138, U.S.A

4. Department of Statistics & Data Science, Carnegie Mellon University, 132H Baker Hall, Pittsburgh, Pennsylvania 15213, U.S.A

Abstract

We consider the problem of conditional independence testing: given a response $Y$ and covariates $(X,Z)$, we test the null hypothesis that $Y {\perp\!\!\!\perp} X \mid Z$. The conditional randomization test was recently proposed as a way to use distributional information about $X\mid Z$ to exactly and nonasymptotically control Type-I error using any test statistic in any dimensionality without assuming anything about $Y\mid (X,Z)$. This flexibility, in principle, allows one to derive powerful test statistics from complex prediction algorithms while maintaining statistical validity. Yet the direct use of such advanced test statistics in the conditional randomization test is prohibitively computationally expensive, especially with multiple testing, due to the requirement to recompute the test statistic many times on resampled data. We propose the distilled conditional randomization test, a novel approach to using state-of-the-art machine learning algorithms in the conditional randomization test while drastically reducing the number of times those algorithms need to be run, thereby taking advantage of their power and the conditional randomization test’s statistical guarantees without suffering the usual computational expense. In addition to distillation, we propose a number of other tricks, like screening and recycling computations, to further speed up the conditional randomization test without sacrificing its high power and exact validity. Indeed, we show in simulations that all our proposals combined lead to a test that has similar power to the most powerful existing conditional randomization test implementations, but requires orders of magnitude less computation, making it a practical tool even for large datasets. We demonstrate these benefits on a breast cancer dataset by identifying biomarkers related to cancer stage.
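The resampling idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes, purely for illustration, that $X \mid Z$ is Gaussian with a known conditional mean and standard deviation, and it uses ordinary least squares as a stand-in for the machine learning algorithm that would be run once in the distillation step. The names `distilled_crt_pvalue`, `simulate`, `cond_mean`, and `cond_sd` are hypothetical.

```python
import numpy as np


def distilled_crt_pvalue(y, x, z, cond_mean, cond_sd, n_resamples=500, seed=0):
    """Monte Carlo p-value for H0: Y independent of X given Z.

    Assumption (illustrative): X | Z ~ N(cond_mean, cond_sd**2), known exactly.
    """
    rng = np.random.default_rng(seed)

    # Expensive step, run ONCE: distill what Z explains about Y.
    # OLS is a stand-in; any regression of y on z could be used here.
    design = np.column_stack([np.ones(len(y)), z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_resid = y - design @ coef

    # Cheap statistic: |<residual of y, x - E[x | z]>|.
    t_obs = abs(y_resid @ (x - cond_mean))

    # Resample X from its conditional law. Each resample minus cond_mean is
    # just the Gaussian noise, so only the cheap statistic is recomputed.
    noise = cond_sd * rng.standard_normal((n_resamples, len(y)))
    t_null = np.abs(noise @ y_resid)

    # The +1 terms count the observed statistic among the resamples.
    return (1 + np.sum(t_null >= t_obs)) / (1 + n_resamples)


def simulate(effect, n=500, p=5, seed=1):
    """Toy data: y depends on x only through `effect` (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, p))
    cond_mean = z @ np.full(p, 0.5)
    x = cond_mean + rng.standard_normal(n)  # X | Z ~ N(cond_mean, 1)
    y = effect * x + z @ np.ones(p) + rng.standard_normal(n)
    return y, x, z, cond_mean


y, x, z, mu = simulate(effect=1.0)
p_signal = distilled_crt_pvalue(y, x, z, mu, cond_sd=1.0)

y0, x0, z0, mu0 = simulate(effect=0.0)
p_null = distilled_crt_pvalue(y0, x0, z0, mu0, cond_sd=1.0)

print(p_signal, p_null)
```

In this sketch the regression of `y` on `z` is fitted once, and each resample costs only an inner product. A non-distilled version would refit the model on every resampled `x`, which is the computational expense the abstract describes.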

Publisher

Oxford University Press (OUP)

Subject

Applied Mathematics; Statistics, Probability and Uncertainty; General Agricultural and Biological Sciences; Agricultural and Biological Sciences (miscellaneous); General Mathematics; Statistics and Probability

Link

https://academic.oup.com/biomet/advance-article-pdf/doi/10.1093/biomet/asab039/41820663/asab039.pdf

Cited by 16 articles.

1. Correlation adjusted debiased Lasso: debiasing the Lasso with inaccurate covariate model;Journal of the Royal Statistical Society Series B: Statistical Methodology;2024-06-15

2. Controlled Discovery and Localization of Signals via Bayesian Linear Programming;Journal of the American Statistical Association;2024-06-11

3. Compositional Differential Abundance Testing: Defining and Finding a New Type of Health-Microbiome Associations;2024-06-06

4. Reconciling model-X and doubly robust approaches to conditional independence testing;The Annals of Statistics;2024-06-01

5. Characterization of Post–COVID-19 Definitions and Clinical Coding Practices: Longitudinal Study;Online Journal of Public Health Informatics;2024-05-03