Abstract
Summary
Transcriptome-wide association studies (TWAS) have been increasingly applied to identify (putative) causal genes for complex traits and diseases. TWAS can be regarded as a two-sample two-stage least squares method for instrumental variable (IV) regression for causal inference. The standard TWAS (called TWAS-L) considers only a linear relationship between a gene’s expression and a trait in stage 2, which may lose statistical power when the true relationship is not linear. Recently, an extension of TWAS (called TWAS-LQ) was proposed that considers both the linear and quadratic effects of a gene on a trait; however, because of its parametric nature it is not flexible enough and may have low power for nonlinear effects that are not quadratic. On the other hand, a deep learning (DL) approach, called DeepIV, has been proposed to model a nonlinear effect nonparametrically in IV regression. However, it is both slow and unstable because of the ill-posed inverse problem of solving an integral equation with Monte Carlo approximations. Furthermore, statistical inference, that is, hypothesis testing, was not studied in the original DeepIV approach. Here, we propose a novel DL approach, called DeLIVR, that overcomes the major drawbacks of DeepIV by estimating a related but different target function and by including a hypothesis testing framework. We show through simulations that DeLIVR was both faster and more stable than DeepIV. We applied both parametric and DL approaches to the GTEx and UK Biobank data, showing that DeLIVR detected an additional 8 and 7 genes nonlinearly associated with high-density lipoprotein (HDL) cholesterol and low-density lipoprotein (LDL) cholesterol, respectively, all of which would be missed by TWAS-L, TWAS-LQ, and DeepIV; these genes include BUD13 associated with HDL, and SLC44A2 and GMIP associated with LDL, all supported by previous studies.
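To make the two-sample two-stage least squares view of TWAS concrete, the following is a minimal sketch on simulated data, not the authors' implementation. Stage 1 fits expression on genotypes in a reference sample; stage 2 regresses the trait on the imputed expression in a separate sample. The function names (`fit_ols`, `with_intercept`, `two_sample_2sls`, `simulate`), the simulation settings, and the use of the squared imputed expression as the quadratic term are all assumptions made for illustration; the published TWAS-LQ may construct its quadratic term differently.

```python
import numpy as np

def fit_ols(design, y):
    """Least-squares coefficients for y ~ design (design already has an intercept)."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def with_intercept(*columns):
    """Stack a column of ones in front of the given predictors."""
    n = np.asarray(columns[0]).shape[0]
    return np.column_stack([np.ones(n), *columns])

def two_sample_2sls(Z1, X1, Z2, Y2, quadratic=False):
    """Hypothetical helper: stage 1 on sample 1, stage 2 on sample 2."""
    # Stage 1: learn how genotypes (instruments) predict expression.
    w_hat = fit_ols(with_intercept(Z1), X1)
    # Impute expression in the trait sample, where expression is unobserved.
    X_hat = with_intercept(Z2) @ w_hat
    # Stage 2: linear model (TWAS-L style), optionally plus a squared term
    # (a simple stand-in for a TWAS-LQ style model; an assumption here).
    predictors = [X_hat, X_hat**2] if quadratic else [X_hat]
    return fit_ols(with_intercept(*predictors), Y2)

def simulate(n, rng, beta=0.5, gamma=0.0):
    """Toy data: 5 SNPs, a hidden confounder U, linear and quadratic effects."""
    Z = rng.binomial(2, 0.3, size=(n, 5)).astype(float)
    U = rng.normal(size=n)  # unobserved confounder of expression and trait
    X = Z @ np.full(5, 0.4) + U + rng.normal(size=n)
    Y = beta * X + gamma * X**2 + U + rng.normal(size=n)
    return Z, X, Y

rng = np.random.default_rng(0)
Z1, X1, _ = simulate(5000, rng)     # expression reference sample
Z2, X2, Y2 = simulate(20000, rng)   # trait sample (X2 kept only for comparison)

coef_l = two_sample_2sls(Z1, X1, Z2, Y2)                    # [intercept, linear]
coef_lq = two_sample_2sls(Z1, X1, Z2, Y2, quadratic=True)   # [intercept, linear, quadratic]
naive = fit_ols(with_intercept(X2), Y2)                     # confounded regression on true X

print("2SLS linear effect:", coef_l[1])
print("2SLS linear and quadratic effects:", coef_lq[1:])
print("Naive OLS effect:", naive[1])
```

In this toy setup the true linear effect is 0.5 and there is no quadratic effect, so the stage 2 slope should land near 0.5 while the naive regression on observed expression is pulled upward by the confounder. Both stage 2 models are parametric, which is the limitation the summary attributes to TWAS-L and TWAS-LQ.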
Funder
National Institutes of Health
Publisher
Oxford University Press (OUP)
Subject
Statistics, Probability and Uncertainty; General Medicine; Statistics and Probability
Cited by
4 articles.