Abstract
The onset of a new disease is often conceptualized as the result of changes in entire biological networks whose states are affected by a complex interaction of genetic and environmental factors. However, when modelling a relevant phenotype as a function of high-dimensional measurements, power to estimate interactions is low, the number of possible interactions could be enormous, and their effects may be non-linear. In this work, we introduce a method called sail for detecting non-linear interactions with a key environmental or exposure variable in high-dimensional settings while respecting the strong or weak heredity constraints. We prove that, asymptotically, our method possesses the oracle property, i.e., it performs as well as if the true model were known in advance. We develop a computationally efficient fitting algorithm with automatic tuning parameter selection, which scales to high-dimensional datasets. Through an extensive simulation study, we show that sail outperforms existing penalized regression methods in terms of prediction accuracy and support recovery when there are non-linear interactions with an exposure variable. We apply sail to detect non-linear interactions between genes and a prenatal psychosocial intervention program on cognitive performance in children at 4 years of age. Results show that individuals who are genetically predisposed to lower educational attainment are those who stand to benefit the most from the intervention. Our algorithms are implemented in an R package available on CRAN (https://cran.r-project.org/package=sail).
Publisher
Cold Spring Harbor Laboratory