Abstract
Background
Despite clear evidence of nonlinear interactions in the molecular architecture of polygenic diseases, linear models have so far appeared optimal in genotype-to-phenotype modeling. A key bottleneck for such modeling is that genetic data intrinsically suffers from underdetermination ($$p \gg n$$). Millions of variants are present in each individual, while the collection of large, homogeneous cohorts is hindered by phenotype incidence, sequencing cost, and batch effects.
Results
We demonstrate that, given enough training data and controlled model complexity, a neural network outperforms additive approaches in whole exome sequencing-based inflammatory bowel disease case–control prediction. To do so, we propose a biologically meaningful sparsified neural network architecture, which provides empirical evidence for positive and negative epistatic effects in inflammatory bowel disease pathogenesis.
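As a rough illustration of how sparsification can control model complexity, the sketch below masks a layer so that each hidden unit sees only the variants annotated to one gene. It is not the architecture from this paper: the variant-to-gene grouping, the names `VARIANT_TO_GENE`, `build_mask`, and `sparse_gene_layer`, the `tanh` activation, and all numeric values are assumptions made for the example.

```python
import math

# Hypothetical annotation (assumption): variant index -> gene index.
VARIANT_TO_GENE = [0, 0, 1, 1, 1, 2]
N_GENES = 3


def build_mask(variant_to_gene, n_genes):
    """Binary mask where mask[g][v] is 1 if variant v is annotated to gene g."""
    return [[1 if gene == g else 0 for gene in variant_to_gene]
            for g in range(n_genes)]


def sparse_gene_layer(genotype, weights, bias, mask):
    """One masked layer: each gene unit combines only its own variants,
    then applies a nonlinearity (tanh is an arbitrary choice here)."""
    activations = []
    for g, row in enumerate(mask):
        z = bias[g] + sum(w * m * x for w, m, x in zip(weights[g], row, genotype))
        activations.append(math.tanh(z))
    return activations


def count_parameters(mask):
    """Number of trainable weights that survive the mask."""
    return sum(sum(row) for row in mask)


mask = build_mask(VARIANT_TO_GENE, N_GENES)
genotype = [0, 1, 2, 0, 1, 0]  # toy alternate-allele counts per variant
weights = [[0.5] * len(genotype) for _ in range(N_GENES)]  # toy weights
bias = [0.0] * N_GENES

gene_scores = sparse_gene_layer(genotype, weights, bias, mask)
dense_params = N_GENES * len(genotype)  # weights in an unmasked layer
sparse_params = count_parameters(mask)  # weights left after masking
```

In this toy setting the mask reduces the layer from 18 weights to 6, which is the kind of reduction that matters when the number of variants far exceeds the number of samples.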
Conclusions
In this paper, we show that underdetermination is likely a major driver for the apparent optimality of additive modeling in clinical genetics today.
Funder
Fonds Wetenschappelijk Onderzoek
Advanced machine learning for Innovative Drug Discovery
Publisher
Springer Science and Business Media LLC
Cited by
5 articles.