Author:
Yilmaz Serhan, Tastan Oznur, Cicek A. Ercument
Abstract
Phenotypic heritability of complex traits and diseases is seldom explained by individual genetic variants. Algorithms that select SNPs which are close and connected on a biological network have been successful in finding biologically interpretable and predictive loci. However, we argue that the connectedness constraint favors selecting redundant features that affect similar biological processes and therefore does not necessarily yield better predictive performance. In this paper, we propose a novel method called SPADIS that selects SNPs that cover diverse regions in the underlying SNP-SNP network. SPADIS favors the selection of remotely located SNPs in order to account for the complementary additive effects of SNPs that are associated with the phenotype. This is achieved by maximizing a submodular set function with a greedy algorithm that ensures a constant factor (1−1/e) approximation. We compare SPADIS to the state-of-the-art method SConES on a dataset of Arabidopsis thaliana genotypes and continuous flowering time phenotypes. On average, SPADIS has better regression performance in 12 out of 17 phenotypes, identifies more candidate genes, and runs faster. We also investigate, for the first time, the use of Hi-C data to construct the SNP-SNP network in the context of the SNP selection problem, which yields slight but consistent improvements in regression performance. SPADIS is available at http://ciceklab.cs.bilkent.edu.tr/spadis
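The greedy strategy mentioned in the abstract can be illustrated with a minimal sketch. This is not the SPADIS objective, which the abstract does not spell out. It assumes a toy weighted-coverage function, where a selected SNP "covers" itself and its network neighbors and the set value is the total score of covered SNPs. Coverage functions of this kind are monotone and submodular, so greedy selection under a cardinality limit carries the (1−1/e) guarantee. The names `greedy_maximize`, `coverage`, `neighbors`, and `scores`, and the six-node path network, are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical toy network: six SNPs on a path 0-1-2-3-4-5.
neighbors: Dict[int, List[int]] = {
    0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4],
}
# Hypothetical association scores (not taken from the paper).
scores: Dict[int, float] = {0: 1.5, 1: 3.0, 2: 2.5, 3: 1.0, 4: 2.0, 5: 0.5}


def coverage(selected: Set[int]) -> float:
    """Toy objective: total score of SNPs that are selected or adjacent to a selected SNP."""
    covered = set(selected)
    for snp in selected:
        covered.update(neighbors[snp])
    return sum(scores[snp] for snp in covered)


def greedy_maximize(candidates: Iterable[int],
                    f: Callable[[Set[int]], float],
                    k: int) -> List[int]:
    """Pick up to k items, each time adding the one with the largest marginal gain in f."""
    selected: List[int] = []
    remaining = set(candidates)
    current = f(set())
    for _ in range(min(k, len(remaining))):
        best, best_gain = None, 0.0
        for c in sorted(remaining):  # sorted for deterministic tie-breaking
            gain = f(set(selected) | {c}) - current
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:  # no candidate improves the objective
            break
        selected.append(best)
        remaining.remove(best)
        current += best_gain
    return selected


chosen = greedy_maximize(scores.keys(), coverage, k=2)
print(chosen)  # [1, 4]
```

In this toy setting the two highest-scoring SNPs (1 and 2) are adjacent, yet the greedy pass picks SNPs 1 and 4, because SNP 2 adds little once its neighborhood is already covered. That mirrors, in spirit only, the abstract's preference for remotely located SNPs over redundant neighboring ones.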
Publisher
Cold Spring Harbor Laboratory
Cited by
1 article.