Uncovering complementary sets of variants for predicting quantitative phenotypes

Published: 2020-12-12

Author:

Yılmaz Serhan, Fakhouri Mohamad, Koyutürk Mehmet, Çiçek A. Ercüment, Taştan Öznur

Abstract

Motivation: Genome-wide association studies show that variants in individual genomic loci alone are not sufficient to explain the heritability of complex, quantitative phenotypes. Many computational methods have been developed to address this issue by considering subsets of loci that can collectively predict the phenotype. This problem can be considered a challenging instance of feature selection in which the number of dimensions (loci that are screened) is much larger than the number of samples. While currently available methods can achieve decent phenotype prediction performance, they either do not scale to large datasets or have parameters that require extensive tuning.

Results: We propose a fast and simple algorithm, Macarons, to select a small, complementary subset of variants by avoiding redundant pairs that are in linkage disequilibrium. Our method features two interpretable parameters that control the time/performance trade-off without requiring parameter tuning. In our computational experiments, we show that Macarons consistently achieves similar or better prediction performance than state-of-the-art selection methods while having a simpler premise and being at least 2 orders of magnitude faster. Overall, Macarons can seamlessly scale to the human genome with ~10^7 variants in a matter of minutes while taking the dependencies between the variants into account.

Conclusion: Macarons can offer a reasonable trade-off between phenotype predictivity, runtime and the complementarity of the selected subsets. The framework we present can be generalized to other high-dimensional feature selection problems within and beyond biomedical applications.

Availability: Macarons is implemented in MATLAB and the source code is available at: https://github.com/serhan-yilmaz/macarons
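To make the idea in the Results paragraph concrete, the following is a minimal Python sketch of greedy selection that skips redundant, highly correlated features. It is not the Macarons implementation (which is in MATLAB at the URL above) and does not model its two parameters. The function name `select_complementary`, the threshold `max_abs_corr`, and the toy data are assumptions made for illustration only, with absolute Pearson correlation standing in for linkage disequilibrium.

```python
import numpy as np

def select_complementary(X, y, k, max_abs_corr=0.8):
    """Illustrative greedy selection (hypothetical helper, not Macarons itself).

    Ranks the columns of X by absolute correlation with y, then picks up to k
    columns, skipping any candidate whose absolute correlation with an already
    selected column exceeds max_abs_corr.
    """
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)            # center each feature
    yc = np.asarray(y, dtype=float)
    yc = yc - yc.mean()                  # center the phenotype (assumed non-constant)

    x_norm = np.linalg.norm(Xc, axis=0)
    x_norm[x_norm == 0] = np.inf         # constant columns get a score of zero
    Xn = Xc / x_norm                     # unit-norm columns: dot products are correlations
    scores = np.abs(Xn.T @ yc) / np.linalg.norm(yc)

    selected = []
    # A stable sort keeps the lower index first when two scores tie.
    for j in np.argsort(-scores, kind="stable"):
        if len(selected) == k:
            break
        if selected:
            redundancy = np.abs(Xn[:, selected].T @ Xn[:, j])
            if redundancy.max() > max_abs_corr:
                continue                 # too similar to a feature already chosen
        selected.append(int(j))
    return selected

# Toy data (assumed): column 1 duplicates column 0, column 2 carries a weaker signal.
x0 = np.array([0, 1, 2, 0, 1, 2, 0, 1])
x2 = np.array([0, 0, 0, 1, 1, 1, 2, 2])
X_demo = np.column_stack([x0, x0, x2])
y_demo = 2 * x0 + x2
picked = select_complementary(X_demo, y_demo, k=2)
print(picked)
```

In this toy case the duplicated column is passed over in favor of the weaker but non-redundant one, which is the behavior the abstract calls complementarity. A sketch like this compares each candidate against every selected feature, so it says nothing about how Macarons achieves its reported runtime.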

Publisher

Cold Spring Harbor Laboratory

References: 38 articles.

1. Patterns of linkage disequilibrium in the human genome

2. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines

3. Efficient network-guided multi-locus association mapping with graph cuts

4. Caylak, G. et al. (2020). Potpourri: An epistasis test prioritization algorithm via diverse SNP selection. Journal of Computational Biology.

5. Detecting gene–gene interactions that underlie human diseases