Predicting target genes of non-coding regulatory variants with IRT

Published:2020-06-24 Issue:16 Volume:36 Page:4440-4448
ISSN:1367-4803
Container-title:Bioinformatics
language:en

Author:

Wu Zhenqin¹,², Ioannidis Nilah M², Zou James²,³

Affiliation:

1. Department of Chemistry, Stanford University, Stanford, CA 94305, USA

2. Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, USA

3. Chan-Zuckerberg Biohub, San Francisco, CA 94158, USA

Abstract

Summary: Interpreting genetic variants of unknown significance (VUS) is essential in clinical applications of genome sequencing for diagnosis and personalized care. Non-coding variants remain particularly difficult to interpret, even though they make up a large majority of the trait associations identified in genome-wide association studies (GWAS). Predicting the regulatory effects of non-coding variants on candidate genes is a key step in evaluating their clinical significance. Here, we develop a machine-learning algorithm, Inference of Connected expression quantitative trait loci (eQTLs) (IRT), to predict the regulatory targets of non-coding variants identified in studies of eQTLs. We assemble datasets using eQTL results from the Genotype-Tissue Expression (GTEx) project and train a model to separate positive and negative variant-gene pairs based on annotations characterizing the variant, the gene and the intermediate sequence. IRT achieves an area under the receiver operating characteristic curve (ROC-AUC) of 0.799 under random cross-validation and 0.700 under a more stringent position-based cross-validation. Further evaluation on rare variants and experimentally validated regulatory variants shows that IRT is significantly enriched for identifying the true target genes relative to negative controls. In gene-ranking experiments, IRT achieves a top-1 accuracy of 50% and a top-3 accuracy of 90%. Salient features, including GC-content, histone modifications and Hi-C interactions, are further analyzed and visualized to illustrate their influence on predictions. For any VUS of interest and each nearby candidate gene, IRT outputs a score reflecting the likelihood that the variant has a regulatory effect on the gene's expression level. These scores can be used to prioritize variants and genes to assist in patient diagnosis and GWAS follow-up studies.

Availability and implementation: Code and data used in this work are available at https://github.com/miaecle/eQTL_Trees.

Supplementary information: Supplementary data are available at Bioinformatics online.
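
The gene-ranking metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `top_k_accuracy`, the tuple layout and the toy variant and gene identifiers are all assumptions made for illustration. It only shows how a top-k accuracy could be computed once each variant-gene pair has a score.

```python
from collections import defaultdict


def top_k_accuracy(scored_pairs, true_targets, k):
    """Fraction of variants whose known target gene ranks in the top k by score.

    scored_pairs: iterable of (variant_id, gene_id, score) tuples (hypothetical layout)
    true_targets: dict mapping variant_id -> gene_id of the known target gene
    """
    # Group candidate genes and their scores by variant.
    by_variant = defaultdict(list)
    for variant, gene, score in scored_pairs:
        by_variant[variant].append((score, gene))

    hits = 0
    evaluated = 0
    for variant, target in true_targets.items():
        candidates = by_variant.get(variant)
        if not candidates:
            continue  # skip variants with no scored candidate genes
        # Rank candidate genes from highest to lowest score.
        ranked = sorted(candidates, key=lambda sg: sg[0], reverse=True)
        top_genes = [gene for _, gene in ranked[:k]]
        hits += target in top_genes
        evaluated += 1
    return hits / evaluated if evaluated else float("nan")


# Toy, made-up scores for two variants with three candidate genes each.
toy_scores = [
    ("var1", "GENE_A", 0.9), ("var1", "GENE_B", 0.4), ("var1", "GENE_C", 0.1),
    ("var2", "GENE_D", 0.2), ("var2", "GENE_E", 0.7), ("var2", "GENE_F", 0.5),
]
toy_targets = {"var1": "GENE_A", "var2": "GENE_F"}

top1 = top_k_accuracy(toy_scores, toy_targets, k=1)
top3 = top_k_accuracy(toy_scores, toy_targets, k=3)
```

In this toy data the target of `var1` ranks first and the target of `var2` ranks second, so the top-1 accuracy is lower than the top-3 accuracy. The same pattern underlies the gap between the 50% and 90% figures reported in the abstract.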

Funder

National Science Foundation CCF

National Institutes of Health

Silicon Valley Foundation and the Chan-Zuckerberg Initiative

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

Link

http://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btaa254/33698897/btaa254.pdf

Reference: 47 articles.

1. Predicting mRNA abundance directly from genomic sequence using deep convolutional neural networks; Agarwal, 2018

2. Permutation importance: a corrected feature importance measure; Altmann; Bioinformatics, 2010

3. Identification of susceptibility loci for cutaneous squamous cell carcinoma; Asgari; J. Invest. Dermatol., 2016

4. High-resolution profiling of histone methylations in the human genome; Barski; Cell, 2007

5. Interactions between HERC2, OCA2 and MC1R may influence human pigmentation phenotype; Branicki; Ann. Hum. Genet., 2009

Cited by 6 articles.

1. Is the Area Under Curve Appropriate for Evaluating the Fit of Psychometric Models?;Educational and Psychological Measurement;2022-05-24

2. GREEN-DB: a framework for the annotation and prioritization of non-coding regulatory variants from whole-genome sequencing data;Nucleic Acids Research;2022-03-02

3. Capturing large genomic contexts for accurately predicting enhancer-promoter interactions;Briefings in Bioinformatics;2022-01-22

4. Non‐coding regulatory elements: Potential roles in disease and the case of epilepsy;Neuropathology and Applied Neurobiology;2021-12-16

5. Capturing large genomic contexts for accurately predicting enhancer-promoter interactions;2021-09-06