SVPath: an accurate pipeline for predicting the pathogenicity of human exon structural variants

Author:

Yang Yaning (1), Wang Xiaoqi (1), Zhou Deshan (1), Wei Dong-Qing (2), Peng Shaoliang (1, 3, 4)

Affiliation:

1. College of Computer Science and Electronic Engineering, Hunan University, Changsha, China

2. State Key Laboratory of Microbial Metabolism and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China

3. School of Computer Science, National University of Defense Technology, Changsha, China

4. Peng Cheng Lab, Shenzhen, China

Abstract

Although each individual's chromosomes carry a large number of structural variations, accurate methods for identifying clinically pathogenic variants are lacking. Here, we propose SVPath, a machine learning-based method for predicting the pathogenicity of deletions, insertions and duplications that occur in exons. We constructed three types of annotation features for each structural variation event in the ClinVar database. First, we treated complex structural variations as multiple consecutive single-nucleotide polymorphism events and annotated them with relevant scores based on single-nucleotide substitutions, such as the impact on protein function. Second, we determined which genes each variation occurred in and constructed gene-based annotation features for each structural variation. Third, we calculated related features based on the transcriptome, such as histone signal and the overlap ratio between the variation and genomic element definitions. Finally, we used a gradient boosting decision tree to train the structural variation pathogenicity prediction model SVPath on the deletions, insertions and duplications in the ClinVar database, all of which are clearly labeled as pathogenic or benign. Experimental results show that SVPath achieves excellent predictive performance and outperforms existing state-of-the-art tools. SVPath is very promising for evaluating the clinical pathogenicity of structural variants. It can be used in clinical research to predict the clinical significance of new structural variations and of those with unknown pathogenicity, and thus to explore the relationship between diseases and structural variations computationally.
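Two of the feature types named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact feature definitions, so everything here is an assumption made for illustration: the function names `overlap_ratio` and `aggregate_base_scores` are hypothetical, coordinates are taken to be half-open intervals, and the max/mean/min summary is only one plausible way to collapse per-base substitution scores into features for a whole structural variation.

```python
from statistics import mean


def overlap_ratio(sv_start, sv_end, elements):
    """Fraction of the half-open interval [sv_start, sv_end) covered by
    the union of the given (start, end) genomic elements.

    Hypothetical helper: the interval convention is an assumption."""
    length = sv_end - sv_start
    if length <= 0:
        raise ValueError("structural variation must have positive length")

    # Keep only overlapping elements, clipped to the SV interval.
    clipped = sorted(
        (max(start, sv_start), min(end, sv_end))
        for start, end in elements
        if start < sv_end and end > sv_start
    )

    # Merge overlapping pieces so that no base is counted twice.
    covered = 0
    cur_start = cur_end = None
    for start, end in clipped:
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start

    return covered / length


def aggregate_base_scores(scores):
    """Collapse per-base substitution scores into SV-level features.

    Assumption: max, mean and min are used as summaries; the actual
    aggregation is not stated in the abstract."""
    if not scores:
        raise ValueError("at least one per-base score is required")
    return {"max": max(scores), "mean": mean(scores), "min": min(scores)}
```

For a deletion spanning positions 100 to 200 and elements at (90, 120), (110, 150) and (180, 250), `overlap_ratio` merges the first two clipped pieces into one and reports the covered fraction of the deletion. Feature vectors assembled this way could then be passed to any gradient boosting decision tree implementation.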

Funder

National Key R&D Program of China

NSFC

National Science Foundation

Changsha Municipal Science and Technology Bureau

Guangdong Provincial Department of Education

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology, Information Systems

