PEcnv: accurate and efficient detection of copy number variations of various lengths

Author:

Wang Xuwen (1, 2), Xu Ying (1, 2), Liu Ruoyu (1, 2), Lai Xin (1, 2), Liu Yuqian (1, 2), Wang Shenjie (1, 2), Zhang Xuanping (1, 2), Wang Jiayin (1, 2)

Affiliation:

1. Department of Computer Science and Technology, School of Electronics and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China

2. Institute of Data Science and Information Quality, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an 710049, China

Abstract

Copy number variations (CNVs) are a class of key biomarkers in many complex traits and diseases. Detecting CNVs from sequencing data is a substantial bioinformatics problem and a standard requirement in clinical practice. Although many CNV detection approaches have been proposed, the core statistical model at their foundation is weakened by two critical computational issues: (i) choosing the optimal sliding-window setting and (ii) correcting for bias and noise. We designed a statistical process model that overcomes these limitations by calculating regional read depths with an exponentially weighted moving average strategy. CNVs of various lengths are then detected in a single run using a dynamic sliding window whose size adapts itself to the weighted averages. We also designed a novel bias/noise reduction model that works alongside the moving average, can handle complicated patterns, and extends the training data. The resulting method, called PEcnv, accurately detects CNVs ranging from kb scale to chromosome-arm level. Its performance was validated on simulated and real samples. Comparative analysis showed that PEcnv outperforms current popular approaches. Notably, PEcnv provided considerable advantages in detecting small CNVs (1 kb–1 Mb) in panel sequencing data. PEcnv thus fills the gap left by existing methods, which focus on large CNVs, and may have broad applications in clinical testing, where panel sequencing is the dominant strategy.

Availability and implementation: Source code is freely available at https://github.com/Sherwin-xjtu/PEcnv
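To make the exponentially weighted moving average (EWMA) idea from the abstract concrete, the following is a minimal Python sketch of EWMA smoothing over per-bin read depths, followed by a naive gain/loss labeling. It is not PEcnv's implementation: the function names `ewma` and `flag_bins`, the smoothing factor `alpha`, the `baseline` depth, and the tolerance `tol` are all assumptions chosen for illustration, and the sketch omits the dynamic window sizing and the bias/noise reduction model described above.

```python
from typing import List


def ewma(depths: List[float], alpha: float = 0.3) -> List[float]:
    """Standard EWMA: s[0] = x[0], s[t] = alpha * x[t] + (1 - alpha) * s[t-1].

    `alpha` is a hypothetical smoothing factor, not a PEcnv parameter.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[float] = []
    for x in depths:
        if not smoothed:
            # Seed the average with the first observation.
            smoothed.append(float(x))
        else:
            smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


def flag_bins(depths: List[float], baseline: float,
              alpha: float = 0.3, tol: float = 0.25) -> List[str]:
    """Label each bin by comparing its smoothed depth to an assumed baseline.

    A ratio above 1 + tol is called "gain", below 1 - tol "loss",
    otherwise "neutral". The thresholds are illustrative only.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    labels: List[str] = []
    for s in ewma(depths, alpha):
        ratio = s / baseline
        if ratio > 1.0 + tol:
            labels.append("gain")
        elif ratio < 1.0 - tol:
            labels.append("loss")
        else:
            labels.append("neutral")
    return labels


if __name__ == "__main__":
    # Toy depths: two normal bins followed by three bins at doubled depth.
    print(flag_bins([100, 100, 200, 200, 200], baseline=100, alpha=0.5))
```

In this sketch a larger `alpha` makes the average react faster to a change in depth, which favors short events but passes more noise through; a smaller `alpha` smooths more strongly at the cost of delayed response. That trade-off is one plausible motivation for adapting the window to the weighted averages rather than fixing it in advance.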

Funder

Shaanxi’s Natural Science Basic Research Program

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology,Information Systems

