Affiliation:
1. National Genomics Data Center, China National Center for Bioinformation, Beichen West Road, Chaoyang District, Beijing 100101, China
2. Beijing Institute of Genomics, Chinese Academy of Sciences, Beichen West Road, Chaoyang District, Beijing 100101, China
Abstract
Copy number variations (CNVs) play pivotal roles in disease susceptibility and have been intensively investigated in human disease studies. Long-read sequencing technologies offer opportunities for comprehensive structural variation (SV) detection, and numerous detection methods have been developed recently. There is therefore a pressing need to assess these methods and to help researchers select appropriate tools for CNV detection using long-read sequencing. We evaluated eight CNV calling methods across 22 datasets from nine publicly available samples and 15 simulated datasets, covering multiple sequencing platforms. The overall performance of the CNV callers varied substantially and was influenced by factors including the input dataset type, sequencing depth, and CNV type. Specifically, the PacBio CCS sequencing platform outperformed the PacBio CLR and Nanopore platforms in CNV detection recall. A sequencing depth of 10x was sufficient to identify 85% of the CNVs detected in a 50x dataset. Moreover, deletions were generally easier to detect than duplications. Among the eight benchmarked methods, cuteSV, Delly, pbsv, and Sniffles2 demonstrated superior accuracy, while SVIM exhibited high recall.
Funder
Strategic Priority Research Program of the Chinese Academy of Sciences
National Natural Science Foundation of China
Shanghai Municipal Science and Technology Major Project
Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence
ZJLab
Publisher
Oxford University Press (OUP)