vcfdist: accurately benchmarking phased small variant calls in human genomes

Published:2023-12-09 Issue:1 Volume:14 Page:
ISSN:2041-1723
Container-title:Nature Communications
language:en
Short-container-title:Nat Commun

Author:

Dunn Tim, Narayanasamy Satish

Abstract

Accurately benchmarking small variant calling accuracy is critical for the continued improvement of human whole genome sequencing. In this work, we show that current variant calling evaluations are biased towards certain variant representations and may misrepresent the relative performance of different variant calling pipelines. We propose solutions, first exploring the affine gap parameter design space for complex variant representation and suggesting a standard. Next, we present our tool vcfdist and demonstrate the importance of enforcing local phasing for evaluation accuracy. We then introduce the notion of partial credit for mostly-correct calls and present an algorithm for clustering dependent variants. Lastly, we motivate using alignment distance metrics to supplement precision-recall curves for understanding variant calling performance. We evaluate the performance of 64 phased Truth Challenge V2 submissions and show that vcfdist improves measured insertion and deletion performance consistency across variant representations from R² = 0.97243 for baseline vcfeval to 0.99996 for vcfdist.

Publisher

Springer Science and Business Media LLC

Subject

General Physics and Astronomy; General Biochemistry, Genetics and Molecular Biology; General Chemistry; Multidisciplinary

Link

https://www.nature.com/articles/s41467-023-43876-x.pdf

Reference: 47 articles.

1. Wetterstrand, K. The cost of sequencing a human genome. https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost (2021).

2. Venter, J. C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).

3. NHGRI. Genetics vs. genomics fact sheet. https://www.genome.gov/about-genomics/fact-sheets/Genetics-vs-Genomics (2018).

4. Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).

5. Wick, R. R., Judd, L. M. & Holt, K. E. Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol. 20, 129 (2019).

Cited by 1 article.

1. Jointly benchmarking small and structural variant calls with vcfdist (2024-01-25)