Comparing Variant Call Files for Performance Benchmarking of Next-Generation Sequencing Variant Calling Pipelines

Published: 2015-08-02

Author:

Cleary John G., Braithwaite Ross, Gaastra Kurt, Hilbush Brian S, Inglis Stuart, Irvine Sean A, Jackson Alan, Littin Richard, Rathod Mehul, Ware David, Zook Justin M., Trigg Len, De La Vega Francisco M.

Abstract

To evaluate and compare the performance of variant calling methods and their confidence scores, a test call set must be compared against a "gold standard". Unfortunately, these comparisons are not straightforward with current Variant Call Format (VCF) files, which are the standard output of most variant calling algorithms for high-throughput sequencing data. Comparisons of VCFs are often confounded by the different representations of indels, MNPs, and combinations thereof with SNVs in complex regions of the genome, which produces misleading results. A variant caller is inherently a classification method designed to score putative variants with confidence scores that could permit controlling the rate of false positives (FP) or false negatives (FN) for a given application. Receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC) are efficient metrics for evaluating a test call set against a gold standard. However, in the case of VCF data this also requires special accounting to deal with discrepant representations. We developed a novel algorithm for comparing variant call sets that handles complex call representation discrepancies through a dynamic programming method that minimizes false positives and false negatives globally across the entire call sets, for accurate performance evaluation of VCFs.
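
The representation problem described in the abstract can be illustrated with a small sketch. This is not the authors' dynamic programming algorithm; it only shows, under assumed toy data, why a record-by-record comparison of two call sets can report a mismatch even though both describe the same haplotype. The reference string, the variant tuples, and the helper names `apply_variants` and `same_haplotype` are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

# A variant as (1-based position, REF allele, ALT allele), mirroring the
# POS/REF/ALT columns of a VCF record. Toy data only.
Variant = Tuple[int, str, str]


def apply_variants(reference: str, variants: List[Variant]) -> str:
    """Replay non-overlapping variants onto the reference and return the haplotype."""
    pieces = []
    cursor = 0  # 0-based index of the next unconsumed reference base
    for pos, ref, alt in sorted(variants):
        start = pos - 1
        if start < cursor:
            raise ValueError("overlapping variants are not handled in this sketch")
        if reference[start:start + len(ref)] != ref:
            raise ValueError(f"REF allele {ref!r} does not match reference at {pos}")
        pieces.append(reference[cursor:start])  # unchanged bases before the variant
        pieces.append(alt)                      # substituted allele
        cursor = start + len(ref)
    pieces.append(reference[cursor:])
    return "".join(pieces)


def same_haplotype(reference: str, a: List[Variant], b: List[Variant]) -> bool:
    """Compare two call sets by the sequence they produce, not by their records."""
    return apply_variants(reference, a) == apply_variants(reference, b)


REFERENCE = "GCACACAT"  # hypothetical reference with a short CA repeat

# The same 2-bp deletion, placed at two different positions inside the repeat.
deletion_left = [(1, "GCA", "G")]
deletion_right = [(5, "ACA", "A")]

# The same two-base change, written as one MNP or as two adjacent SNVs.
as_mnp = [(2, "CA", "TG")]
as_snvs = [(2, "C", "T"), (3, "A", "G")]

# Record-level comparison finds nothing in common ...
naive_overlap = set(deletion_left) & set(deletion_right)
# ... while haplotype-level comparison shows the calls are equivalent.
deletions_equivalent = same_haplotype(REFERENCE, deletion_left, deletion_right)
mnp_equivalent = same_haplotype(REFERENCE, as_mnp, as_snvs)
```

In this toy case `naive_overlap` is empty, yet both pairs of call sets replay to identical sequences. A record-level comparison would therefore count an equivalent call as both a false positive and a false negative, which is the kind of distortion the abstract says the comparison method must account for.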

Publisher

Cold Spring Harbor Laboratory

Cited by 162 articles.

1. Saliva-derived DNA is suitable for the detection of clonal haematopoiesis of indeterminate potential; Scientific Reports; 2024-08-14

2. AsmMix: an efficient haplotype-resolved hybrid de novo genome assembling pipeline; Frontiers in Genetics; 2024-07-26

3. A Novel Approach for Accurate Sequence Assembly Using de Bruijn graphs; 2024-06-02

4. Analysis and benchmarking of small and large genomic variants across tandem repeats; Nature Biotechnology; 2024-04-26

5. Overcoming Limitations to Deep Learning in Domesticated Animals with TrioTrain; 2024-04-20