NCBench: providing an open, reproducible, transparent, adaptable, and continuous benchmark approach for DNA-sequencing-based variant calling

Published: 2024-09-12 Volume: 12 Page: 1125
ISSN: 2046-1402
Container-title: F1000Research
Language: en
Short-container-title: F1000Res

Author:

Hanssen Friederike, Gabernet Gisela, Bäuerle Famke, Stöcker Bianca, Wiegand Felix, Smith Nicholas H., Mertes Christian, Neogi Avirup Guha, Brandhoff Leon, Ossowski Anna, Altmueller Janine, Becker Kerstin, Petzold Andreas, Sturm Marc, Stöcker Tyll, Sivalingam Sugirthan, Brand Fabian, Schmidt Axel, Buness Andreas, Probst Alexander J., Motameny Susanne, Köster Johannes

Abstract

We present the results of the human genomic small variant calling benchmarking initiative of the Next Generation Sequencing Competence Network (NGS-CN), funded by the German Research Foundation (DFG), and the German Human Genome-Phenome Archive (GHGA). In this effort, we developed NCBench, a continuous benchmarking platform for evaluating small genomic variant callsets in terms of recall, precision, and false positive/negative error patterns. NCBench is implemented as a continuously re-evaluated open-source repository. We show that it is possible to rely entirely on free public infrastructure (GitHub, GitHub Actions, Zenodo) in combination with established open-source tools. NCBench is agnostic to the dataset used and can evaluate an arbitrary number of callsets, reporting the results visually and interactively. We used NCBench to evaluate over 40 callsets generated by various variant calling pipelines available in the participating groups, run on three exome datasets from different enrichment kits and at different coverages. While all pipelines achieve high overall quality, subtle systematic differences between callers and datasets exist and are made apparent by NCBench. These insights are useful for improving existing pipelines and developing new workflows. NCBench is meant to be open to the contribution of any callset. Most importantly, authors will no longer need to re-implement paper-specific variant calling benchmarks when publishing new tools or pipelines, while readers will benefit from being able to (continuously) observe the performance of tools and pipelines at the time of reading instead of at the time of writing.

Funder

Deutsche Forschungsgemeinschaft

Publisher

F1000 Research Ltd

Link

https://f1000research.com/articles/12-1125/v2/pdf

References (25 articles)

1. Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls; J Zook; Nat. Biotechnol., Mar 2014

2. Extensive sequencing of seven human genomes to characterize benchmark reference materials; J Zook; Sci. Data, Jun 2016

3. A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree; M Eberle; Genome Res., Jan 2017

4. A synthetic-diploid benchmark for accurate variant-calling evaluation; H Li; Nat. Methods, Aug 2018

5. Sequencing benchmarked; J Wendell