Affiliation:
1. Genome Competence Center, Robert Koch Institute, Nordufer 20, 13353 Berlin, Germany
Abstract
The identification of genomic variants has become a routine task in the age of genome sequencing. In particular, small genomic variants of one or a few nucleotides are routinely investigated for their impact on an organism’s phenotype. Hence, the precise and robust detection of the variants’ exact genomic locations and changes in nucleotide composition is vital in many biological applications. Although a plethora of methods exist for the key steps of variant detection, thoroughly testing the detection process and evaluating its results is still a cumbersome procedure. In this work, we present a collection of easy-to-apply and highly modifiable workflows that facilitate the generation of synthetic test data and evaluate how well a user-provided set of variants agrees with the test data. The workflows are implemented in Nextflow and are open-source and freely available on GitHub under the GPL-3.0 license.
Funder
European Union’s EU4Health program
German Federal Ministry of Health (IMS-RKI and IMS-NRZ/KL projects) on the basis of a resolution of the German Bundestag