Performance evaluation of six popular short-read simulators

Published:2022-12-10 Issue:2 Volume:130 Page:55-63
ISSN:0018-067X
Container-title:Heredity
language:en
Short-container-title:Heredity

Author:

Milhaven Mark, Pfeifer Susanne P.

Abstract

High-throughput sequencing data enables the comprehensive study of genomes and the variation therein. Essential for the interpretation of this genomic data is a thorough understanding of the computational methods used for processing and analysis. Whereas “gold-standard” empirical datasets exist for this purpose in humans, synthetic (i.e., simulated) sequencing data can offer important insights into the capabilities and limitations of computational pipelines for any arbitrary species and/or study design—yet, the ability of read simulator software to emulate genomic characteristics of empirical datasets remains poorly understood. We here compare the performance of six popular short-read simulators—ART, DWGSIM, InSilicoSeq, Mason, NEAT, and wgsim—and discuss important considerations for selecting suitable models for benchmarking.

Funder

National Science Foundation

Publisher

Springer Science and Business Media LLC

Subject

Genetics (clinical),Genetics

Link

https://www.nature.com/articles/s41437-022-00577-3.pdf

References (39 articles)

1. Acinas SG, Sarma-Rupavtarm R, Klepac-Ceraj V, Polz MF (2005) PCR-induced sequence artifacts and bias: insights from comparison of two 16S rRNA clone libraries constructed from the same sample. Appl Environ Microbiol 71(12):8966–8969

2. Alosaimi S, Bandiang A, van Biljon N, Awany D, Thami PK, Tchamga MSS et al. (2020) A broad survey of DNA sequence data simulation tools. Brief Funct Genom 19(1):49–59

3. Andrews S (2010) FastQC: a quality control tool for high throughput sequence data. http://www.bioinformatics.babraham.ac.uk/projects/fastqc.