SimuSCoP: reliably simulate Illumina sequencing data based on position and context dependent profiles

Published:2020-07-23 Issue:1 Volume:21 Page:
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en
Short-container-title:BMC Bioinformatics

Author:

Yu Zhenhua, Du Fang, Ban Rongjun, Zhang Yuanwei

Abstract

Background: A number of simulators have been developed to emulate next-generation sequencing data by incorporating known errors such as base substitutions and indels. However, their practicality may be limited by functional and runtime constraints. In particular, positional and genomic contextual information is not effectively used to characterize base substitution patterns, and the positional and contextual differences in Phred quality scores have not been fully investigated. A more effective and efficient bioinformatics tool is therefore needed.

Results: Here, we introduce a novel tool, SimuSCoP, to reliably emulate complex DNA sequencing data. The base substitution patterns and the statistical behavior of quality scores in Illumina sequencing data are fully explored and integrated into the simulation model to reliably emulate datasets for different applications. In addition, SimuSCoP provides an integrated and easy-to-use pipeline for end-to-end simulation of complex samples, and it achieves high runtime efficiency through multithreading with low memory consumption. These features enable SimuSCoP to achieve substantial improvements in reliability, functionality, practicality and runtime efficiency. The tool is comprehensively evaluated in multiple aspects, including consistency of profiles and simulation of genomic variations and complex tumor samples, and the results demonstrate the advantages of SimuSCoP over existing tools.

Conclusions: SimuSCoP, a new bioinformatics tool, is developed to learn informative profiles from real sequencing data and reliably mimic complex data by introducing various genomic variations. We believe that the presented work will catalyse new development of downstream bioinformatics methods for analyzing sequencing data.
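The idea of a position- and context-dependent profile mentioned in the abstract can be illustrated with a toy sketch. This is not SimuSCoP's actual model or code. The function names, the `(position, context)` key, the context width `k`, and the input format (pairs of equal-length reference and read strings) are all assumptions made for illustration. Only the Phred conversion, Q = -10 * log10(p), is the standard definition.

```python
import random
from collections import defaultdict


def phred_to_error_prob(q):
    """Standard Phred definition: Q = -10 * log10(p), so p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def learn_profile(pairs, k=1):
    """Count observed read bases keyed by (read position, reference context).

    `pairs` is an iterable of (reference, read) strings of equal length,
    a hypothetical stand-in for gap-free aligned reads. The context is the
    reference base at position i plus up to `k` preceding reference bases.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for ref, read in pairs:
        for i, base in enumerate(read):
            context = ref[max(0, i - k): i + 1]
            counts[(i, context)][base] += 1
    return counts


def simulate_read(ref, counts, rng, k=1):
    """Sample each read base from the counts learned for its position and context."""
    out = []
    for i, ref_base in enumerate(ref):
        context = ref[max(0, i - k): i + 1]
        observed = counts.get((i, context))
        if not observed:
            # No training data for this key: copy the reference base unchanged.
            out.append(ref_base)
            continue
        bases = list(observed)
        weights = [observed[b] for b in bases]
        out.append(rng.choices(bases, weights=weights, k=1)[0])
    return "".join(out)


if __name__ == "__main__":
    training = [("ACGT", "ACGT"), ("ACGT", "ACTT"), ("ACGT", "ACGT")]
    profile = learn_profile(training)
    print(simulate_read("ACGT", profile, random.Random(0)))
    print(phred_to_error_prob(30))
```

Keying the counts on both position and context lets the sampled substitution rate differ along the read and between sequence neighborhoods, which is the dependence the abstract describes. A fuller model would also handle quality scores and indels, which this sketch leaves out.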

Funder

National Natural Science Foundation of China

Science and Technique Research Foundation of Ningxia Institutions of Higher Education

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/s12859-020-03665-5.pdf

Reference: 35 articles.

1. Liu L, Li Y, Li S, Hu N, He Y, Pong R, Lin D, Lu L, Law M. Comparison of next-generation sequencing systems. J Biomed Biotechnol. 2012;2012:251364.

2. Laehnemann D, Borkhardt A, McHardy AC. Denoising DNA deep sequencing data—high-throughput sequencing errors and their correction. Brief Bioinform. 2015;17(1):154–79.

3. Robasky K, Lewis NE, Church GM. The role of replicates for error mitigation in next-generation sequencing. Nat Rev Genet. 2014;15(1):56.

4. Ewing B, Hillier L, Wendl MC, Green P. Base-calling of automated sequencer traces using phred. Genome Res. 1998;8(3):175–85.

5. Schirmer M, D’Amore R, Ijaz UZ, Hall N, Quince C. Illumina error profiles: resolving fine-scale variation in metagenomic sequencing data. BMC Bioinformatics. 2016;17(1):125.

Cited by 14 articles.

1. Prevalence and genomic-based antimicrobial resistance analysis of Avibacterium paragallinarum isolates in Guangdong Province, China;Poultry Science;2024-06

2. Drug Recommendation System for Cancer Patients Using XAI: A Traceability Perspective;Communications in Computer and Information Science;2024

3. Sandy: A user-friendly and versatile NGS simulator to facilitate sequencing assay design and optimization;2023-08-27

4. Comparison of k-mer-based de novo comparative metagenomic tools and approaches;Microbiome Research Reports;2023-07-20

5. Prevalence and whole genome phylogenetic analysis reveal genetic relatedness between antibiotic resistance Salmonella in hatchlings and older chickens from farms in Nigeria;Poultry Science;2023-03