CuReSim-LoRM: A Tool to Simulate Metabarcoding Long Reads
Published:2023-09-12
Issue:18
Volume:24
Page:14005
ISSN:1422-0067
Container-title:International Journal of Molecular Sciences
language:en
Short-container-title:IJMS
Author:
Mesloub Yasmina 1, Beury Delphine 1, Vandermeeren Félix 1, Caboche Ségolène 1
Affiliation:
1. Univ. Lille, CNRS, Inserm, CHU Lille, Institut Pasteur de Lille, US 41-UAR 2014-PLBS, F-59000 Lille, France
Abstract
Metabarcoding DNA sequencing has revolutionized the study of microbial communities, and third-generation sequencing, which produces long reads, has opened up new perspectives. Obtaining the full-length ribosomal RNA gene would allow better taxonomic resolution, at the species or even the strain level. However, Oxford Nanopore Technologies (ONT) sequencing produces reads with high error rates, which introduce biases into the analysis. Understanding the biases introduced during the analysis allows one to better interpret the biological results and to be cautious about the conclusions drawn from metabarcoding experiments. To benchmark an analysis process, the ground truth, i.e., the real composition of the microbial community, has to be known. In addition to artificial mock communities, simulated data are often used to evaluate the biases and performance of the bioinformatics analysis step. Currently, no tool has been developed specifically to simulate metabarcoding long reads that mimics the error rate and the length distribution and allows one to benchmark the analysis process. Here, we introduce CuReSim-LoRM, a customized read simulator that generates long reads for metabarcoding. We show that CuReSim-LoRM can produce reads with varying error rates and length distributions that closely mimic real data.
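To make the idea of simulating reads with a chosen error rate and length distribution concrete, here is a minimal Python sketch. It is not CuReSim-LoRM's algorithm or interface: the function names, the normal length distribution, the uniform split between substitutions, insertions, and deletions, and all default values are assumptions made purely for illustration.

```python
import random

BASES = "ACGT"

def simulate_read(reference, error_rate, mean_len, sd_len, rng):
    """Return one simulated read drawn from `reference` (hypothetical helper)."""
    # Assumption: read lengths follow a normal distribution, clamped to
    # the range [1, len(reference)].
    length = int(round(rng.gauss(mean_len, sd_len)))
    length = max(1, min(length, len(reference)))
    start = rng.randint(0, len(reference) - length)
    fragment = reference[start:start + length]

    read = []
    for base in fragment:
        if rng.random() < error_rate:
            # Assumption: the three error types are equally likely.
            kind = rng.choice(("substitution", "insertion", "deletion"))
            if kind == "substitution":
                read.append(rng.choice([b for b in BASES if b != base]))
            elif kind == "insertion":
                read.append(base)
                read.append(rng.choice(BASES))
            # A deletion simply skips the base.
        else:
            read.append(base)
    return "".join(read)

def simulate_reads(reference, n_reads, error_rate=0.1,
                   mean_len=1400, sd_len=100, seed=0):
    """Simulate `n_reads` reads; a fixed seed makes the output reproducible."""
    rng = random.Random(seed)
    return [simulate_read(reference, error_rate, mean_len, sd_len, rng)
            for _ in range(n_reads)]

# Toy reference standing in for a full-length rRNA gene sequence.
toy_reference = "ACGT" * 400
reads = simulate_reads(toy_reference, n_reads=5, error_rate=0.1)
```

Because the ground truth (the source sequence and the error rate) is known for every simulated read, output of this kind can be fed to an analysis pipeline and the results compared against what was actually put in.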
Funder
French National Research Agency
Subject
Inorganic Chemistry, Organic Chemistry, Physical and Theoretical Chemistry, Computer Science Applications, Spectroscopy, Molecular Biology, General Medicine, Catalysis