Evaluation of Genomic Contamination Detection Tools and Influence of Horizontal Gene Transfer on Their Efficiency through Contamination Simulations at Various Taxonomic Ranks
Published:2024-01-10
Issue:1
Volume:4
Page:124-132
ISSN:2673-8007
Container-title:Applied Microbiology
language:en
Short-container-title:Applied Microbiology
Author:
Cornet Luc (1, 2, 3, 4), Lupo Valérian (2), Declerck Stéphane (2), Baurain Denis (4)
Affiliation:
1. BCCM/IHEM, Mycology and Aerobiology, Sciensano, 1050 Brussels, Belgium
2. BCCM/MUCL and Laboratory of Mycology, Earth and Life Institute, Université Catholique de Louvain, 1348 Louvain-la-Neuve, Belgium
3. BCCM/ULC, InBioS–Molecular Diversity and Ecology of Cyanobacteria, University of Liège, 4000 Liège, Belgium
4. InBioS–PhytoSYSTEMS, Eukaryotic Phylogenomics, University of Liège, 4000 Liège, Belgium
Abstract
Genomic contamination remains a pervasive challenge in (meta)genomics, prompting the development of numerous detection tools. Despite the attention that this issue has attracted, a comprehensive comparison of the available tools is absent from the literature. Furthermore, the potential effect of horizontal gene transfer on the detection of genomic contamination has been little studied. In this study, we evaluated the detection efficiency of six widely used contamination detection tools. To this end, we developed a simulation framework that uses orthologous group inference as a robust basis for simulating contamination. Additionally, we implemented a variable mutation rate to simulate horizontal gene transfer. Our simulations covered six distinct taxonomic ranks, ranging from phylum to species. The evaluation of contamination levels revealed the suboptimal precision of the tools, attributed to significant cases of both over-detection and under-detection, particularly at the genus and species levels. Notably, only so-called “redundant” contamination was reliably estimated. Our findings underscore the necessity of employing a combination of tools, including Kraken2, for accurate assessment of contamination levels. We also demonstrate that none of the assayed tools confused contamination with horizontal gene transfer. Finally, we release CRACOT, a freely accessible contamination simulation framework, which holds promise for evaluating the efficacy of future algorithms.
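The simulation idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not CRACOT's implementation: the function names (`mutate`, `simulate_events`), the dictionary-based genome representation, the `"redundant"` and `"replaced"` mode labels, and the per-site substitution model are all assumptions made for illustration. The sketch assumes that "redundant" contamination means a foreign ortholog is added alongside the native copy, and that a horizontal-transfer-like event can be approximated by mutating the foreign copy before it is introduced.

```python
import random

DNA = "ACGT"


def mutate(seq, rate, rng):
    """Substitute each base by a different base with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in DNA if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate_events(recipient, donor, n_events, mode="redundant",
                    mutation_rate=0.0, seed=0):
    """Introduce `n_events` donor genes into a copy of `recipient`.

    recipient, donor: dict mapping orthologous-group id -> nucleotide sequence
        (hypothetical representation, one gene per group and genome).
    mode: "redundant" keeps the native copy and adds the foreign one;
          "replaced" swaps the native copy for the foreign one.
    mutation_rate: per-site substitution rate applied to the foreign copy;
        0.0 mimics plain contamination, higher values mimic a transferred
        gene that has since diverged (an assumption of this sketch).
    Returns (genome, chosen) where genome maps each group id to a list of
    sequences and chosen lists the affected group ids.
    """
    rng = random.Random(seed)
    # Only groups present in both genomes can host a simulated event.
    shared = sorted(set(recipient) & set(donor))
    if n_events > len(shared):
        raise ValueError("not enough shared orthologous groups")
    chosen = rng.sample(shared, n_events)

    genome = {og: [seq] for og, seq in recipient.items()}
    for og in chosen:
        foreign = mutate(donor[og], mutation_rate, rng)
        if mode == "redundant":
            genome[og].append(foreign)
        elif mode == "replaced":
            genome[og] = [foreign]
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return genome, chosen
```

A call such as `simulate_events(recipient, donor, 10, mode="redundant", mutation_rate=0.05)` would yield a genome in which ten orthologous groups carry an extra, slightly diverged foreign copy. Anchoring events on shared orthologous groups keeps the number and location of simulated events known, which is what makes over-detection and under-detection measurable.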
Funder
Belgian State–Federal Public Planning Science Policy Office; F.R.S.-FNRS