Abstract
Two common approaches to studying the composition of environmental protist communities are metabarcoding and metagenomics. Raw metabarcoding data are usually processed into operational taxonomic units (OTUs) or amplicon sequence variants (ASVs) through clustering or denoising approaches, respectively. Analogous approaches are used to assemble metagenomic reads into metagenome-assembled genomes (MAGs). Understanding the correspondence between the data produced by these two approaches can help to integrate information between the datasets and to explain how metabarcoding OTUs and MAGs are related to the underlying biological entities they are hypothesised to represent. MAGs do not contain the commonly used barcoding loci; therefore, sequence homology approaches cannot be used to match OTUs and MAGs. We attempted to match V9 metabarcoding OTUs from the 18S rRNA gene (V9 OTUs) and MAGs from the Tara Oceans expedition based on the correspondence of their relative abundances across the same set of samples. We evaluated several metrics for detecting correspondence between features in these two datasets and developed controls to filter artefacts of data structure and processing. After selecting the best-performing metrics, ranking the V9 OTU/MAG matches by their proportionality/correlation coefficients and applying a set of selection criteria, we identified candidate matches between V9 OTUs and MAGs. In some cases, V9 OTUs and MAGs could be matched with a one-to-one correspondence, implying that they likely represent the same underlying biological entity. More generally, the matches we observed could be classified into four scenarios: one V9 OTU matches many MAGs; many V9 OTUs match many MAGs; many V9 OTUs match one MAG; one V9 OTU matches one MAG.
Notably, different OTU-MAG matches from the same taxonomic group did not always fall into the same scenario, and all four scenarios could occur within a single taxonomic group, illustrating that factors beyond taxonomic lineage influence the relationship between OTUs and MAGs. Overall, each scenario produces a different interpretation of V9 OTUs, MAGs and how they compare in terms of the genomic and ecological diversity they represent.
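The abundance-based matching and the four scenarios can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not say which metrics were retained, so the proportionality coefficient rho on centred log-ratio (CLR) transformed abundances is assumed here as one plausible choice. The pseudocount, the function names (`clr`, `rho`, `classify_matches`) and the rule of labelling each connected group of matches by its number of OTUs and MAGs are likewise illustrative assumptions.

```python
from collections import defaultdict

import numpy as np


def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform (rows = samples, columns = features).

    The pseudocount is an assumption made here to avoid log(0).
    """
    logged = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logged - logged.mean(axis=1, keepdims=True)


def rho(x, y):
    """Proportionality coefficient between two CLR-transformed vectors.

    rho = 1 - var(x - y) / (var(x) + var(y)); it equals 1 when the two
    features keep a constant ratio across samples.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return 1.0 - np.var(x - y, ddof=1) / (np.var(x, ddof=1) + np.var(y, ddof=1))


def classify_matches(pairs):
    """Label every (otu, mag) pair with the scenario of its connected group.

    Assumed rule: pairs sharing an OTU or a MAG belong to the same group,
    and the group is named by how many OTUs and MAGs it contains.
    """
    otu_to_mags = defaultdict(set)
    mag_to_otus = defaultdict(set)
    for otu, mag in pairs:
        otu_to_mags[otu].add(mag)
        mag_to_otus[mag].add(otu)

    labels = {}
    seen_otus = set()
    for start in list(otu_to_mags):
        if start in seen_otus:
            continue
        # Depth-first walk over the bipartite OTU/MAG match graph.
        otus, mags, stack = {start}, set(), [("otu", start)]
        while stack:
            kind, node = stack.pop()
            if kind == "otu":
                for mag in otu_to_mags[node] - mags:
                    mags.add(mag)
                    stack.append(("mag", mag))
            else:
                for otu in mag_to_otus[node] - otus:
                    otus.add(otu)
                    stack.append(("otu", otu))
        seen_otus |= otus
        n_otu = "one" if len(otus) == 1 else "many"
        n_mag = "one" if len(mags) == 1 else "many"
        for otu in otus:
            for mag in otu_to_mags[otu]:
                labels[(otu, mag)] = f"{n_otu}-to-{n_mag}"  # OTUs-to-MAGs
    return labels
```

In such a sketch, each abundance table would be CLR-transformed separately, `rho` would be computed for every OTU/MAG column pair across the shared samples, and pairs passing a chosen threshold would be handed to `classify_matches`. The threshold and any further selection criteria are left open because the abstract does not specify them.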
Funder
H2020 European Research Council
Departament de Recerca i Universitats de la Generalitat de Catalunya
Publisher
Public Library of Science (PLoS)