ensembleTax: an R package for determinations of ensemble taxonomic assignments of phylogenetically-informative marker gene sequences
Author:
Catlett Dylan (1),
Son Kevin (1),
Liang Connie (1)
Affiliation:
1. Earth Research Institute, University of California, Santa Barbara, Santa Barbara, CA, United States of America
Abstract
Background
High-throughput sequencing of phylogenetically informative marker genes is a widely used method to assess the diversity and composition of microbial communities. Taxonomic assignment of sampled marker gene sequences (referred to as amplicon sequence variants, or ASVs) imparts ecological significance to these genetic data. To assign taxonomy to an ASV, a taxonomic assignment algorithm compares the ASV to a collection of reference sequences (a reference database) with known taxonomic affiliations. However, many taxonomic assignment algorithms and reference databases are available, and the optimal combination of algorithm and database for a particular scientific question is often unclear. Here, we present the ensembleTax R package, which provides an efficient framework for integrating taxonomic assignments predicted with any number of taxonomic assignment algorithms and reference databases to determine ensemble taxonomic assignments for ASVs.
Methods
The ensembleTax R package relies on two core algorithms: taxmapper and assign.ensembleTax. The taxmapper algorithm maps taxonomic assignments derived from one reference database onto the taxonomic nomenclature (a set of taxonomic naming and ranking conventions) of another reference database. The assign.ensembleTax algorithm computes ensemble taxonomic assignments for each ASV in a data set based on any number of taxonomic assignments determined with independent methods. Various parameters allow analysts to prioritize one of two outcomes in the ensemble taxonomic assignments: more ASVs with clade names predicted at more ranks, or more robust clade name predictions that are supported by multiple independent methods.
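The trade-off described above can be illustrated with a minimal sketch. It is written in Python rather than R and is not the package's implementation: the function name `ensemble_assign` and the parameters `count_na` and `threshold` are hypothetical, chosen only for this illustration. It assumes every input assignment has already been mapped onto one shared nomenclature (the role taxmapper plays), that ranks are ordered from coarsest to finest, and that an unassigned rank is represented by `None` with no gaps above an assigned rank.

```python
from collections import Counter
from typing import List, Optional, Sequence

Lineage = Sequence[Optional[str]]


def ensemble_assign(
    lineages: Sequence[Lineage],
    count_na: bool = True,    # hypothetical parameter: do unassigned ranks count as votes?
    threshold: float = 0.5,   # hypothetical parameter: fraction of votes a name must exceed
) -> List[Optional[str]]:
    """Return a consensus lineage for one ASV from several individual assignments."""
    if not lineages:
        raise ValueError("at least one lineage is required")
    if not 0.5 <= threshold < 1:
        # Below 0.5 two different names could both pass, so a winner would be ambiguous.
        raise ValueError("threshold must be in [0.5, 1)")
    n_ranks = len(lineages[0])
    if any(len(lin) != n_ranks for lin in lineages):
        raise ValueError("all lineages must have the same number of ranks")

    consensus: List[Optional[str]] = [None] * n_ranks
    # Start at the finest rank and move toward the root until a name is supported.
    for depth in range(n_ranks, 0, -1):
        # Compare whole lineage prefixes so that agreement includes all parent ranks.
        named = [tuple(lin[:depth]) for lin in lineages if lin[depth - 1] is not None]
        if not named:
            continue
        # Conservative mode counts unassigned methods in the denominator.
        denominator = len(lineages) if count_na else len(named)
        winner, votes = Counter(named).most_common(1)[0]
        if votes / denominator > threshold:
            consensus[:depth] = list(winner)
            break
    return consensus


# Three illustrative assignments for a single ASV (made-up example data).
a = ["Eukaryota", "Stramenopiles", "Bacillariophyta", "Thalassiosira"]
b = ["Eukaryota", "Stramenopiles", "Bacillariophyta", None]
c = ["Eukaryota", "Stramenopiles", None, None]

conservative = ensemble_assign([a, b, c], count_na=True)
permissive = ensemble_assign([a, b, c], count_na=False)
```

In this sketch the conservative call stops at the third rank, because only one of three methods names the finest rank, while the permissive call ignores unassigned ranks and keeps the single finest-rank prediction. This mirrors the choice between more robust predictions and more ranks assigned.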
Results
The ensembleTax R package is used to compute two sets of ensemble taxonomic assignments for a collection of protistan ASVs sampled from the coastal ocean. Comparisons of taxonomic assignments predicted by individual methods with those predicted by ensemble methods show that conservative implementations of the ensembleTax package minimize disagreements between the two, but leave ASVs with taxonomy assigned at fewer ranks. Less conservative implementations increase the fraction of ASVs classified at all taxonomic ranks, but also increase the number of ASVs for which ensemble assignments disagree with those predicted by individual methods.
Discussion
Based on the results of applying the ensembleTax package to marine protist communities, we discuss how its implementation may be optimized to address specific scientific objectives. Further work is required to evaluate the accuracy of ensemble taxonomic assignments relative to those predicted by individual methods; we also discuss scenarios in which ensemble methods are expected to improve the accuracy of taxonomy prediction for ASVs.
Funder
National Aeronautics and Space Administration Biodiversity and Ecological Forecasting program
Bureau of Ocean and Energy Management Ecosystem Studies Program
NOAA
NASA Earth and Space Science Fellowship
UC Santa Barbara Coastal Fund
Subject
General Agricultural and Biological Sciences; General Biochemistry, Genetics and Molecular Biology; General Medicine; General Neuroscience
Cited by
3 articles.