Abstract
AbstractThe use of short-read metabarcoding for classifying microeukaryotes is challenged by the lack of comprehensive 18S rRNA reference databases. While recent advances in high-throughput long-read sequencing provide the potential to greatly increase the phylogenetic coverage of these databases, the performance of different sequencing technologies and subsequent bioinformatics processing remain to be evaluated, primarily because of the absence of well-defined eukaryotic mock communities. To address this challenge, we created a eukaryotic rRNA operons clone-library and turned it into a precisely defined synthetic eukaryotic mock community. This mock community was then used to evaluate the performance of three long-read sequencing techniques (PacBio HiFi, and Nanopore UMI with/without clonal pre-amplification) and three tools for resolving amplicons sequence variants (ASVs) (Uchime3, Unoise3, and DADA2). We investigated the sensitivity of the sequencing techniques based on the number of detected mock taxa, and the accuracy of the different ASV-calling tools with a specific focus on the presence of chimera among the final rRNA operon ASVs. Based on our findings, we provide recommendations and best practice protocols for how to cost-effectively obtain essential error-free rRNA operons in high-throughput. An agricultural soil sample was used to demonstrate that the sequencing and bioinformatic results from the mock community also translates to highly diverse natural samples.
Publisher
Cold Spring Harbor Laboratory