Abstract
Expectations are high regarding the potential of eDNA metabarcoding for diversity monitoring. To make this approach suitable for this purpose, the completeness and accuracy of the reference databases used for the taxonomic assignment of eDNA sequences are among the challenges that must be tackled. Yet, despite ongoing efforts to increase the coverage of reference databases, sequences for key species are still lacking, and incorrect records have been reported in widely used repositories such as GenBank. This compromises eDNA metabarcoding studies, especially for highly diverse groups such as marine fishes. Here, we developed a workflow that evaluates the completeness and accuracy of GenBank. For a given combination of species and barcodes, it performs a gap analysis and identifies potentially erroneous sequences. Our gap analysis of the four genes most commonly used for fish eDNA metabarcoding (cytochrome c oxidase subunit 1 (COI), 12S rRNA, 16S rRNA and cytochrome b) found that COI, the universal choice for metazoans, covers the highest number of Northeast Atlantic marine fishes (70%), whereas 12S rRNA, the preferred region for fish-targeting studies, covers only about 50% of the species. The presence of barcode sequences that are closer or more distant than expected from their taxonomic classification confirms that GenBank contains erroneous sequences, which our workflow can detect and eliminate. Comparing taxonomic assignments of real marine eDNA samples against raw and cleaned reference databases for the two most used 12S rRNA barcodes (teleo and MiFish), we found that the two barcodes perform differently, and we demonstrated that applying the database cleaning workflow can result in drastic changes in the inferred community composition.
Besides providing an automated tool for reference database curation, this study confirms the need to increase the number of 12S rRNA reference sequences for European marine fishes, encourages the use of a multi-marker approach for better assessment of community composition, and highlights the dangers of making taxonomic assignments by querying GenBank directly.
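The abstract describes two checks: a per-gene gap analysis (the share of a species checklist that has at least one reference sequence) and the detection of sequences that sit too close to another genus or too far from their own species. The following is a minimal sketch of both ideas, assuming pre-aligned sequences of equal length, an uncorrected p-distance, and hypothetical thresholds (`min_intergeneric`, `max_intraspecific`). The `Record` type and all function names are illustrative assumptions; this is not the authors' workflow or its actual criteria.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Record:
    """Hypothetical reference record; seq is assumed pre-aligned."""
    accession: str
    species: str
    genus: str
    seq: str


def gap_analysis(checklist, records_by_gene):
    """Per gene, the fraction of checklist species with >= 1 sequence."""
    species = set(checklist)
    return {
        gene: len(species & {r.species for r in recs}) / len(species)
        for gene, recs in records_by_gene.items()
    }


def p_distance(a, b):
    """Uncorrected p-distance: share of mismatched aligned positions."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def flag_suspect(records, min_intergeneric=0.02, max_intraspecific=0.03):
    """Flag records whose distances contradict their taxonomic labels.

    Thresholds are hypothetical placeholders, not published values.
    too_close: closer than min_intergeneric to a record of another genus
               (both members of the pair are flagged for review).
    too_distant: farther than max_intraspecific from its nearest
                 conspecific; species with a single record are skipped.
    """
    too_close = set()
    nearest_conspecific = {}  # accession -> smallest conspecific distance
    for a, b in combinations(records, 2):  # all unordered pairs, O(n^2)
        d = p_distance(a.seq, b.seq)
        if a.genus != b.genus and d < min_intergeneric:
            too_close.update({a.accession, b.accession})
        if a.species == b.species:
            for r in (a, b):
                nearest_conspecific[r.accession] = min(
                    d, nearest_conspecific.get(r.accession, 1.0))
    too_distant = {acc for acc, d in nearest_conspecific.items()
                   if d > max_intraspecific}
    return too_close, too_distant
```

A pairwise check alone cannot tell which member of a too-close pair carries the wrong label, so this sketch flags both for review rather than deleting either; the all-pairs loop is quadratic, which is acceptable for a per-genus or per-family subset but would need clustering or indexing at the scale of a full GenBank extract.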
Publisher: Cold Spring Harbor Laboratory