Abstract
Environmental DNA (eDNA) metabarcoding has revolutionized ecological surveys of natural systems. By amplifying and sequencing short gene fragments from environmental samples that contain complex DNA mixtures, scientists can now explore biodiversity patterns across the tree of life quickly and cost-effectively. However, sequence artefacts and pseudogenes can compromise the accuracy of species and haplotype identification. Despite the various strategies developed over the years, removing artefacts effectively remains challenging, and inconsistent data reporting standards hinder the reproducibility of eDNA metabarcoding experiments. To address these issues, we introduce tombRaider, an open-source command-line software program (https://github.com/gjeunen/tombRaider) and R package (https://github.com/gjeunen/tombRaider_R) that removes artefacts and pseudogenes from metabarcoding data after clustering and denoising. tombRaider features a modular algorithm that can evaluate multiple criteria, including sequence similarity, co-occurrence patterns, taxonomic assignment, and the presence of stop codons. We validated tombRaider on several published data sets, including mock invertebrate communities, air eDNA from a zoo, and salmon haplotypes from aquatic eDNA. Our results show that tombRaider removed a higher proportion of artefacts while retaining authentic sequences, improving the accuracy and reliability of eDNA-derived diversity metrics. Beyond improving data quality in eDNA metabarcoding studies, this user-friendly software also contributes to standardised reporting practices, which are currently lacking in this emerging research field.
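The abstract names four criteria but not how they are combined. The following is a minimal, hypothetical sketch of one way such criteria could be chained in a parent/child comparison; it is not tombRaider's implementation or API. All names (`Otu`, `flag_artefacts`, etc.), the 97% identity and 100% co-occurrence thresholds, the ungapped identity measure, and the choice of the invertebrate mitochondrial stop codons (NCBI translation table 5: TAA, TAG) are illustrative assumptions. A rarer sequence is flagged if it has an in-frame stop codon, or if it is highly similar to a more abundant, unflagged sequence with a compatible taxonomic assignment that is present in every sample where the rarer sequence occurs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Assumption: stop codons of the invertebrate mitochondrial code (NCBI table 5),
# suited to COI-style invertebrate data; other markers need other codes.
INVERT_MITO_STOPS = frozenset({"TAA", "TAG"})


@dataclass
class Otu:
    name: str
    seq: str
    counts: list[int]            # read counts per sample
    taxon: Optional[str] = None  # taxonomic ID, if one was assigned


def identity(a: str, b: str) -> float:
    """Ungapped position-wise identity, penalising length differences.
    Real pipelines align sequences first; this is a simplification."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / longest


def has_stop_codon(seq: str, frame: int = 0, stops=INVERT_MITO_STOPS) -> bool:
    """True if any complete codon in the given reading frame is a stop codon."""
    return any(seq[i:i + 3] in stops for i in range(frame, len(seq) - 2, 3))


def co_occurs(child: Otu, parent: Otu, min_ratio: float) -> bool:
    """Check the fraction of samples containing the child that also contain the parent."""
    parent_counts = [p for c, p in zip(child.counts, parent.counts) if c > 0]
    if not parent_counts:
        return False
    return sum(p > 0 for p in parent_counts) / len(parent_counts) >= min_ratio


def flag_artefacts(otus, min_identity=0.97, min_cooccur=1.0, check_stops=True):
    """Return {name: reason} for sequences flagged as likely artefacts."""
    # More abundant sequences are treated as potential parents of rarer ones.
    ranked = sorted(otus, key=lambda o: sum(o.counts), reverse=True)
    flagged: dict[str, str] = {}
    for i, child in enumerate(ranked):
        if check_stops and has_stop_codon(child.seq):
            flagged[child.name] = "stop codon"
            continue
        for parent in ranked[:i]:
            if parent.name in flagged:
                continue  # only retained sequences can act as parents
            same_taxon = (child.taxon is None or parent.taxon is None
                          or child.taxon == parent.taxon)
            if (same_taxon
                    and identity(child.seq, parent.seq) >= min_identity
                    and co_occurs(child, parent, min_cooccur)):
                flagged[child.name] = f"artefact of {parent.name}"
                break
    return flagged


if __name__ == "__main__":
    parent = "ATGGCTGCCCGTGGA" * 3
    otus = [
        Otu("OTU1", parent, [100, 80, 0]),
        Otu("OTU2", "GGGCCCAAATTT" * 4, [0, 0, 50]),
        Otu("OTU3", parent[:5] + "A" + parent[6:], [5, 3, 0]),  # one substitution
        Otu("OTU4", "ATGTAA" + parent[6:], [2, 0, 0]),           # in-frame TAA
    ]
    print(flag_artefacts(otus))
    # {'OTU3': 'artefact of OTU1', 'OTU4': 'stop codon'}
```

Checking stop codons before the similarity test means a pseudogene is removed even when no plausible parent exists. Requiring that a parent itself be unflagged prevents an artefact from being used to remove other sequences.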
Publisher
Cold Spring Harbor Laboratory