Abstract
The success of a metabarcoding study is determined by the taxonomic coverage and the quality of the records in the DNA barcode reference database it uses. This study aimed to create an rbcLa and trnL (UAA) DNA barcode reference database of plant species that are potential herbivore foraging targets and are commonly found in the semi-arid savannas of eastern South Africa. A study-area-specific list of 755 species was compiled. Reference libraries of rbcLa and trnL (UAA) sequences were then built from records mined from sequence databases according to specific quality criteria, to ensure accurate taxonomic coverage and resolution. The taxonomic reliability of these reference libraries was evaluated by testing for the presence of a barcode gap, identifying a data-appropriate identification threshold, and determining the identification accuracy of the reference sequences via primary distance-based criteria. The final rbcLa reference dataset consisted of 1238 sequences representing 318 genera and 562 species. The final trnL dataset consisted of 921 sequences representing 270 genera and 461 species. Barcode gaps were found for 76% of the taxa in the rbcLa reference dataset and 68% of the taxa in the trnL reference dataset. The identification success rate, calculated with the k-nn criterion, was 85.86% for the rbcLa dataset and 73.72% for the trnL dataset. The rbcLa and trnL datasets compiled during this study are not presented as a complete DNA reference library, but as two datasets that should be used together to identify plants present in the semi-arid eastern savannas of South Africa.
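The two distance-based checks named above, the barcode gap test and nearest-neighbour identification success, can be illustrated with a minimal sketch. It assumes uncorrected p-distances on pre-aligned, equal-length sequences and a leave-one-out nearest neighbour with k = 1; the abstract does not state the distance model, the value of k, or the software used, so these choices, the function names, and the toy sequences are all hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gap(records):
    """Per species, test whether max intraspecific < min interspecific distance.

    records: list of (species, aligned_sequence) tuples.
    Singleton species get 0.0 as their intraspecific maximum (an assumption;
    many workflows exclude singletons from this test instead).
    """
    species = {sp for sp, _ in records}
    max_intra = {sp: 0.0 for sp in species}
    min_inter = {sp: float("inf") for sp in species}
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        d = p_distance(s1, s2)
        if sp1 == sp2:
            max_intra[sp1] = max(max_intra[sp1], d)
        else:
            min_inter[sp1] = min(min_inter[sp1], d)
            min_inter[sp2] = min(min_inter[sp2], d)
    return {sp: max_intra[sp] < min_inter[sp] for sp in species}


def nn_success_rate(records):
    """Leave-one-out 1-nearest-neighbour identification success (proportion).

    Ties are broken by taking the first closest record in list order.
    """
    correct = 0
    for i, (sp, seq) in enumerate(records):
        best_d, best_sp = float("inf"), None
        for j, (other_sp, other_seq) in enumerate(records):
            if i == j:
                continue  # leave the query sequence out
            d = p_distance(seq, other_seq)
            if d < best_d:
                best_d, best_sp = d, other_sp
        correct += best_sp == sp
    return correct / len(records)


# Hypothetical toy alignment (not data from this study).
toy = [
    ("Species A", "AAAAAAAAAA"),
    ("Species A", "AAAAAAAACC"),
    ("Species B", "CCCCCAAAAA"),
    ("Species B", "CCCCCCAAAA"),
    ("Species C", "AAAAAAAAAC"),
]

if __name__ == "__main__":
    gaps = barcode_gap(toy)
    print(gaps)
    print("share of taxa with a gap:", sum(gaps.values()) / len(gaps))
    print("1-NN success rate:", nn_success_rate(toy))
```

In this toy set, Species A lacks a gap because its two sequences differ more from each other (0.2) than from the single Species C sequence (0.1), and only the two Species B sequences are correctly identified, giving a success rate of 0.4. The singleton Species C can never be matched to its own species in a leave-one-out test, which is one reason singletons lower identification success in real reference libraries.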
Publisher
Cold Spring Harbor Laboratory