Author:
Dubois Benjamin, Debode Frédéric, Hautier Louis, Hulin Julie, Martin Gilles San, Delvaux Alain, Janssen Eric, Mingeot Dominique
Abstract
Background
The DNA metabarcoding approach has become one of the most widely used techniques for studying the taxonomic composition of various sample types. To handle the large amount of data generated by high-throughput sequencing, a bioinformatics workflow is required, and the QIIME2 platform has emerged as one of the most reliable and commonly used. However, pre-formatted reference databases for taxonomic assignment are available for only a few barcode sequences. Users who want to develop a new custom reference database still face several bottlenecks, and a detailed procedure explaining how to develop and format such a database is currently missing. Consequently, this work aims to present a detailed workflow explaining, from start to finish, how to develop a curated reference database for any barcode sequence.
Results
We developed DB4Q2, a detailed workflow that enabled the development of plant reference databases dedicated to ITS2 and rbcL, two barcode sequences commonly used in plant metabarcoding studies. This workflow addresses several of the main bottlenecks associated with developing a curated reference database. The detailed, commented structure of DB4Q2 makes it possible to develop reference databases even without extensive bioinformatics skills, and avoids the ‘black box’ systems that are sometimes encountered. Filtering steps have been included to discard presumably fungal and misidentified sequences. DB4Q2 is flexible: several key sequence processing steps can be included or skipped, and downloading issues can be avoided. Benchmarking showed that the databases developed using DB4Q2 performed well compared with previously published reference datasets.
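To make the idea of record-level filtering concrete, here is a minimal Python sketch. It is not DB4Q2's implementation: the `Record` type, the `keep_record` and `filter_records` helpers, the lineage-keyword check used as a stand-in for detecting fungal sequences, and the length and ambiguous-base thresholds are all hypothetical choices made for illustration. Detecting misidentified sequences usually requires comparing records against each other and is not covered here.

```python
from dataclasses import dataclass


@dataclass
class Record:
    seq_id: str
    sequence: str
    lineage: list[str]  # e.g. ["Eukaryota", "Viridiplantae", ...] (assumed layout)


# Hypothetical thresholds, chosen for illustration only (not DB4Q2 settings).
MIN_LENGTH = 100
MAX_AMBIGUOUS_FRACTION = 0.01
EXCLUDED_TAXA = {"Fungi"}  # lineage names that flag a record for removal


def ambiguous_fraction(sequence: str) -> float:
    """Fraction of characters that are not one of the four unambiguous bases."""
    if not sequence:
        return 1.0  # treat an empty sequence as entirely ambiguous
    seq = sequence.upper()
    return sum(base not in "ACGT" for base in seq) / len(seq)


def keep_record(record: Record) -> bool:
    """Return True if the record passes all illustrative filters."""
    # Drop records whose annotated lineage contains an excluded taxon.
    if EXCLUDED_TAXA.intersection(record.lineage):
        return False
    # Drop sequences that are too short to be informative.
    if len(record.sequence) < MIN_LENGTH:
        return False
    # Drop sequences with too many ambiguous IUPAC codes (N, R, Y, ...).
    return ambiguous_fraction(record.sequence) <= MAX_AMBIGUOUS_FRACTION


def filter_records(records: list[Record]) -> list[Record]:
    """Keep only the records that pass every filter, preserving order."""
    return [r for r in records if keep_record(r)]
```

Keeping each criterion as a separate, commented check mirrors the abstract's point that individual processing steps can be switched on or off.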
Conclusion
This study presents DB4Q2, a detailed procedure for developing custom reference databases for taxonomic analyses with QIIME2 or, if desired, with other bioinformatics platforms. This work also provides ready-to-use plant ITS2 and rbcL databases whose prediction accuracy has been assessed and compared with that of other published databases.
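QIIME2 takes a reference database as two inputs: the sequences (a FASTA file, imported as `FeatureData[Sequence]`) and the taxonomy (here a headerless, tab-separated file of feature ID and taxonomy string, imported as `FeatureData[Taxonomy]`). The sketch below shows how curated records could be serialized into those two files. It assumes a lineage of at most seven ranks and uses Greengenes-style `k__` to `s__` rank prefixes as an assumed convention; the `format_taxonomy` and `build_reference` names are hypothetical and not part of DB4Q2 or QIIME2.

```python
# Greengenes-style rank prefixes (an assumed convention, not required by QIIME2).
RANK_PREFIXES = ("k__", "p__", "c__", "o__", "f__", "g__", "s__")


def format_taxonomy(lineage: list[str]) -> str:
    """Join a lineage into a semicolon-separated, rank-prefixed string."""
    if len(lineage) > len(RANK_PREFIXES):
        raise ValueError("lineage has more ranks than available prefixes")
    return "; ".join(prefix + name for prefix, name in zip(RANK_PREFIXES, lineage))


def build_reference(entries: list[tuple[str, str, list[str]]]) -> tuple[str, str]:
    """Return (FASTA text, headerless taxonomy TSV text) for (id, sequence, lineage) entries.

    Feature IDs are assumed to contain no whitespace, since they appear both
    in FASTA headers and in the first TSV column.
    """
    fasta_lines = []
    taxonomy_lines = []
    for seq_id, sequence, lineage in entries:
        fasta_lines.append(f">{seq_id}\n{sequence}\n")
        taxonomy_lines.append(f"{seq_id}\t{format_taxonomy(lineage)}\n")
    return "".join(fasta_lines), "".join(taxonomy_lines)
```

The two returned strings could then be written to, for example, `ref-seqs.fasta` and `ref-taxonomy.tsv` (hypothetical file names) and imported with `qiime tools import --type 'FeatureData[Sequence]'` and `qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat`, respectively. Because both files are plain text, the same pair can also be reused with other tools.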
Publisher
Springer Science and Business Media LLC
Subject
Health Informatics, Genetics
Cited by
11 articles.