COInr and mkCOInr: Building and customizing a non-redundant barcoding reference database from BOLD and NCBI using a lightweight pipeline

Published: 2022-05-19

Author:

Meglécz Emese

Abstract

The taxonomic assignment of metabarcoding data strongly depends on the taxonomic coverage of the reference database. It is therefore essential to access and pool data from the two major sources of COI sequences, the BOLD and NCBI nucleotide databases, and to enrich them with custom COI data when available.

The COInr database is a freely available, easy-to-access database of COI reference sequences extracted from the BOLD and NCBI nucleotide databases. It is comprehensive: it is not limited to a taxon, a gene region, or a taxonomic resolution, which makes it a good starting point for creating custom databases. Sequences are dereplicated between databases and within taxa. Each taxon has a unique taxonomic identifier (taxID), which is essential for avoiding ambiguous associations of homonyms and synonyms in the source database. TaxIDs form a coherent hierarchical system fully compatible with the NCBI taxIDs, allowing full or ranked lineages to be created.

The mkCOInr tool is a series of Perl scripts used to download sequences from BOLD and NCBI, build the COInr database, and customize it according to the users' needs. It is possible to select or eliminate sequences for a list of taxa, select a specific gene region, select for a minimum taxonomic resolution, add new custom sequences, and format the database for BLAST, QIIME, or the RDP classifier.

The COInr database can be downloaded from https://doi.org/10.5281/zenodo.6555985, and mkCOInr and its full documentation are available at https://github.com/meglecz/mkCOInr.
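As an illustration of what dereplication "within taxa" can mean, here is a minimal Python sketch. It is not the mkCOInr implementation (which is written in Perl), and the abstract does not specify the exact criterion. The sketch assumes that a sequence is redundant when it is identical to, or a substring of, another sequence carrying the same taxID. The function name `dereplicate_within_taxa`, the record layout, and the example identifiers are all hypothetical.

```python
from collections import defaultdict


def dereplicate_within_taxa(records):
    """Drop sequences that are contained in another sequence of the same taxon.

    records: iterable of (seq_id, tax_id, sequence) tuples (hypothetical layout).
    Assumption: "redundant" means identical to, or a substring of, a sequence
    already kept for the same taxID. Sequences shared between different taxa
    are all kept.
    """
    by_taxon = defaultdict(list)
    for seq_id, tax_id, seq in records:
        by_taxon[tax_id].append((seq_id, tax_id, seq.upper()))

    kept = []
    for tax_id, group in by_taxon.items():
        # Longest first (ties broken by ID), so each sequence is only ever
        # compared against sequences of equal or greater length.
        group.sort(key=lambda rec: (-len(rec[2]), rec[0]))
        kept_in_taxon = []
        for rec in group:
            if any(rec[2] in other[2] for other in kept_in_taxon):
                continue  # identical to or contained in a kept sequence
            kept_in_taxon.append(rec)
        kept.extend(kept_in_taxon)
    return kept


# Made-up example records, for illustration only.
records = [
    ("bold_1", 101, "ACGTACGTAC"),
    ("ncbi_1", 101, "ACGTACGTAC"),  # exact duplicate within taxon 101
    ("ncbi_2", 101, "GTACGT"),      # substring of bold_1 within taxon 101
    ("ncbi_3", 202, "ACGTACGTAC"),  # same sequence, different taxon
]
result = dereplicate_within_taxa(records)
kept_ids = sorted(rec[0] for rec in result)
print(kept_ids)
```

Grouping by taxID before comparing keeps identical sequences that belong to different taxa, which matters when a short barcode region cannot distinguish closely related species.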

Publisher

Cold Spring Harbor Laboratory

Cited by 2 articles.

1. Be positive: customized reference databases and new, local barcodes balance false taxonomic assignments in metabarcoding studies. PeerJ. 2023-01-09

2. Comparison of morphological identification and DNA metabarcoding for dietary analysis of faeces from a subtropical lizard. Wildlife Research. 2022-09-05