Datataxa: a new script to extract metadata sequence information from GenBank, the Flora of Bajío as a case study

Published:2019-12-19 Issue:4 Volume:97 Page:754-760
ISSN:2007-4476
Container-title:Botanical Sciences
Short-container-title:Bot. Sci.

Author:

Ruiz-Sanchez Eduardo, Maya-Lastra Carlos Alonso, Steinmann Victor W., Zamudio Sergio, Carranza Eleazar, Murillo Rosa María, Rzedowski Jerzy

Abstract

Background: GenBank is a public repository that houses millions of nucleotide sequences. Several software tools have been developed to extract information stored in GenBank. However, none of them is suited to extracting and organizing GenBank accessions based on their metadata. We developed a new script called Datataxa to mine GenBank information. The checklist of the Flora del Bajío y de Regiones Adyacentes (FBRA) was used as a case study to apply our script.

Questions: How many species occurring in the FBRA have records in GenBank? What percentage of those records have been used for phylogenetic, phylogeographic, phylogenomic, barcoding, genetic diversity, and biogeographic studies?

Methods: Datataxa was written in the AutoIt scripting language to facilitate the extraction of information from GenBank. This information was classified into six study categories. A checklist of the species in the published fascicles of the FBRA was used as a case study to apply our new script, and the six categories were applied to the FBRA species list.

Results: The script allowed us to search for metadata, such as publication titles, for 2,558 species included in the FBRA. Of these, 1,575 had at least one record in GenBank. A total of 1,322 species were used in phylogenetic studies, followed by barcoding studies (326) and biogeographic studies (298). Phylogenomic (41), phylogeographic (34), and diversity studies (34) were the least represented.

Conclusions: Datataxa was useful for mining metadata sequence information from GenBank and can be used with any list of species to retrieve the metadata of their GenBank accessions.
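The classification step described in the Methods can be illustrated with a minimal sketch. This is not the Datataxa source, which was written in AutoIt and is not reproduced here. It assumes that each GenBank record has already been reduced to a (species, publication title) pair and that a study category can be guessed from keywords in the title. The keyword stems, function names, and example records below are all hypothetical.

```python
# Hypothetical keyword stems for the six study categories named in the abstract.
# The actual rules used by Datataxa are not given in this record.
CATEGORY_KEYWORDS = {
    "phylogenetic": ("phylogenetic", "phylogeny", "phylogenies"),
    "phylogeographic": ("phylogeograph",),
    "phylogenomic": ("phylogenomic",),
    "barcoding": ("barcod",),
    "genetic diversity": ("genetic diversity", "genetic variation"),
    "biogeographic": ("biogeograph",),
}


def classify_title(title):
    """Return the set of categories whose keyword stems occur in the title."""
    lowered = title.lower()
    return {
        category
        for category, stems in CATEGORY_KEYWORDS.items()
        if any(stem in lowered for stem in stems)
    }


def count_species_per_category(records):
    """Count distinct species per category from (species, title) pairs.

    A species is counted once per category, however many of its
    records fall into that category.
    """
    species_by_category = {category: set() for category in CATEGORY_KEYWORDS}
    for species, title in records:
        for category in classify_title(title):
            species_by_category[category].add(species)
    return {category: len(names) for category, names in species_by_category.items()}


# Invented placeholder records, for illustration only (not data from the study).
example_records = [
    ("Species A", "DNA barcoding of a regional flora"),
    ("Species A", "Barcoding the trees of a dry forest"),
    ("Species B", "Phylogeography of a montane herb"),
]
example_counts = count_species_per_category(example_records)
```

A title can match more than one category, so the per-category counts need not sum to the number of species, which is consistent with the overlapping totals reported in the Results.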

Publisher

Botanical Sciences, Sociedad Botanica de Mexico, AC

Subject

Plant Science

Link

http://www.botanicalsciences.com.mx/index.php/botanicalSciences/article/download/2226/pdf_a

Cited by 3 articles.

1. Biogeographical History of the Yucatan Peninsula Endemic Flora (Spermatophyta) from a Phylogenetic Perspective; Harvard Papers in Botany; 2023-06-30

2. The Mexican flora as a case study in systematics: a meta-analysis of GenBank accessions; Botanical Sciences; 2022-08-17

3. Biodiversity at the global scale: the synthesis continues; American Journal of Botany; 2021-06