Repositories for Taxonomic Data: Where We Are and What is Missing

Author:

Miralles Aurélien (1, 2), Bruy Teddy (1, 2), Wolcott Katherine (2, 3), Scherz Mark D (4, 5), Begerow Dominik (6), Beszteri Bank (7), Bonkowski Michael (8), Felden Janine (9, 10), Gemeinholzer Birgit (11), Glaw Frank (4), Glöckner Frank Oliver (10), Hawlitschek Oliver (4, 12), Kostadinov Ivaylo (13), Nattkemper Tim W (14), Printzen Christian (15), Renz Jasmin (16), Rybalka Nataliya (17), Stadler Marc (18), Weibulat Tanja (13), Wilke Thomas (19), Renner Susanne S (2), Vences Miguel (20)

Affiliation:

1. Department Origins and Evolution, Institut Systématique, Evolution, Biodiversité (ISYEB), Muséum national d’Histoire naturelle, CNRS, Sorbonne Université, EPHE, 57 rue Cuvier, CP50, 75005 Paris, France

2. Systematic Botany and Mycology, University of Munich (LMU), Menzingerstraße 67, 80638 Munich, Germany

3. National Museum of Natural History, Smithsonian Institution, Washington, DC, USA

4. Department of Herpetology, Zoologische Staatssammlung München (ZSM-SNSB), Münchhausenstraße 21, 81247 München, Germany

5. Department of Biology, Universität Konstanz, Universitätsstraße 10, 78464 Konstanz, Germany

6. Department of Geobotany, Ruhr-University Bochum, Universitätsstraße 150, 44780 Bochum, Germany

7. Department of Phycology, Faculty of Biology, University of Duisburg-Essen, Universitätsstraße 2, 45141 Essen, Germany

8. Department of Terrestrial Ecology, Center of Excellence in Plant Sciences (CEPLAS), Terrestrial Ecology, Institute of Zoology, University of Cologne, 50674 Köln, Germany

9. MARUM - Center for Marine Environmental Sciences, University of Bremen, Leobenerstraße 8, 28359 Bremen, Germany

10. Alfred Wegener Institute - Helmholtz Center for Polar- and Marine Research, Am Handelshafen 12, 27570 Bremerhaven, Germany

11. Department of Systematic Botany, Justus Liebig University Gießen, Heinrich-Buff-Ring 38, 35392 Giessen, Germany

12. Department of Scientific Infrastructure, Centrum für Naturkunde (CeNak), Universität Hamburg, Martin-Luther-King-Platz 3, 20146 Hamburg, Germany

13. GFBio - Gesellschaft für Biologische Daten e.V., c/o Research II, Campus Ring 1, 28759 Bremen, Germany

14. Biodata Mining Group, Center of Biotechnology (CeBiTec), Bielefeld University, PO Box 100131, 33501 Bielefeld, Germany

15. Department of Botany and Molecular Evolution, Senckenberg Research Institute and Natural History Museum Frankfurt, Senckenberganlage 25, 60325 Frankfurt/Main, Germany

16. Zooplankton Research Group, DZMB – Senckenberg am Meer, Martin-Luther-King-Platz 3, 20146 Hamburg, Germany

17. Department of Experimental Phycology and Culture Collection of Algae, University of Göttingen, Nikolausberger Weg 18, 37073 Göttingen, Germany

18. Department Microbial Drugs, Helmholtz Centre for Infection Research (HZI), and German Centre for Infection Research (DZIF), Partner Site Hannover-Braunschweig, Inhoffenstrasse 7, 38124 Braunschweig, Germany

19. Department of Animal Ecology and Systematics, Justus Liebig University Gießen, Heinrich-Buff-Ring 26, 35392 Giessen, Germany

20. Department of Evolutionary Biology, Zoological Institute, Technische Universität Braunschweig, Mendelssohnstraße 4, 38106 Braunschweig, Germany

Abstract

Natural history collections are leading successful large-scale projects of specimen digitization (images, metadata, DNA barcodes), thereby transforming taxonomy into a big data science. Yet little effort has been directed towards safeguarding and subsequently mobilizing the considerable amount of original data generated during the process of naming 15,000–20,000 species every year. From the perspective of alpha-taxonomists, we review the properties and diversity of taxonomic data, assess their volume and use, and establish criteria for optimizing data repositories. We surveyed 4113 alpha-taxonomic studies in representative journals for 2002, 2010, and 2018, and found an increasing yet comparatively limited use of molecular data in species diagnosis and description. In 2018, of the 2661 papers published in specialized taxonomic journals, molecular data were widely used in mycology (94%), regularly in vertebrates (53%), but rarely in botany (15%) and entomology (10%). Images play an important role in taxonomic research on all taxa, with photographs used in >80% and drawings in 58% of the surveyed papers. The use of omics (high-throughput) approaches or 3D documentation is still rare. Improved archiving strategies for metabarcoding consensus reads, genome and transcriptome assemblies, and chemical and metabolomic data could help to mobilize the wealth of high-throughput data for alpha-taxonomy. Because long-term, ideally perpetual, data storage is of particular importance for taxonomy, reducing the energy footprint by using less storage-demanding formats is a priority, provided their information content suffices for the purpose of taxonomic studies. Whereas taxonomic assignments are quasifacts for most biological disciplines, for alpha-taxonomy they remain hypotheses about the evolutionary relatedness of individuals.
For this reason, improved reuse of taxonomic data, including machine-learning-based species identification and delimitation pipelines, requires a cyberspecimen approach: linking data via unique specimen identifiers, thereby making them findable, accessible, interoperable, and reusable for taxonomic research. This poses both qualitative challenges, to adapt the existing infrastructure of data centers to a specimen-centered concept, and quantitative challenges, to host and connect up to an estimated 2 million images produced per year by alpha-taxonomic studies, plus many millions of images from digitization campaigns. Of the 30,000–40,000 taxonomists globally, many are thought to be nonprofessionals; capturing their data for online storage and reuse therefore requires low-complexity submission workflows and cost-free repository use. Expert taxonomists are the main stakeholders able to identify and formalize the needs of the discipline; their expertise is needed to implement the envisioned virtual collections of cyberspecimens. [Big data; cyberspecimen; new species; omics; repositories; specimen identifier; taxonomy; taxonomic data.]
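
The cyberspecimen approach, in which every piece of data about a specimen is linked through one unique specimen identifier while taxonomic assignments stay revisable hypotheses, can be pictured as a small data model. The following Python sketch is a hypothetical illustration under stated assumptions, not the authors' design or the schema of any existing repository: the class names (`DataItem`, `Identification`, `CyberSpecimen`, `Registry`), the data-kind labels, the URIs, and the identifier format are all invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataItem:
    """One piece of evidence (photo, drawing, DNA barcode, ...) about a specimen."""
    kind: str  # hypothetical vocabulary, e.g. "photo" or "DNA barcode"
    uri: str   # location of the file in some repository (hypothetical)


@dataclass
class Identification:
    """A taxonomic assignment stored as a revisable hypothesis."""
    taxon_name: str
    determined_by: str
    year: int


@dataclass
class CyberSpecimen:
    specimen_id: str  # unique, persistent specimen identifier (assumed format)
    data: list[DataItem] = field(default_factory=list)
    identifications: list[Identification] = field(default_factory=list)

    def current_identification(self) -> Identification | None:
        """Return the most recent determination, or None if never identified."""
        if not self.identifications:
            return None
        return max(self.identifications, key=lambda ident: ident.year)


class Registry:
    """Links data held in different places through the specimen identifier."""

    def __init__(self) -> None:
        self._specimens: dict[str, CyberSpecimen] = {}

    def get_or_create(self, specimen_id: str) -> CyberSpecimen:
        if specimen_id not in self._specimens:
            self._specimens[specimen_id] = CyberSpecimen(specimen_id)
        return self._specimens[specimen_id]

    def attach(self, specimen_id: str, item: DataItem) -> None:
        self.get_or_create(specimen_id).data.append(item)

    def identify(self, specimen_id: str, ident: Identification) -> None:
        # A new determination is appended; earlier hypotheses are kept.
        self.get_or_create(specimen_id).identifications.append(ident)

    def items_of_kind(self, kind: str) -> list[tuple[str, str]]:
        """Collect (specimen_id, uri) pairs of one data kind across all specimens."""
        return [
            (spec.specimen_id, item.uri)
            for spec in self._specimens.values()
            for item in spec.data
            if item.kind == kind
        ]


if __name__ == "__main__":
    reg = Registry()
    sid = "EXAMPLE:0001"  # hypothetical identifier
    reg.attach(sid, DataItem("photo", "https://example.org/img/0001.jpg"))
    reg.attach(sid, DataItem("DNA barcode", "https://example.org/seq/0001"))
    # Hypothetical taxon names: an early determination, then a later revision.
    reg.identify(sid, Identification("Genus sp. A", "A. Taxonomist", 2015))
    reg.identify(sid, Identification("Genus speciosus", "B. Taxonomist", 2018))
    print(reg.get_or_create(sid).current_identification().taxon_name)
    print(reg.items_of_kind("DNA barcode"))
```

Keeping identifications as a list rather than a single field means that revising a species hypothesis adds a record instead of overwriting anything; the linked images and sequences stay attached to the same identifier, and a query such as `items_of_kind("DNA barcode")` can gather evidence across specimens, for instance as input to an identification pipeline.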

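To give a sense of the quantitative challenge, the abstract's upper estimate of about 2 million images per year can be turned into a storage figure. The Python sketch below is a back-of-envelope calculation only: the per-image file sizes (25 MB uncompressed, 3 MB compressed), the ten-year horizon, and the two replicas are assumptions chosen for illustration, not values from the survey.

```python
# Back-of-envelope storage estimate for alpha-taxonomic images.
# IMAGES_PER_YEAR is the abstract's upper estimate; all file sizes,
# the number of years, and the replica count are illustrative assumptions.

IMAGES_PER_YEAR = 2_000_000

ASSUMED_MB_PER_IMAGE = {
    "uncompressed (assumed 25 MB)": 25.0,
    "compressed (assumed 3 MB)": 3.0,
}


def storage_tb(images_per_year: int, mb_per_image: float,
               years: int = 1, copies: int = 1) -> float:
    """Total storage in decimal terabytes (1 TB = 10**6 MB).

    `copies` models redundant replicas kept for long-term preservation.
    """
    return images_per_year * mb_per_image * years * copies / 1_000_000


if __name__ == "__main__":
    for label, mb in ASSUMED_MB_PER_IMAGE.items():
        one_year = storage_tb(IMAGES_PER_YEAR, mb)
        decade = storage_tb(IMAGES_PER_YEAR, mb, years=10, copies=2)
        print(f"{label}: {one_year:.0f} TB/year, "
              f"{decade:.0f} TB over 10 years x 2 copies")
```

Under these assumptions the compressed format needs roughly one eighth of the space of the uncompressed one (6 versus 50 TB per year, or 120 versus 1000 TB over a decade with two copies), which illustrates why less storage-demanding formats matter when data are meant to be kept indefinitely.
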
Funder

Deutsche Forschungsgemeinschaft

Publisher

Oxford University Press (OUP)

Subject

Genetics; Ecology, Evolution, Behavior and Systematics

Cited by 53 articles.