Quantitative monitoring of nucleotide sequence data from genetic resources in context of their citation in the scientific literature

Author:

Lange Matthias (1), Alako Blaise T F (2), Cochrane Guy (2), Ghaffar Mehmood (1), Mascher Martin (1, 3), Habekost Pia-Katharina (1, 4), Hillebrand Upneet (5), Scholz Uwe (1), Schorch Florian (1, 4), Freitag Jens (1), Scholz Amber Hartman (5)

Affiliation:

1. Leibniz Institute of Plant Genetics and Crop Plant Research, Department Breeding Research, OT Gatersleben, Corrensstrasse 3, 06466 Seeland, Germany

2. European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK

3. German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Puschstraße 4, 04103 Leipzig, Germany

4. The Harz University of Applied Science, Department of Automation and Computer Science, Friedrichstraße 57, 38855 Wernigerode, Germany

5. Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures GmbH, Department Research - Microbial Ecology and Diversity, Inhoffenstraße 7B, 38124 Braunschweig, Germany

Abstract

Background: Linking nucleotide sequence data (NSD) to scientific publication citations can enhance understanding of NSD provenance, scientific use, and reuse in the community. By connecting publications with NSD records, NSD geographical provenance information, and author geographical information, it becomes possible to assess the contribution of NSD to infer trends in scientific knowledge gain at the global level.

Findings: We extracted and linked records from the European Nucleotide Archive to citations in open-access publications aggregated at Europe PubMed Central. A total of 8,464,292 ENA accessions with geographical provenance information were associated with publications. We conducted a data quality review to uncover potential issues in publication citation information extraction and author affiliation tagging and developed and implemented best-practice recommendations for citation extraction. We constructed flat data tables and a data warehouse with an interactive web application to enable ad hoc exploration of NSD use and summary statistics.

Conclusions: The extraction and linking of NSD with associated publication citations enables transparency. The quality review contributes to enhanced text mining methods for identifier extraction and use. Furthermore, the global provision and use of NSD enable scientists worldwide to join literature and sequence databases in a multidimensional fashion. As a concrete use case, we visualized statistics of country clusters concerning NSD access in the context of discussions around digital sequence information under the United Nations Convention on Biological Diversity.

Funder

Federal Ministry of Education and Research

Publisher

Oxford University Press (OUP)

Subject

Computer Science Applications, Health Informatics

Cited by 4 articles.
