Liberating Biodiversity Data From COVID-19 Lockdown: Toward a knowledge hub for mammal host-virus information

Authors:

Nathan Upham, Donat Agosti, Jorrit Poelen, Lyubomir Penev, Deborah Paul, DeeAnn Reeder, Nancy B. Simmons, Gabor Csorba, Quentin Groom, Mariya Dimitrova, Joseph Miller

Abstract

A deep irony of COVID-19's likely origin in a bat-borne coronavirus (Boni et al. 2020) is that the global lockdown imposed to quell the pandemic also locked away physical access to much basic knowledge of bat biology. Digital access to data on the ecology, geography, and taxonomy of potential viral reservoirs, from Southeast Asian horseshoe bats and pangolins to North American deer mice, was suddenly critical for understanding the disease's emergence and spread. However, much of this information lay inside rare books and personal files rather than in open, linked, and queryable resources on the internet. Even the world's experts on mammal taxonomy and zoonotic disease could not retrieve their data from shuttered laboratories. We were caught unprepared. Why, in this digitally connected age, were such fundamental data describing life on Earth not already freely accessible online?

Understanding why biodiversity science was unprepared, and how to fix it before the next pandemic, has been the focus of our COVID-19 Taskforce (organized by CETAF and DiSSCo) since April 2020, and that work is ongoing. We are a group of museum-based and academic scientists whose goal is to open the rich ecological data stored in natural history collections to researchers. This information is rooted in what may seem an unlikely place: taxonomic names and their historical usages, which are the keys for searching the literature and extracting linked ecological data (Fig. 1). This has been the core motivation of our group, enabled by the pioneering efforts of Plazi (Agosti and Egloff 2009) to build tools for literature digitization, extraction, and parsing (e.g., Synospecies, Ocellus), without which biodiversity science would be even less prepared.

Our group led efforts to build an additional pipeline from Plazi to the Biodiversity Literature Repository at Zenodo, a free and unlimited data repository (Agosti et al. 2019), and from there to GloBI, an open-source database of biotic interactions (Poelen et al. 2014, GloBI 2020). We also developed a direct integration from Pensoft journals to GloBI, leveraging that publisher's indexing of computer-readable terms (called semantic metadata; Senderov et al. 2018) to extract mammal host and virus information.

Overall, considerable progress was made. In total, 85,492 new interactions were added to GloBI from 14 April to 21 May 2020 (see the entire dataset on Zenodo: Poelen et al. 2020). Of those, 28,839 interactions remain when the data are subset to "hasHost", "hostOf", "pathogenOf", and "virus", and 4,101 unique name combinations remain after accounting for mammal species synonymies (from Meyer et al. 2015). These interactions involve 892 mammal species and 1,530 unique virus names, compared with 754 mammals and 586 viruses in the most recent data synthesis (Olival et al. 2017). While these liberated data may still include redundancies, they demonstrate the value of our approach and the extent of known but digitally unconnected data that remains locked in publications.

We can liberate host-virus data from publications, but doing so is expensive and does not scale to the continued influx of new articles that are inadequately digitized. Our efforts make it clear that Pensoft-style semantic publishing should be expanded to all major journals. The pandemic has created an opportunity to rethink the way we do science in the digital age. Thankfully, our future is not the past, so we do not have to keep wasting resources to digitally 'rediscover' biodiversity knowledge. We collectively call for changes to the publishing paradigm so that research findings are directly accessible, citable, discoverable, and reusable for creating complete forms of digital knowledge.
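The counting step in the abstract first subsets the interaction records to host-virus types and then collapses mammal synonyms before counting unique name combinations. A minimal Python sketch can illustrate that step, but it is not the taskforce's actual pipeline. The `Interaction` record, the `in_host_virus_subset` and `host_virus_pairs` helpers, the synonym table, and every taxon name below are hypothetical. The orientation rules are also an assumption: the virus is taken as the source for "hasHost" and "pathogenOf", and the host as the source for "hostOf". The abstract's fourth filter term, "virus", is left out because the abstract does not say how it was applied.

```python
from dataclasses import dataclass


# Hypothetical record layout: one interaction per record, read as
# "<source> <interaction_type> <target>".
@dataclass(frozen=True)
class Interaction:
    source: str
    interaction_type: str
    target: str


# Assumed orientation: the virus is the source and the host is the target.
VIRUS_TO_HOST = {"hasHost", "pathogenOf"}
# Assumed orientation: the host is the source and the virus is the target.
HOST_TO_VIRUS = {"hostOf"}


def in_host_virus_subset(rec: Interaction) -> bool:
    """True if the record's interaction type is one of the host-virus types."""
    return rec.interaction_type in VIRUS_TO_HOST | HOST_TO_VIRUS


def host_virus_pairs(records, synonyms):
    """Return the set of unique (accepted host name, virus name) pairs.

    `synonyms` maps a host name as published to its accepted name; names
    absent from the mapping are treated as already accepted.
    """
    pairs = set()
    for rec in records:
        if rec.interaction_type in VIRUS_TO_HOST:
            virus, host = rec.source, rec.target
        elif rec.interaction_type in HOST_TO_VIRUS:
            host, virus = rec.source, rec.target
        else:
            continue  # not a host-virus interaction type: skip it
        # Collapse host synonyms so reciprocal or renamed records count once.
        pairs.add((synonyms.get(host, host), virus))
    return pairs


if __name__ == "__main__":
    # Toy, made-up records; not taken from the GloBI dataset.
    records = [
        Interaction("Virus X", "hasHost", "Bat A"),
        Interaction("Bat A-syn", "hostOf", "Virus X"),  # reciprocal record, host under a synonym
        Interaction("Virus Y", "pathogenOf", "Mouse B"),
        Interaction("Mouse B", "eats", "Plant C"),      # outside the host-virus subset
    ]
    synonyms = {"Bat A-syn": "Bat A"}  # hypothetical synonymy table

    subset = [r for r in records if in_host_virus_subset(r)]
    pairs = host_virus_pairs(subset, synonyms)
    print(len(subset), len(pairs))  # → 3 2
    print(sorted(pairs))            # → [('Bat A', 'Virus X'), ('Mouse B', 'Virus Y')]
```

Orienting every record as (host, virus) before deduplication is what lets a "hasHost" record and its reciprocal "hostOf" record, once the host synonym is resolved, count as a single name combination.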

Publisher

Pensoft Publishers

Cited by 2 articles.