Putting your Finger upon the Simplest Data

Author:

Ariño Arturo

Abstract

Over the past decades, digitization endeavors across many institutions holding natural history collections (NHCs) have multiplied with three broad aims: first, to facilitate collection management by moving existing analog catalogues into digital form; second, to efficiently document and inventory specimens in collections, including imaging them as taxonomical surrogates; and third, to enable discovery of, and access to, the resulting collection data. NHCs contain a unique wealth of potential knowledge in the form of primary biodiversity data records (PBR): at their most basic level, the “what, where and when” of the occurrences of the specimens in the collections. But as T.S. Eliot famously said, “knowledge is invariably a matter of degree”. For such data to be transformed into digitally accessible knowledge (DAK) that is conducive to understanding how the natural world works, the digitized data (the “this we know”) must be released. At least two billion specimens are estimated to exist in NHCs already, but only a small fraction can properly be considered DAK: most have either not yet been digitized or not been released through a discovery facility. Digitizing is relatively costly, as it often entails manually processing each specimen unit (e.g. a herbarium sheet, a pinned insect, or a vial full of invertebrates). How long could it take us to transform all NHCs into DAK? Can we keep up with the natural growth in collections? The Global Biodiversity Information Facility (GBIF) has become the de facto main index of PBR, whether originating in NHCs or as field observations. Digitized NHCs that are standards-compliant and can be connected to, or harvested by, GBIF effectively become DAK. I have examined GBIF growth data looking for a pattern of DAK generation. I found that the rate of NHC-based PBR accrual is remarkably constant: the total DAK shows strongly linear growth, as opposed to the exponential growth exhibited by cumulative observation data.
Projecting this trend onto the estimated holdings puts completion many decades away. In addition, digitized data appear to be taxonomically biased. Digitization efforts must therefore step up qualitatively if the backlog, let alone newly acquired accessions, is to be processed within one generation. Among several possible solutions, emerging industrial-scale mass-digitization techniques may help tame this otherwise daunting task, but there is also a risk that DAK becomes even more uneven across taxon groups because of the narrow application specificity of such techniques, thus potentially biasing our knowledge of nature.
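
The kind of linear projection described in the abstract can be illustrated with a minimal sketch. It is not the author's actual analysis: the yearly record counts below are made-up placeholders (not GBIF figures), and the function names `fit_line` and `year_reaching` are hypothetical. Only the target of two billion specimens comes from the abstract.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def year_reaching(target, slope, intercept):
    """Year at which the fitted line reaches `target` records."""
    if slope <= 0:
        raise ValueError("a non-growing trend never reaches the target")
    return (target - intercept) / slope


# Placeholder series, invented for illustration only: cumulative
# specimen-based records growing by a constant amount per year.
years = [2010, 2012, 2014, 2016, 2018]
records = [100e6, 120e6, 140e6, 160e6, 180e6]

slope, intercept = fit_line(years, records)

# Estimated holdings quoted in the abstract: at least two billion specimens.
target = 2e9
completion = year_reaching(target, slope, intercept)

print(f"records added per year: {slope:,.0f}")
print(f"projected completion year: {completion:.0f}")
```

With a constant accrual rate, the completion year scales directly with the size of the backlog, which is why a linear trend pushes the end point so far out compared with an exponential one.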

Publisher

Pensoft Publishers
