Developing Standards for Improved Data Quality and for Selecting Fit for Use Biodiversity Data

Published:2020-03-20 Volume:4
ISSN:2535-0897
Container-title:Biodiversity Information Science and Standards
Short-container-title:BISS

Author:

Chapman Arthur, Belbin Lee, Zermoglio Paula, Wieczorek John, Morris Paul, Nicholls Miles, Rees Emily Rose, Veiga Allan, Thompson Alexander, Saraiva Antonio, James Shelley, Gendreau Christian, Benson Abigail, Schigel Dmitry

Abstract

The quality of biodiversity data publicly accessible via aggregators such as GBIF (Global Biodiversity Information Facility), the ALA (Atlas of Living Australia), iDigBio (Integrated Digitized Biocollections), and OBIS (Ocean Biogeographic Information System) is often questioned, especially by the research community. The Data Quality Interest Group, established by Biodiversity Information Standards (TDWG) and GBIF, has been engaged in four main activities: developing a framework for the assessment and management of data quality using a fitness-for-use approach; defining a core set of standardised tests and associated assertions based on Darwin Core terms; gathering and classifying user stories to form contextual-themed use cases, such as species distribution modelling, agrobiodiversity, and invasive species; and developing a standardised format for building and managing controlled vocabularies of values. Using the developed framework, data quality profiles have been built from use cases to represent user needs. Quality assertions can then be used to filter data suitable for a purpose. The assertions can also be used to provide feedback to data providers and custodians to assist in improving data quality at the source. In a case study, two different implementations of tests and assertions based around the Darwin Core "Event Date" terms were also run against GBIF data to demonstrate that the tests are implementation agnostic, can be run on large aggregated datasets, and can make biodiversity data more fit for typical research uses.
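The idea of filtering records by quality assertions can be illustrated with a minimal Python sketch. It is not the Interest Group's implementation: the test labels, the `Assertion` class, and the `EVENT_DATE_PROFILE` list are hypothetical names chosen for illustration, and the only Darwin Core term assumed is `eventDate`. Each test returns an assertion about one record, a profile is a list of tests required by a use case, and a record is kept only if every assertion in the profile is compliant.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class Assertion:
    """Result of running one test on one record (illustrative structure)."""
    test: str        # hypothetical test label, not an official identifier
    compliant: bool
    comment: str


def check_event_date_not_empty(record: dict) -> Assertion:
    """Validation: the Darwin Core eventDate term has some value."""
    value = (record.get("eventDate") or "").strip()
    ok = bool(value)
    comment = "eventDate has a value" if ok else "eventDate is empty"
    return Assertion("eventdate_not_empty", ok, comment)


def check_event_date_is_iso_day(record: dict) -> Assertion:
    """Validation: eventDate parses as a single ISO 8601 calendar date.

    Simplifying assumption: date ranges and partial dates are treated
    as non-compliant here.
    """
    value = (record.get("eventDate") or "").strip()
    try:
        date.fromisoformat(value)
        return Assertion("eventdate_iso_day", True, "parsed as ISO date")
    except ValueError:
        return Assertion("eventdate_iso_day", False, "not an ISO calendar date")


Test = Callable[[dict], Assertion]

# A hypothetical data quality profile: the tests one use case requires.
EVENT_DATE_PROFILE: list[Test] = [
    check_event_date_not_empty,
    check_event_date_is_iso_day,
]


def run_profile(record: dict, profile: list[Test]) -> list[Assertion]:
    """Run every test in the profile and collect the assertions."""
    return [test(record) for test in profile]


def filter_fit_for_use(records: list[dict], profile: list[Test]) -> list[dict]:
    """Keep only records whose assertions are all compliant."""
    return [
        r for r in records
        if all(a.compliant for a in run_profile(r, profile))
    ]


if __name__ == "__main__":
    sample = [
        {"eventDate": "2019-05-17"},
        {"eventDate": ""},
        {"eventDate": "17 May 2019"},
    ]
    for kept in filter_fit_for_use(sample, EVENT_DATE_PROFILE):
        print(kept)
```

The same assertions that drive the filter could also be returned to a data provider as feedback, since each one carries the test label and a comment explaining the outcome.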

Funder

Fundação de Amparo à Pesquisa do Estado de São Paulo

National Science Foundation

Publisher

Pensoft Publishers

Link

https://biss.pensoft.net/article/50889/download/pdf/

Cited by 16 articles.

1. How sensitive are species distribution models to different background point selection strategies? A test with species at various equilibrium levels;Ecological Modelling;2024-07

2. A lab-centric, workflow-based data management system for environmental DNA research;Research Ideas and Outcomes;2024-03-28

3. Elevating the Fitness of Use of GBIF Occurrence Datasets: A proposal for peer review;Biodiversity Information Science and Standards;2023-09-07

4. Promoting High-Quality Data in OBIS: Insights from the OBIS Data Quality Assessment and Enhancement Project Team;Biodiversity Information Science and Standards;2023-09-05

5. A conceptual approach to developing biodiversity informatics as a field of science in South Africa;Frontiers in Ecology and Evolution;2023-07-05