Mobilising Long-Term Natural Environment and Biodiversity Data and Exposing it for Federated, Semantic Queries

Authors:

Koivula Hanna, Wohner Christoph, Magagna Barbara, Tagliolato Acquaviva d'Aragona Paolo, Oggioni Alessandro

Abstract

Biodiversity and ecosystems cannot be studied without assessing the impacts of changing environmental conditions. Since the 1980s, the U.S. National Science Foundation's Long Term Ecological Research (LTER) Network has been a major force in ecology, advancing our understanding of ecosystems. In Europe, LTER developments are led by the Integrated European Long-Term Ecosystem, critical zone and socio-ecological system Research Infrastructure (eLTER RI). This is currently a project-based infrastructure initiative that aims to facilitate high-impact research and catalyse new insights into the compounded impacts of climate change, biodiversity loss, soil degradation, pollution, and unsustainable resource use on a range of European ecosystems and socio-ecological systems. The European LTER network, which forms the basis for the upcoming eLTER RI, is active in 26 countries and has 500 registered sites that provide legacy data, e.g., historical time-series data about the environment (not only biodiversity). Its site information and dataset metadata, including the measured variables, can be searched in the Dynamic Ecological Information Management System - Site and dataset registry (DEIMS-SDR, Wohner et al. 2019). While DEIMS-SDR data models use parts of the Ecological Metadata Language (EML) schema 2.0.0, location information follows the European INSPIRE specification. Future eLTER data is planned to consist of site-based, long-term time series of ecological data. The eLTER projects have defined eLTER Standard Observations (SOs), which will include the minimum set of variables, together with the associated method protocols, that can adequately characterise the state and future trends of the Earth's systems (Masó et al. 2020, Reyers et al. 2017).
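Since an SO is described as a minimum set of variables, one simple way to picture it is as a set against which a site's recorded variables can be compared. The sketch below assumes hypothetical `StandardObservation` and `Site` records and invented variable names; it is not the eLTER SO ontology or the DEIMS-SDR data model, only an illustration of the set logic.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StandardObservation:
    """Hypothetical, simplified SO: a name plus its minimum variable set."""
    name: str
    required_variables: frozenset


@dataclass
class Site:
    """Hypothetical site record listing the variables its datasets report."""
    site_id: str
    observed_variables: set = field(default_factory=set)


def missing_variables(site: Site, so: StandardObservation) -> set:
    """Return the SO variables that the site does not (yet) observe."""
    return set(so.required_variables - site.observed_variables)


def coverage(site: Site, so: StandardObservation) -> float:
    """Fraction of the SO's minimum variable set that the site covers."""
    if not so.required_variables:
        return 1.0  # an empty requirement is trivially satisfied
    covered = so.required_variables & site.observed_variables
    return len(covered) / len(so.required_variables)


# Usage with invented variable names (not actual eLTER SO variables):
soil_so = StandardObservation(
    "hypothetical-soil-SO",
    frozenset({"soil temperature", "soil moisture", "soil pH"}),
)
site_a = Site("site-A", {"soil temperature", "air temperature"})
print(sorted(missing_variables(site_a, soil_so)))  # ['soil moisture', 'soil pH']
print(round(coverage(site_a, soil_so), 2))  # 0.33
```

Matching on plain strings is a deliberate simplification; in practice variables would be compared by their vocabulary identifiers (e.g., EnvThes concept IRIs), so that differently labelled but equivalent variables still match.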
The current eLTER network consists of sites that differ in infrastructure maturity and environment type; a site may focus on one or several of the future SOs, or may not yet be running any holistic monitoring scheme. The main objective is to convert the eLTER site network into a distributed research infrastructure that incorporates a clearly defined mandatory monitoring program. Essential to this effort are the suggested variables for eLTER SOs and the corresponding methods and protocols for relevant habitat types, as classified by the European Nature Information System (EUNIS), in each domain. eLTER variables are described using the eLTER thesaurus "EnvThes". These descriptions are currently being enhanced with the InteroperAble Descriptions of Observable Property Terminology (I-ADOPT, Magagna et al. 2022) framework, which provides the level of detail required for seamless data discovery and integration. Variables and their associated methods and protocols will be formalised to enable automatic site classification, building on existing observation representations such as the Extensible Observation Ontology (OBOE), the Open Geospatial Consortium's Observations and Measurements (O&M) standard, and the future eLTER Standard Observation ontology. DEIMS-SDR will remain a core service, and an RDF representation of its assets (sites, sensors, activities, people) is currently being implemented. This work is coordinated with the Biodiversity Digital Twin (BioDT) project to ensure maximum findability, accessibility, interoperability and reusability (FAIRness; Wilkinson et al. 2016) of data through FAIR Digital Objects (FDOs). Other (digital) assets such as datasets, models and analytical workflows will be documented in the Digital Asset Register (DAR), together with semantic mapping and crosswalk techniques, to provide machine-actionable metadata (Schultes and Wittenburg 2019, Schwardmann 2020).
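To make the I-ADOPT approach more concrete, the sketch below decomposes a single variable into the framework's components (property, object of interest, matrix, context objects, constraints) and emits them as RDF triples in N-Triples form. The namespace `https://w3id.org/iadopt/ont/` and the `has...` property names reflect our understanding of the I-ADOPT ontology and should be checked against the published version; the `Variable` class, the `to_ntriples` helper and all `example.org` IRIs are hypothetical and are not EnvThes concepts.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed I-ADOPT ontology namespace and property names; verify them against
# the published I-ADOPT ontology before reuse.
IOP = "https://w3id.org/iadopt/ont/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = Tuple[str, str, str]


@dataclass
class Variable:
    """A variable decomposed into I-ADOPT-style components (all values are IRIs)."""
    iri: str
    observed_property: str
    object_of_interest: str
    matrix: Optional[str] = None
    context_objects: List[str] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)

    def triples(self) -> List[Triple]:
        """Return (subject, predicate, object) triples describing this variable."""
        out = [
            (self.iri, RDF_TYPE, IOP + "Variable"),
            (self.iri, IOP + "hasProperty", self.observed_property),
            (self.iri, IOP + "hasObjectOfInterest", self.object_of_interest),
        ]
        if self.matrix is not None:
            out.append((self.iri, IOP + "hasMatrix", self.matrix))
        out += [(self.iri, IOP + "hasContextObject", c) for c in self.context_objects]
        out += [(self.iri, IOP + "hasConstraint", c) for c in self.constraints]
        return out


def to_ntriples(triples: List[Triple]) -> str:
    """Serialise IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hypothetical example: "mass concentration of nitrate in river water".
EX = "https://example.org/vocab/"  # placeholder namespace, not EnvThes
nitrate = Variable(
    iri=EX + "nitrate-concentration-river-water",
    observed_property=EX + "mass-concentration",
    object_of_interest=EX + "nitrate",
    matrix=EX + "water",
    context_objects=[EX + "river"],
)
print(to_ntriples(nitrate.triples()))
```

Splitting a variable name into separate, individually resolvable components is what lets two datasets that label the same variable differently be matched automatically, which is the kind of detail the abstract describes as needed for seamless discovery and integration.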
The Biodiversity Digital Twin (BioDT) project brings together biodiversity and natural environment data from seven thematic use cases for modelling. BioDT prototypes rely on openly available data from multiple heterogeneous sources that use a multitude of standards and formats. In the pilot phase, merging data requires "hand picking" from selected sources, and automating the workflows would still require many additional steps. There are ongoing efforts in both the BioDT and eLTER projects to find the best ways and practices to bring the raw data together using suitable standards, to harmonise other environmental variables by referring to vocabularies, and possibly to express the data as FDOs. Currently, both the EML schema and the Darwin Core standard (Darwin Core Task Group 2009; with registered extensions) allow referring to external schemas and vocabularies, which gives flexibility but may still prove too narrow for the multitude of data types and formats that natural environment data requires. We welcome discussion on how to create good practices for enriching and harmonising natural environment data and species occurrence data in a meaningful way. GBIF's new data model, together with enriching raw data with semantic artefacts, may prove to be the way to provide thematic data products that combine data from multiple sources.
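Harmonising environmental variables "by referring to vocabularies" can be pictured as a crosswalk from local column names to vocabulary concepts, producing rows shaped like Darwin Core MeasurementOrFact records. In the sketch below, `eventID`, `measurementType`, `measurementValue` and `measurementUnit` are Darwin Core terms; the `measurementTypeID` column follows the convention of OBIS's Extended MeasurementOrFact extension. The `CROSSWALK` table, its placeholder IRIs and the `to_measurement_rows` function are hypothetical and only illustrate the idea; unmapped columns are reported rather than guessed, mirroring the manual "hand picking" step described above.

```python
from typing import Dict, List, Tuple

# Hypothetical crosswalk: local column -> (label, vocabulary concept IRI, unit).
# The IRIs are placeholders, not real EnvThes (or other vocabulary) concepts.
CROSSWALK: Dict[str, Tuple[str, str, str]] = {
    "temp_air": ("air temperature", "https://example.org/vocab/air-temperature", "°C"),
    "ph_soil": ("soil pH", "https://example.org/vocab/soil-ph", "dimensionless"),
}


def to_measurement_rows(
    event_id: str, record: Dict[str, object]
) -> Tuple[List[Dict[str, str]], List[str]]:
    """Convert one local record into MeasurementOrFact-style rows.

    Returns the harmonised rows plus the names of columns without a crosswalk
    entry, which still need manual mapping.
    """
    rows: List[Dict[str, str]] = []
    unmapped: List[str] = []
    for column, value in record.items():
        if column not in CROSSWALK:
            unmapped.append(column)
            continue
        label, type_iri, unit = CROSSWALK[column]
        rows.append({
            "eventID": event_id,             # Darwin Core term
            "measurementType": label,        # Darwin Core term
            "measurementTypeID": type_iri,   # as in OBIS Extended MeasurementOrFact
            "measurementValue": str(value),  # Darwin Core term
            "measurementUnit": unit,         # Darwin Core term
        })
    return rows, unmapped


rows, unmapped = to_measurement_rows(
    "event-001", {"temp_air": 12.5, "ph_soil": 6.1, "n_obs": 3}
)
print(unmapped)  # ['n_obs']
```

Keeping both a human-readable `measurementType` and a resolvable `measurementTypeID` lets records from different sources be joined on the identifier while remaining readable, which is one possible form of the semantic enrichment the abstract invites discussion on.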

Publisher

Pensoft Publishers

Subject

General Engineering
