Bridging Data Standards and FAIR Principles in Biodiversity Digital Twinning: Prototyping, Challenges, Lessons Learned, and Future Plans

Authors

Islam Sharif, Lopez Gordillo Julian, Endresen Dag, Andrew Carrie

Abstract

Digital twins combine modelling, domain knowledge, computing power, and multiple datasets to offer the potential to unlock new insights into biodiversity (de Koning et al. 2023). The Biodiversity Digital Twin (BioDT) project pioneers this approach to aid in understanding biodiversity through prototyping digital twins (Golivets et al. 2024). However, working with biodiversity data presents challenges: the data are dynamic and diverse, and users must deal with incompleteness, uncertainties (Rocchini et al. 2011), the disproportionate representation of wealthier economies in global studies (Hughes et al. 2024), and issues with data aggregation and integration (Wüest et al. 2020). Similar to BioDT, there are also plans for creating a Digital Twin of the Ocean (DTO). The DTO-Bioflow project addresses these data challenges in the marine domain, where studies show that although European seas host 48,000 marine species (75% described), the data are not yet FAIR (Findable, Accessible, Interoperable, and Reusable) (Ramírez et al. 2022). These challenges hamper the modelling capabilities needed for effective predictions and conservation prioritisation. The adaptability of digital twins across temporal and spatial scales and their ability to model dynamic ecosystems make them ideal for biodiversity research and real-time conservation efforts. However, their success hinges on the consistent integration and alignment of data from disparate sources (Trantas et al. 2023). This integration involves standardising the terms used to describe datasets, such as temporal coverage, and using controlled vocabularies, such as "Forest" for targetHabitatScope. Thus, adopting data standards is essential. Additionally, challenges such as model bias (Lewers 2023), research context, and data provenance must be considered, adding complexity to metadata capture and alignment.
BioDT addresses these challenges with modular building blocks for data integration, model deployment, and workflow management. This approach facilitates the gradual adoption of data standards and FAIR principles, which need to encompass not just data, but also models and software. Automation, ease of reproducibility, and deployability are critical for digital twinning success, yet data integration and interoperability issues may arise from missing or insufficient parameter descriptions in the model, incomplete information on data selection, and the unavailability of required software package details. Data standardisation therefore provides a consistent approach that can be adopted for different use cases. Common data sources in BioDT, like species occurrences and environmental variables, benefit from standards such as Darwin Core (Wieczorek et al. 2012) and the Ecological Metadata Language (Jones et al. 2019). While valuable, these standards may not fully encompass the complexity needed for comprehensive biodiversity digital twins. Additionally, differing familiarity with these standards and FAIR principles among communities poses challenges. Continuous adoption of data standards, alongside exploring complementary approaches like schema.org or bioschemas.org for capturing diverse (meta)data, is essential. Collaboration with data providers, modellers, and various research infrastructures is also crucial (Andrew et al. 2024). We share our experience using Research Object Crate (RO-Crate), leveraging common JavaScript Object Notation for Linked Data (JSON-LD) representation for metadata profiles and workflow representation, to connect with different infrastructures. In the BioDT project, we are working with various use cases to create prototype digital twins that can serve as valuable resources for other projects.
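To make the RO-Crate and JSON-LD approach mentioned above more concrete, the following is a minimal sketch of how a dataset description might be assembled as an RO-Crate metadata document. It is an illustration only and not the BioDT implementation: the function name `build_crate`, the file name `occurrences.csv`, and all metadata values are hypothetical. The overall layout (a metadata descriptor plus a root dataset inside `@graph`) follows the RO-Crate 1.1 specification, and `temporalCoverage` is a schema.org property.

```python
import json

# Context and specification identifiers defined by RO-Crate 1.1.
RO_CRATE_CONTEXT = "https://w3id.org/ro/crate/1.1/context"
RO_CRATE_SPEC = "https://w3id.org/ro/crate/1.1"


def build_crate(name, description, temporal_coverage, files):
    """Return a minimal RO-Crate metadata document as a Python dict.

    `files` is a list of (path, label, media_type) tuples. This helper is
    hypothetical and only sketches the structure of the JSON-LD graph.
    """
    file_entities = [
        {"@id": path, "@type": "File", "name": label, "encodingFormat": media_type}
        for path, label, media_type in files
    ]
    # The metadata descriptor points at the root dataset it describes.
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": RO_CRATE_SPEC},
        "about": {"@id": "./"},
    }
    # The root dataset carries the descriptive metadata and lists its parts.
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "temporalCoverage": temporal_coverage,
        "hasPart": [{"@id": entity["@id"]} for entity in file_entities],
    }
    return {"@context": RO_CRATE_CONTEXT, "@graph": [descriptor, root, *file_entities]}


# Example values are invented for illustration.
crate = build_crate(
    name="Example species occurrence dataset",
    description="Illustrative occurrence records described with Darwin Core terms.",
    temporal_coverage="2010-01-01/2020-12-31",
    files=[("occurrences.csv", "Occurrence records", "text/csv")],
)
metadata_json = json.dumps(crate, indent=2)
print(metadata_json)
```

Writing `metadata_json` to a file named `ro-crate-metadata.json` next to the data files would give a crate that other infrastructures can parse as ordinary JSON-LD. Domain-specific terms beyond schema.org would need an extended `@context`, which this sketch leaves out.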
The evolving landscape of digital twin concepts, along with other European Union-funded initiatives like DTO-Bioflow and Destination Earth (DestinE), emphasises the importance of alignment within the digital twin ecosystem. BioDT is committed to aligning with and contributing to this broader context, highlighting the critical role of data standardisation and FAIR implementation.

Publisher

Pensoft Publishers
