Abstract
Digital twins combine modelling, domain knowledge, computing power, and multiple datasets, and so have the potential to unlock new insights into biodiversity (de Koning et al. 2023). The Biodiversity Digital Twin (BioDT) project pioneers this approach by prototyping digital twins that aid the understanding of biodiversity (Golivets et al. 2024). However, biodiversity data are challenging to work with. They are dynamic and diverse; they are often incomplete and uncertain (Rocchini et al. 2011); data from wealthier economies are over-represented in global studies (Hughes et al. 2024); and aggregating and integrating them is difficult (Wüest et al. 2020). Similar to BioDT, a Digital Twin of the Ocean (DTO) is also planned. The DTO-Bioflow project addresses these data challenges in the marine domain, where studies show that although European seas host 48,000 marine species (75% of them described), the data are not yet FAIR (Findable, Accessible, Interoperable, and Reusable) (Ramírez et al. 2022). These challenges hamper the modelling capabilities needed for effective predictions and conservation prioritisation.
The adaptability of digital twins across temporal and spatial scales, and their ability to model dynamic ecosystems, make them well suited to biodiversity research and real-time conservation efforts. However, their success hinges on consistently integrating and aligning data from disparate sources (Trantas et al. 2023). This integration involves standardising both the terms used to describe datasets, such as temporal coverage, and the values those terms take, for example drawing "Forest" from a controlled vocabulary for targetHabitatScope. Adopting data standards is therefore essential. Additionally, factors such as model bias (Lewers 2023), research context, and data provenance must be considered, which adds complexity to capturing and aligning metadata.
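To make the alignment step concrete, the following is a minimal Python sketch, assuming two hypothetical providers that describe comparable datasets with inconsistent habitat labels. It maps free-text labels onto a controlled vocabulary for targetHabitatScope and expresses temporal coverage as an ISO 8601 start/end interval. The vocabulary entries other than "Forest", the input field names, and the function names are illustrative assumptions, not part of any BioDT specification.

```python
from datetime import date

# Hypothetical controlled vocabulary for targetHabitatScope: keys are
# normalised labels, values are the canonical terms. Only "Forest" comes
# from the text; the other entries are placeholders.
HABITAT_VOCABULARY = {
    "forest": "Forest",
    "grassland": "Grassland",
    "wetland": "Wetland",
}


def normalise_habitat(label: str) -> str:
    """Map a free-text habitat label onto the controlled vocabulary."""
    key = label.strip().lower()
    if key not in HABITAT_VOCABULARY:
        raise ValueError(f"{label!r} is not a recognised targetHabitatScope term")
    return HABITAT_VOCABULARY[key]


def temporal_coverage(start: date, end: date) -> str:
    """Express temporal coverage as an ISO 8601 start/end interval."""
    if end < start:
        raise ValueError("temporal coverage ends before it starts")
    return f"{start.isoformat()}/{end.isoformat()}"


def harmonise(record: dict) -> dict:
    """Return aligned metadata for one provider record (hypothetical keys)."""
    return {
        "name": record["title"],
        "targetHabitatScope": normalise_habitat(record["habitat"]),
        "temporalCoverage": temporal_coverage(record["start"], record["end"]),
    }


# Two hypothetical providers describing comparable datasets differently.
sources = [
    {"title": "Plot survey A", "habitat": " FOREST", "start": date(2015, 1, 1), "end": date(2020, 12, 31)},
    {"title": "Plot survey B", "habitat": "forest", "start": date(2018, 5, 1), "end": date(2019, 9, 30)},
]

for aligned in map(harmonise, sources):
    print(aligned)
```

Rejecting unknown labels, rather than passing them through, surfaces vocabulary gaps early, which is where agreement with data providers on shared terms becomes necessary.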
BioDT addresses these challenges with modular building blocks for data integration, model deployment, and workflow management. This approach facilitates the gradual adoption of data standards and FAIR principles, which need to encompass not just data but also models and software. Automation, reproducibility, and deployability are critical to the success of digital twinning, yet data integration and interoperability issues may arise when model parameters are missing or insufficiently described, information on data selection is incomplete, or details of the required software packages are unavailable. Data standardisation therefore provides a pathway to a consistent approach that can be adopted across different use cases.
Common data sources in BioDT, including species occurrences and environmental variables, benefit from standards such as Darwin Core (Wieczorek et al. 2012) and the Ecological Metadata Language (Jones et al. 2019). While valuable, these standards may not fully capture the complexity needed for comprehensive biodiversity digital twins. Differing familiarity with these standards and with the FAIR principles across communities poses a further challenge. Continued adoption of data standards, alongside complementary approaches such as schema.org or bioschemas.org for capturing diverse (meta)data, is therefore essential, as is collaboration with data providers, modellers, and various research infrastructures (Andrew et al. 2024).
We share our experience of using Research Object Crate (RO-Crate), which relies on a common JavaScript Object Notation for Linked Data (JSON-LD) representation for metadata profiles and workflows, to connect with different infrastructures. In the BioDT project, we are working with various use cases to create prototype digital twins that can serve as valuable resources for other projects. The evolving landscape of digital twin concepts, along with other European Union-funded initiatives such as DTO-Bioflow and Destination Earth (DestinE), emphasises the importance of alignment within the digital twin ecosystem. BioDT is committed to aligning with and contributing to this broader context, highlighting the critical role of data standardisation and FAIR implementation.
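As an illustration of the kind of JSON-LD an RO-Crate carries, here is a minimal sketch written with Python's standard json module rather than a dedicated library. It assumes a hypothetical crate holding one occurrence CSV and one workflow script; the file names, descriptive text, and the build_crate helper are illustrative assumptions. The layout follows our reading of RO-Crate 1.1 (a metadata descriptor, a root Dataset, and one entity per file), with the workflow typed as in the Workflow RO-Crate profile. It is not BioDT's actual crate.

```python
import json
from datetime import date

RO_CRATE_CONTEXT = "https://w3id.org/ro/crate/1.1/context"
RO_CRATE_SPEC = "https://w3id.org/ro/crate/1.1"


def build_crate(name, description, license_url, data_files, workflow_file):
    """Assemble a minimal RO-Crate metadata document as a Python dict.

    data_files maps each (hypothetical) file name to its media type.
    """
    graph = [
        {
            # Metadata descriptor: states that this JSON-LD file describes the crate root.
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": RO_CRATE_SPEC},
            "about": {"@id": "./"},
        },
        {
            # Root data entity: the crate as a whole, pointing at its parts.
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "datePublished": date.today().isoformat(),
            "license": {"@id": license_url},
            "mainEntity": {"@id": workflow_file},
            "hasPart": [{"@id": f} for f in [*data_files, workflow_file]],
        },
        {
            # Workflow entity, typed as in the Workflow RO-Crate profile.
            "@id": workflow_file,
            "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
            "name": "Example modelling workflow",
        },
    ]
    # One data entity per input file, recording its media type.
    graph += [
        {"@id": path, "@type": "File", "encodingFormat": media_type}
        for path, media_type in data_files.items()
    ]
    return {"@context": RO_CRATE_CONTEXT, "@graph": graph}


crate = build_crate(
    name="Example biodiversity prototype crate",
    description="Occurrence data and the workflow that consumes it (illustrative).",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    data_files={"occurrences.csv": "text/csv"},
    workflow_file="workflow.py",
)
print(json.dumps(crate, indent=2))
```

Saving this output as ro-crate-metadata.json in the crate's root directory is what makes that directory an RO-Crate. Because every entity is a JSON-LD node using schema.org-based terms, the same graph can be consumed by tools on different infrastructures without a bespoke format.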