OpenBiodiv for Users: Applications and Approaches to Explore a Biodiversity Knowledge Graph

Author:

Penev Lyubomir, Zhelezov Georgi, Dimitrova Mariya, Boyadzhieva Iva, Georgiev Teodor

Abstract

OpenBiodiv is a biodiversity database, a knowledge graph based on the Resource Description Framework (RDF), that contains information extracted from the scientific literature. It provides access to an ecosystem of tools and services, including a Linked Open Dataset, an ontology (OpenBiodiv-O) and a website (Dimitrova et al. 2021). Using the available data, OpenBiodiv discovers links between various biodiversity data types (e.g., taxon names, treatments, specimens, sequences, people and institutions) to answer a user's questions about specific taxa, scientific articles, materials examined and others. The Linked Open Data are generated from the full-text XML of journals on the ARPHA Publishing Platform and from treatments extracted by Plazi's TreatmentBank (stored in the Biodiversity Literature Repository at Zenodo). The database is updated and indexed daily using a workflow based on the Apache Kafka event-streaming platform. The workflow was developed during the European Union-funded Biodiversity Community Integrated Knowledge Library (BiCIKL) project (Penev et al. 2022b). As of 1 August 2023, the graph consisted of 24,939 articles; 167,471 treatments; 130,359 authors; 736,809 taxon names; 129,257 sequences; 1,390 institutions and collections; 117,854 figures; 18,585 tables; and 90,008 materials examined sections. Each semantic statement (e.g., authors, articles, treatments, taxonomic names, localities) has its own globally unique, persistent and resolvable identifier (GUPRI).

There are four ways a user can explore the data on OpenBiodiv:

1. General search. The search engine is accessible from the OpenBiodiv homepage. The user types in a key term (e.g., a taxonomic name, authority or an article title), and the system retrieves information about it. Thanks to the Elasticsearch index, misspellings do not cause errors; the search can also determine the semantic type of the searched entity.

2. Application Programming Interface (API). OpenBiodiv can be used through a RESTful API for programmatic access. The API is documented with Swagger. Its construction and functionalities follow the recommendations elaborated by the Technical Research Infrastructures forum of the BiCIKL project (Addink et al. 2023).

3. User applications based on a query algorithm. This function can be applied to any data class. The method uses the relationships between an element type (e.g., taxon name) and the type of the section in which it can be found. An application example is Literature exploration, designed to answer the question: "Give me information about X mentioned within article section type Y". The results show the number of mentions of the entity (e.g., taxon name) in the section(s) of interest (e.g., Title, Abstract, Treatment). A click navigates the user to the place in the article that mentions the item (Fig. 1).

4. SPARQL queries in a thematic context. OpenBiodiv provides a SPARQL endpoint through the Ontotext GraphDB solution. Several sample SPARQL queries are also available on the OpenBiodiv website.
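To illustrate the SPARQL route, the following is a minimal Python sketch of how a client might send a query to a SPARQL endpoint over the standard SPARQL 1.1 Protocol and flatten the JSON results. The endpoint URL is a hypothetical placeholder, not the actual OpenBiodiv address, and the query is deliberately generic (it counts resources per `rdf:type`) so that it assumes nothing about OpenBiodiv-O class names. The function names are illustrative only.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: replace with the SPARQL endpoint URL
# published on the OpenBiodiv website.
ENDPOINT = "https://example.org/repositories/openbiodiv"

# Generic query that works on any RDF graph: the ten most frequent rdf:type values.
QUERY = """
SELECT ?type (COUNT(?s) AS ?n)
WHERE { ?s a ?type }
GROUP BY ?type
ORDER BY DESC(?n)
LIMIT 10
"""


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def rows_from_results(payload):
    """Flatten a SPARQL JSON result document into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in payload["results"]["bindings"]
    ]


def run_query(endpoint, query, timeout=30):
    """Send the query over the network and return the flattened rows."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as resp:
        return rows_from_results(json.load(resp))
```

Once `ENDPOINT` points at a real endpoint, calling `run_query(ENDPOINT, QUERY)` would return one dictionary per result row, with the keys `type` and `n`. The sample queries on the OpenBiodiv website can be passed in place of `QUERY` unchanged.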

Publisher

Pensoft Publishers

Subject

General Engineering
