FAIR data representation in times of eScience: a comparison of instance-based and class-based semantic representations of empirical data using phenotype descriptions as example

Published:2021-11-25 Issue:1 Volume:12
ISSN:2041-1480
Container-title:Journal of Biomedical Semantics
language:en
Short-container-title:J Biomed Semant

Author:

Vogt Lars

Abstract

Background: The size, velocity, and heterogeneity of Big Data outclass conventional data management tools and require data and metadata to be fully machine-actionable (i.e., eScience-compliant) and thus findable, accessible, interoperable, and reusable (FAIR). This can be achieved by using ontologies and by representing them as semantic graphs. Here, we discuss two different semantic graph approaches to representing empirical data and metadata in a knowledge graph, with phenotype descriptions as an example. Almost all phenotype descriptions are still published as unstructured natural language texts, with far-reaching consequences for their FAIRness, substantially impeding their overall usability within the life sciences. However, with an increasing number of anatomy ontologies becoming available and semantic applications emerging, a solution to this problem is becoming available. Researchers are starting to document and communicate phenotype descriptions through the Web in the form of highly formalized and structured semantic graphs that use ontology terms and Uniform Resource Identifiers (URIs) to circumvent the problems connected with unstructured texts.

Results: Using phenotype descriptions as an example, we compare and evaluate two basic representations of empirical data and their accompanying metadata in the form of semantic graphs: the class-based TBox semantic graph approach called Semantic Phenotype and the instance-based ABox semantic graph approach called Phenotype Knowledge Graph. Their main difference is that only the ABox approach allows for identifying every individual part and property mentioned in the description in a knowledge graph. This technical difference has substantial practical consequences that significantly affect the overall usability of empirical data. These consequences affect the findability, accessibility, and explorability of empirical data as well as their comparability, expandability, universal usability and reusability, and overall machine-actionability. Moreover, TBox semantic graphs often require querying under entailment regimes, which is computationally more complex.

Conclusions: We conclude that, from a conceptual point of view, the advantages of the instance-based ABox semantic graph approach outweigh its shortcomings and outweigh the advantages of the class-based TBox semantic graph approach. Therefore, we recommend the instance-based ABox approach as a FAIR approach for documenting and communicating empirical data and metadata in a knowledge graph.
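The distinction drawn in the abstract can be illustrated with a toy sketch in plain Python. It is not the Semantic Phenotype or Phenotype Knowledge Graph model from the paper: the `http://example.org/` namespace, the terms (`hasPart`, `hasQuality`, `Eye`, `Red`), and the helper functions are all hypothetical, and triples are modeled as simple tuples rather than with an RDF library. In the instance-based graph every part and quality is an individual with its own URI and can be found by plain pattern matching; in the class-based counterpart the same parts exist only as anonymous restrictions inside one class expression.

```python
# Toy illustration only: namespace, terms, and helpers are hypothetical.
EX = "http://example.org/"
RDF_TYPE = "rdf:type"

# Instance-based (ABox) graph: each part and quality is its own individual.
abox = {
    (EX + "specimen1", RDF_TYPE, EX + "Organism"),
    (EX + "specimen1", EX + "hasPart", EX + "head1"),
    (EX + "head1", RDF_TYPE, EX + "Head"),
    (EX + "head1", EX + "hasPart", EX + "eye1"),
    (EX + "eye1", RDF_TYPE, EX + "Eye"),
    (EX + "eye1", EX + "hasQuality", EX + "red1"),
    (EX + "red1", RDF_TYPE, EX + "Red"),
}

# Class-based (TBox) counterpart: one named class with a nested expression.
# The eye and its redness have no URI of their own here, so they cannot be
# addressed directly; finding them would require reasoning over the expression.
tbox_class = {
    "uri": EX + "Specimen1Phenotype",
    "subclass_of": (
        "some", EX + "hasPart",
        ("and", EX + "Head",
         ("some", EX + "hasPart",
          ("and", EX + "Eye", ("some", EX + "hasQuality", EX + "Red")))),
    ),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def instances_of(graph, cls):
    """Return the individuals explicitly typed as cls (no inference)."""
    return [s for s, _, _ in match(graph, p=RDF_TYPE, o=cls)]


def red_eyes(graph):
    """Find eye individuals that bear a quality individual typed as Red."""
    found = []
    for eye in instances_of(graph, EX + "Eye"):
        for _, _, quality in match(graph, s=eye, p=EX + "hasQuality"):
            if match(graph, s=quality, p=RDF_TYPE, o=EX + "Red"):
                found.append(eye)
    return found
```

Calling `red_eyes(abox)` walks explicit triples only, which is the sense in which the instance-based graph can be queried without an entailment regime; the nested `tbox_class` expression offers no comparable handle on the individual eye.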

Funder

Deutsche Forschungsgemeinschaft

Leibniz-Gemeinschaft

European Research Council

Publisher

Springer Science and Business Media LLC

Subject

Computer Networks and Communications, Health Informatics, Computer Science Applications, Information Systems

Link

https://link.springer.com/content/pdf/10.1186/s13326-021-00254-0.pdf

Cited by 2 articles.

1. SCICERO: A deep learning and NLP approach for generating scientific knowledge graphs in the computer science domain;Knowledge-Based Systems;2022-12

2. A semantically enriched taxonomic revision of Gryonoides Dodd, 1920 (Hymenoptera, Scelionidae), with a review of the hosts of Teleasinae;Journal of Hymenoptera Research;2021-12-23