Gauging triple stores with actual biological data

Published: 2012-01-25
Volume: 13
Issue: S1
ISSN: 1471-2105
Container-title: BMC Bioinformatics
Language: en
Short-container-title: BMC Bioinformatics

Authors:

Mironov Vladimir, Seethappan Nirmala, Blondé Ward, Antezana Erick, Splendiani Andrea, Kuiper Martin

Abstract

Background: Semantic Web technologies have been developed to overcome the limitations of the current Web and of conventional data integration solutions. The Semantic Web is expected to link all the data on the Internet, not just documents. One of its foundations is the knowledge representation language Resource Description Framework (RDF). Knowledge expressed in RDF is typically stored in so-called triple stores (also known as RDF stores), from which it can be retrieved with SPARQL, a language designed for querying RDF-based models. Semantic Web technologies should also allow federated queries over multiple triple stores. In this paper we compare the efficiency of a set of biologically relevant queries across several triple store implementations.

Results: We previously developed a library of queries to guide the use of our knowledge base, the Cell Cycle Ontology, implemented as a triple store. We have now compared the performance of these queries on five non-commercial triple stores: OpenLink Virtuoso (Open-Source Edition), Jena SDB, Jena TDB, SwiftOWLIM and 4Store. We examined three performance aspects: data upload time, query execution time and scalability. The chosen queries addressed diverse ontological or biological questions, and individual store performance turned out to be quite query-specific. We identified three groups of queries that behaved similarly across the stores: 1) queries with relatively short response times, 2) queries with moderate response times and 3) queries with relatively long response times. SwiftOWLIM performed best in the first group, 4Store in the second and Virtuoso in the third.

Conclusions: Our analysis showed that some queries behaved idiosyncratically, in a store-specific manner, mainly with SwiftOWLIM and 4Store. Virtuoso, as expected, showed a well-balanced performance: its load time and its response times for all tested queries were better than the average of the selected stores, it scaled very well, and its run-to-run reproducibility was reasonable. Jena SDB and Jena TDB were consistently slower than the other three implementations. Our analysis also demonstrated that most queries developed for Virtuoso could be used successfully with the other implementations.
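The abstract describes repeated timing of the same queries on several stores, a run-to-run reproducibility judgement, and a grouping of queries into short, moderate and long response times. The Python sketch below shows one way such timings might be summarised; it is not the authors' benchmark harness. The helper names, the 1 s and 10 s group cut-offs and the example timings are hypothetical, and `run_query` stands in for whatever client call actually sends a SPARQL query to a store.

```python
# Hypothetical sketch: summarising repeated SPARQL timings per triple store.
# Store names follow the abstract; helper names, cut-offs and example
# timings are illustrative assumptions, not the paper's method or data.
import statistics
import time
from typing import Callable, Dict, List


def time_runs(run_query: Callable[[], object], repeats: int = 5) -> List[float]:
    """Call run_query `repeats` times; return wall-clock durations in seconds."""
    durations: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()  # stand-in for a real client call that sends a SPARQL query
        durations.append(time.perf_counter() - start)
    return durations


def summarise(durations: List[float]) -> Dict[str, float]:
    """Median time plus coefficient of variation (stdev / mean) as a reproducibility score."""
    mean = statistics.mean(durations)
    if len(durations) > 1 and mean > 0:
        cv = statistics.stdev(durations) / mean
    else:
        cv = 0.0
    return {"median": statistics.median(durations), "cv": cv}


def response_group(median_s: float, short_max: float = 1.0, moderate_max: float = 10.0) -> str:
    """Bucket a query by median response time; the 1 s / 10 s cut-offs are hypothetical."""
    if median_s <= short_max:
        return "short"
    if median_s <= moderate_max:
        return "moderate"
    return "long"


def fastest_store(medians_by_store: Dict[str, float]) -> str:
    """Return the store with the lowest median response time for one query."""
    return min(medians_by_store, key=lambda store: medians_by_store[store])


if __name__ == "__main__":
    # Made-up median timings (seconds) for a single query on each store.
    medians = {"Virtuoso": 2.4, "4Store": 1.8, "SwiftOWLIM": 3.1,
               "Jena TDB": 6.0, "Jena SDB": 9.5}
    # Classify the query by its median across stores, then pick the fastest store.
    group = response_group(statistics.median(medians.values()))
    print(fastest_store(medians), group)  # prints: 4Store moderate
```

Taking the median of repeated runs damps one-off outliers such as a cold cache on the first run, while the coefficient of variation gives a unit-free reproducibility score that can be compared across stores with very different absolute speeds.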

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/1471-2105-13-S1-S3.pdf

References (26 articles)

1. Berners-Lee T, Hendler J, Lassila O: The Semantic Web - a new form of Web content that is meaningful to computers will unleash a revolution of new possibilities. Sci Am 2001, 284: 34.

2. Shadbolt N, Hall W, Berners-Lee T: The Semantic Web revisited. IEEE Intell Syst 2006, 21: 96–101.

3. Jenssen TK, Hovig E: The semantic web and biology. Drug Discov Today 2002, 7: 992.

4. Antezana E, Kuiper M, Mironov V: Biological knowledge management: the emerging role of the Semantic Web technologies. Brief Bioinform 2009, 10: 392–407. doi:10.1093/bib/bbp024

5. Antezana E, Egaña M, Blondé W, Illarramendi A, Bilbao I, De Baets B, Stevens R, Mironov V, Kuiper M: The cell cycle ontology: an application ontology for the representation and integrated analysis of the cell cycle process. Genome Biol 2009, 10: R58. doi:10.1186/gb-2009-10-5-r58

Cited by 10 articles

1. Exploring the Freedoms in Data Mining: Why the Trustworthiness and Integrity of the Findings are the Casualties, and How to Resolve These?; Proceedings of the Future Technologies Conference (FTC) 2021, Volume 1; 2021-10-24

2. hpLysis Database-Engine: A New Data-Scheme for Fast Semantic Queries in Biomedical Databases; 2018 IEEE 12th International Conference on Semantic Computing (ICSC); 2018-01

3. LungMAP: The Molecular Atlas of Lung Development Program; American Journal of Physiology-Lung Cellular and Molecular Physiology; 2017-11-01

4. Applying the semantic web to represent an individual’s academic and professional background; Journal of Information Science; 2016-07-11

5. Suggestions for a web based universal exchange and inference language for medicine. Continuity of patient care with PCAST disaggregation; Computers in Biology and Medicine; 2015-01