Enabling semantic queries across federated bioinformatics databases

Author:

Sima Ana Claudia (1, 2, 3, 4), Mendes de Farias Tarcisio (2, 3, 4, 5), Zbinden Erich (1, 4), Anisimova Maria (1, 4), Gil Manuel (1, 4), Stockinger Heinz (4), Stockinger Kurt (1), Robinson-Rechavi Marc (4, 5), Dessimoz Christophe (2, 3, 4, 6, 7)

Affiliation:

1. ZHAW Zurich University of Applied Sciences, Obere Kirchgasse 2, 8400 Winterthur Switzerland

2. Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland

3. Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland

4. SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland

5. Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland

6. Department of Genetics, Evolution, and Environment, University College London, Gower St, London WC1E 6BT, UK

7. Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK

Abstract

Motivation: Data integration promises to be one of the main catalysts for drawing new insights from the wealth of publicly available biological data. However, the heterogeneity of the data sources, at both the syntactic and the semantic level, still poses significant challenges for achieving interoperability among biological databases.

Results: We introduce an ontology-based federated approach for data integration. We applied this approach to three heterogeneous data stores that span different areas of biological knowledge: (i) Bgee, a gene expression relational database; (ii) Orthologous Matrix (OMA), an orthology data store in Hierarchical Data Format 5 (HDF5); and (iii) UniProtKB, a Resource Description Framework (RDF) store containing protein sequence and functional information. To enable federated queries across these sources, we first defined a new semantic model for gene expression called GenEx. We then show how the relational data in Bgee can be expressed as a virtual RDF graph, instantiating GenEx, through dedicated relational-to-RDF mappings. By applying these mappings, Bgee data are now accessible through a public SPARQL endpoint. Similarly, the materialized RDF data of OMA, expressed in terms of the Orthology ontology, are made available in a public SPARQL endpoint. We identified and formally described intersection points (i.e. virtual links) among the three data sources. These allow joint queries to be performed across the data stores. Finally, we lay the groundwork for nontechnical users to benefit from the integrated data by providing a natural language template-based search interface.
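The abstract combines two ideas: exposing relational rows as RDF triples (as done for Bgee) and joining separate sources on shared identifiers (the virtual links). The Python sketch below illustrates both in miniature, without a triple store or SPARQL engine. It is an illustration under stated assumptions, not the authors' implementation: all IRIs, predicate names (`hasGene`, `hasAnatomicalEntity`, `isOrthologousTo`), table columns, gene identifiers and helper functions are hypothetical placeholders, not the GenEx or Orthology ontology vocabulary. The approach described above relies on dedicated relational-to-RDF mappings and public SPARQL endpoints instead.

```python
from collections.abc import Iterable, Iterator
from dataclasses import dataclass
from typing import Optional

# Hypothetical IRI bases (placeholders, NOT the real GenEx/Orthology vocabularies).
GENE = "http://example.org/gene/"
ANAT = "http://example.org/anatomy/"
EXPR = "http://example.org/expression/"
VOCAB = "http://example.org/vocab#"

HAS_GENE = VOCAB + "hasGene"
HAS_ANAT = VOCAB + "hasAnatomicalEntity"
HAS_SCORE = VOCAB + "hasScore"
ORTHOLOGOUS_TO = VOCAB + "isOrthologousTo"

Triple = tuple[str, str, str]


@dataclass(frozen=True)
class ExpressionRow:
    """One row of a hypothetical relational gene expression table."""
    gene_id: str
    anatomical_entity: str
    score: float


def row_to_triples(index: int, row: ExpressionRow) -> list[Triple]:
    """Toy relational-to-RDF mapping: one row becomes one expression node."""
    node = f"{EXPR}{index}"
    return [
        (node, HAS_GENE, GENE + row.gene_id),
        (node, HAS_ANAT, ANAT + row.anatomical_entity),
        (node, HAS_SCORE, str(row.score)),
    ]


def virtual_graph(rows: Iterable[ExpressionRow]) -> Iterator[Triple]:
    """Yield triples on demand from the rows instead of storing an RDF copy."""
    for index, row in enumerate(rows):
        yield from row_to_triples(index, row)


def match_pattern(triples: Iterable[Triple], s: Optional[str] = None,
                  p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Naive triple-pattern matching; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def orthologs_expressed_in(gene: str, anat: str,
                           rows: list[ExpressionRow],
                           orthology: list[Triple]) -> list[str]:
    """Join the two sources on shared gene IRIs (the 'virtual link')."""
    # Orthologs of `gene` from the orthology store, in either direction.
    orthologs = {o for _, _, o in match_pattern(orthology, s=gene, p=ORTHOLOGOUS_TO)}
    orthologs |= {s for s, _, _ in match_pattern(orthology, p=ORTHOLOGOUS_TO, o=gene)}
    # Expression nodes annotated with `anat`, read from the virtual graph.
    nodes = {s for s, _, _ in match_pattern(virtual_graph(rows), p=HAS_ANAT, o=anat)}
    # Genes attached to those expression nodes.
    expressed = {o for s, _, o in match_pattern(virtual_graph(rows), p=HAS_GENE)
                 if s in nodes}
    return sorted(orthologs & expressed)


# Hypothetical sample data standing in for two separate data stores.
SAMPLE_ROWS = [
    ExpressionRow("HUMAN_GENE_1", "liver", 90.0),
    ExpressionRow("MOUSE_GENE_1", "liver", 85.0),
    ExpressionRow("MOUSE_GENE_2", "brain", 70.0),
]
SAMPLE_ORTHOLOGY = [
    (GENE + "HUMAN_GENE_1", ORTHOLOGOUS_TO, GENE + "MOUSE_GENE_1"),
    (GENE + "HUMAN_GENE_1", ORTHOLOGOUS_TO, GENE + "MOUSE_GENE_2"),
]

if __name__ == "__main__":
    print(orthologs_expressed_in(GENE + "HUMAN_GENE_1", ANAT + "liver",
                                 SAMPLE_ROWS, SAMPLE_ORTHOLOGY))
    # → ['http://example.org/gene/MOUSE_GENE_1']
```

The join works only because both toy stores use the same gene IRIs; that shared identifier plays the role of an intersection point. Against real SPARQL endpoints, one standard way to express such a cross-endpoint join is the `SERVICE` clause from SPARQL 1.1 Federated Query.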

Funder

Swiss National Research Programme 75 ‘Big Data’

Swiss National Science Foundation

Publisher

Oxford University Press (OUP)

Subject

General Agricultural and Biological Sciences; General Biochemistry, Genetics and Molecular Biology; Information Systems

Cited by 29 articles.
