High-level ETL for semantic data warehouses

Authors:

Deb Nath Rudra Pratap (1, 2, 3), Romero Oscar (2), Pedersen Torben Bach (1), Hose Katja (1)

Affiliations:

1. Department of Computer Science, Aalborg University, Denmark. E-mails: rudra@cs.aau.dk, tbp@cs.aau.dk, khose@cs.aau.dk

2. Department of Service and Information System Engineering, Universitat Politècnica de Catalunya, Spain. E-mails: rudra@essi.upc.edu, oromero@essi.upc.edu

3. Department of Computer Science and Engineering, University of Chittagong, Bangladesh. E-mail: rudra@cu.ac.bd

Abstract

The popularity of the Semantic Web (SW) encourages organizations to organize and publish semantic data using the RDF model. This growth poses new requirements for Business Intelligence technologies to enable On-Line Analytical Processing (OLAP)-like analysis over semantic data. Traditional Extract-Transform-Load (ETL) tools do not support incorporating semantic data into a Data Warehouse (DW) because they do not consider semantic issues in the integration process. In this paper, we propose a layer-based integration process and a set of high-level RDF-based ETL constructs required to define, map, extract, process, transform, integrate, update, and load (multidimensional) semantic data. Unlike other ETL tools, we automate the ETL data flows by creating metadata at the schema level, which relieves ETL developers from the burden of manual mapping at the ETL operation level. We create a prototype, named Semantic ETL Construct (SETLCONSTRUCT), based on the innovative ETL constructs proposed here. To evaluate SETLCONSTRUCT, we use it to create a multidimensional semantic DW by integrating a Danish Business dataset and an EU Subsidy dataset, and we compare it with the previous programmable framework SETLPROG in terms of productivity, development time, and performance. The evaluation shows that 1) SETLCONSTRUCT requires 92% fewer typed characters than SETLPROG, as measured by the Number of Typed Characters (NOTC), and SETLAUTO (the extension of SETLCONSTRUCT for generating ETL execution flows automatically) further reduces the Number of Used Concepts (NOUC) by 25%; 2) SETLCONSTRUCT almost halves the development time compared to SETLPROG, and SETLAUTO cuts it by a further 27%; and 3) SETLCONSTRUCT is scalable and performs similarly to SETLPROG. We also evaluate our approach qualitatively by interviewing two ETL experts.
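The central idea of the abstract, that ETL flows can be generated from schema-level metadata instead of hand-written operation-level mappings, can be illustrated with a small sketch. This is not SETLCONSTRUCT's API: the `ConceptMapping` class, the `run_mapping` function, the `bus:`/`sub:`/`dw:` prefixes, and the sample triples are all hypothetical, and triples are plain Python tuples rather than an RDF library's graph.

```python
from dataclasses import dataclass, field

# A triple is (subject, predicate, object); IRIs are plain prefixed strings here.
Triple = tuple[str, str, str]
RDF_TYPE = "rdf:type"


@dataclass
class ConceptMapping:
    """Hypothetical schema-level mapping from a source class to a target DW level."""
    source_class: str
    target_level: str
    # source property -> target level attribute; unmapped properties are dropped
    property_map: dict[str, str] = field(default_factory=dict)
    # prefix used to mint target IRIs from source IRIs
    target_iri_prefix: str = "dw:"


def local_name(iri: str) -> str:
    """Return the part of an IRI after its last '#', '/' or ':' separator."""
    for sep in ("#", "/", ":"):
        iri = iri.rsplit(sep, 1)[-1]
    return iri


def run_mapping(source: list[Triple], mapping: ConceptMapping) -> list[Triple]:
    """Generate target triples for every instance of mapping.source_class.

    The transformation is driven entirely by the mapping metadata: no
    per-instance or per-property transformation code is written.
    """
    instances = {
        s for s, p, o in source if p == RDF_TYPE and o == mapping.source_class
    }
    target: list[Triple] = []
    for s, p, o in source:
        if s not in instances:
            continue
        new_subject = mapping.target_iri_prefix + local_name(s)
        if p == RDF_TYPE and o == mapping.source_class:
            target.append((new_subject, RDF_TYPE, mapping.target_level))
        elif p in mapping.property_map:
            target.append((new_subject, mapping.property_map[p], o))
    return target


# Hypothetical toy input: one company from a business source, one subsidy recipient.
source_triples: list[Triple] = [
    ("bus:c1", RDF_TYPE, "bus:Company"),
    ("bus:c1", "bus:name", "Company One"),
    ("bus:c1", "bus:postcode", "1234"),
    ("bus:c1", "bus:internalNote", "not needed in the DW"),
    ("sub:r1", RDF_TYPE, "sub:Recipient"),
    ("sub:r1", "sub:name", "Company One"),
]

company_mapping = ConceptMapping(
    source_class="bus:Company",
    target_level="dw:Company",
    property_map={"bus:name": "dw:companyName", "bus:postcode": "dw:postcode"},
)

for triple in run_mapping(source_triples, company_mapping):
    print(triple)
```

Adding a second source, such as the subsidy data, would mean writing another `ConceptMapping` rather than another transformation function. That separation is the point the abstract makes, although the constructs proposed in the paper cover far more: the abstract lists defining, mapping, extracting, transforming, integrating, updating, and loading multidimensional semantic data.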
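The relative figures reported in the evaluation compound. The following is a rough worked reading, not a result from the paper. It assumes that SETLAUTO's additional 27% reduction is measured relative to SETLCONSTRUCT's development time, which the abstract does not state explicitly. Here $N_P$ and $T_P$ denote SETLPROG's typed characters and development time, and the subscripts $C$ and $A$ denote SETLCONSTRUCT and SETLAUTO.

$$
\begin{aligned}
N_C &= (1 - 0.92)\,N_P = 0.08\,N_P,\\
T_C &\approx 0.5\,T_P,\\
T_A &\approx (1 - 0.27)\,T_C \approx 0.73 \times 0.5\,T_P \approx 0.37\,T_P.
\end{aligned}
$$

Under this reading, SETLAUTO would need roughly 37% of SETLPROG's development time.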

Publisher

IOS Press

Subject

Computer Networks and Communications, Computer Science Applications, Information Systems
