Affiliation:
1. Aalborg University, Denmark
Abstract
This paper demonstrates ETLMR, a novel dimensional Extract-Transform-Load (ETL) programming framework that uses MapReduce to achieve scalability. ETLMR has built-in support for data warehouse (DW) specific constructs such as star schemas, snowflake schemas, and slowly changing dimensions (SCDs). This makes it possible to build MapReduce-based dimensional ETL flows very easily. The ETL process can be configured with only a few lines of code. We will demonstrate the concrete steps in using ETLMR to load data into a (partly snowflaked) DW schema. This includes configuration of data sources and targets, dimension processing schemes, fact processing, and deployment. In addition, we present its scalability on large data sets.
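The idea of dimension processing followed by fact processing in a map/reduce style can be illustrated with a minimal sketch. This is not ETLMR's API: it is plain Python with no MapReduce runtime, and every name in it (`ROWS`, `map_dimension`, `reduce_dimension`, `run_dimension`, `load_facts`, and the `url`, `domain`, `errors`, `domainid` fields) is a hypothetical chosen for illustration. It assumes a single dimension keyed on a business key, with surrogate keys assigned in the reduce step and looked up when facts are loaded.

```python
from collections import defaultdict

# Hypothetical source rows; the field names are illustrative only.
ROWS = [
    {"url": "a.dk/index", "domain": "a.dk", "errors": 2},
    {"url": "a.dk/about", "domain": "a.dk", "errors": 0},
    {"url": "b.dk/index", "domain": "b.dk", "errors": 5},
]


def map_dimension(row):
    """Map step: emit (business key, dimension attributes) for one source row."""
    yield row["domain"], {"domain": row["domain"]}


def reduce_dimension(grouped):
    """Reduce step: keep one member per business key and assign a surrogate key."""
    table = {}
    for key_id, business_key in enumerate(sorted(grouped), start=1):
        member = dict(grouped[business_key][0])  # first value wins; duplicates dropped
        member["domainid"] = key_id
        table[business_key] = member
    return table


def run_dimension(rows):
    """Simulate the shuffle phase: group mapped values by key, then reduce."""
    grouped = defaultdict(list)
    for row in rows:
        for key, value in map_dimension(row):
            grouped[key].append(value)
    return reduce_dimension(grouped)


def load_facts(rows, domain_dim):
    """Fact processing: replace each business key with its surrogate key."""
    return [
        {"domainid": domain_dim[row["domain"]]["domainid"], "errors": row["errors"]}
        for row in rows
    ]


domain_dim = run_dimension(ROWS)
facts = load_facts(ROWS, domain_dim)
```

In this sketch the dimension is processed completely before any fact is loaded, so every fact row can resolve its surrogate key by lookup. A real framework would additionally handle snowflaked dimensions, SCDs, and distribution across tasks.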
Subject
General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development
Cited by
15 articles.
1. Evaluating push-down on NoSQL data sources;Proceedings of The International Workshop on Big Data in Emergent Distributed Environments;2022-06-12
2. Data Warehouses and Big Data;International Journal of Organizational and Collective Intelligence;2020-07
3. Empirical Analysis of Programmable ETL Tools;Communications in Computer and Information Science;2019
4. Real-time ETL in Striim;Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics;2018-08-27
5. NewTL;Proceedings of the 33rd Annual ACM Symposium on Applied Computing;2018-04-09