Affiliations:
1. Technical University of Denmark, Denmark
2. University of Technology Sydney, Australia
3. University College of Northern Denmark, Denmark
Abstract
Data warehousing populates data from different source systems into a central data warehouse (DW) through extraction, transformation, and loading (ETL). Massive volumes of transaction data are routinely recorded in a variety of applications, such as retail commerce, banking systems, and website management. Transaction data record the timestamp of each transaction together with the reference data that the transaction requires. Processing transaction data that have dependencies and arrive at high velocity is a non-trivial task for a standard ETL process. This chapter presents a two-tiered segmentation approach for transaction data warehousing. The approach uses a so-called two-staging ETL method to process detailed records from operational systems, followed by a dimensional data process that populates the data store with a star or snowflake schema. The proposed approach is an all-in-one solution capable of processing fast- and slowly changing data as well as early- and late-arriving data. The chapter evaluates the proposed method, and the results validate its effectiveness for processing transaction data.
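The abstract does not give the details of the two-staging ETL method, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. Tier 1 stages detailed operational records (here: parsing timestamps and dropping duplicate transaction ids). Tier 2 performs the dimensional load into a star-schema fact table by resolving each transaction against a slowly changing dimension (SCD Type 2) at the transaction's own timestamp. A late-arriving fact therefore binds to the dimension version that was valid when it occurred, and an early-arriving fact whose dimension member is not yet known gets a placeholder "inferred member" that is filled in later. All names (`Type2Dimension`, `apply_change`, `key_at`, `stage`, `load_facts`) are hypothetical. The sketch assumes that dimension changes arrive in effective-time order and that the first version of a member is valid from `datetime.min`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class DimVersion:
    surrogate_key: int
    natural_key: str
    attrs: dict
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None marks the current version
    inferred: bool = False               # placeholder created for an early-arriving fact


class Type2Dimension:
    """Hypothetical in-memory slowly changing dimension (SCD Type 2)."""

    def __init__(self) -> None:
        self.rows: list[DimVersion] = []
        self._next_key = 1

    def _current(self, natural_key: str) -> Optional[DimVersion]:
        return next((r for r in self.rows
                     if r.natural_key == natural_key and r.valid_to is None), None)

    def _new_row(self, natural_key, attrs, valid_from, inferred=False) -> DimVersion:
        row = DimVersion(self._next_key, natural_key, attrs, valid_from, inferred=inferred)
        self._next_key += 1
        self.rows.append(row)
        return row

    def apply_change(self, natural_key: str, attrs: dict, effective: datetime) -> None:
        """Record a new version (assumption: changes arrive in effective-time order)."""
        cur = self._current(natural_key)
        if cur is not None and cur.inferred:
            # Fill in the placeholder in place so facts already pointing at it stay valid.
            cur.attrs, cur.inferred = attrs, False
        elif cur is not None:
            cur.valid_to = effective  # close the old version
            self._new_row(natural_key, attrs, effective)
        else:
            # Assumption: a member's first version is valid from the beginning of time.
            self._new_row(natural_key, attrs, datetime.min)

    def key_at(self, natural_key: str, ts: datetime) -> int:
        """Surrogate key of the version valid at ts; creates an inferred member if unknown."""
        for r in self.rows:
            if (r.natural_key == natural_key and r.valid_from <= ts
                    and (r.valid_to is None or ts < r.valid_to)):
                return r.surrogate_key
        return self._new_row(natural_key, {}, datetime.min, inferred=True).surrogate_key


def stage(raw_records: list[dict]) -> list[dict]:
    """Tier 1: stage detailed records, parse timestamps, drop duplicate transaction ids."""
    seen, staged = set(), []
    for rec in raw_records:
        if rec["txn_id"] in seen:
            continue
        seen.add(rec["txn_id"])
        staged.append({**rec, "ts": datetime.fromisoformat(rec["ts"])})
    return staged


def load_facts(staged: list[dict], customer_dim: Type2Dimension) -> list[dict]:
    """Tier 2: resolve surrogate keys at transaction time and emit fact rows."""
    return [{"txn_id": r["txn_id"],
             "customer_key": customer_dim.key_at(r["customer_id"], r["ts"]),
             "amount": r["amount"]}
            for r in staged]


# Usage: C1 moves in June; txn 1 arrives late but binds to the March version,
# and C2 is unknown when txn 3 is loaded, so it gets an inferred member.
dim = Type2Dimension()
dim.apply_change("C1", {"city": "Aalborg"}, datetime(2023, 1, 1))
dim.apply_change("C1", {"city": "Sydney"}, datetime(2023, 6, 1))
raw = [
    {"txn_id": 1, "customer_id": "C1", "ts": "2023-03-15T10:00:00", "amount": 20.0},
    {"txn_id": 1, "customer_id": "C1", "ts": "2023-03-15T10:00:00", "amount": 20.0},
    {"txn_id": 2, "customer_id": "C1", "ts": "2023-07-01T09:30:00", "amount": 35.0},
    {"txn_id": 3, "customer_id": "C2", "ts": "2023-07-02T12:00:00", "amount": 5.0},
]
facts = load_facts(stage(raw), dim)
dim.apply_change("C2", {"city": "Copenhagen"}, datetime(2023, 8, 1))  # late-arriving dimension
```

Updating the inferred member in place, rather than closing it and adding a new version, keeps the surrogate keys of already-loaded facts stable; this is one common design choice, and the chapter's own approach may handle early-arriving data differently.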