A New Big Data Benchmark for OLAP Cube Design Using Data Pre-Aggregation Techniques

Published:2020-12-04 Issue:23 Volume:10 Page:8674
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Tardío Roberto, Maté Alejandro, Trujillo Juan

Abstract

In recent years, several new technologies have enabled OLAP processing over Big Data sources. Among these technologies, we highlight those that allow data pre-aggregation because of their demonstrated performance in data querying. This is the case of Apache Kylin, a Hadoop-based technology that supports sub-second queries over fact tables with billions of rows combined with ultra-high-cardinality dimensions. However, taking advantage of data pre-aggregation techniques to design analytic models for Big Data OLAP is not a trivial task. It requires very advanced knowledge of the underlying technologies and user querying patterns. A wrong design of the OLAP cube significantly alters several key performance metrics, including: (i) the analytic capabilities of the cube (time and ability to provide an answer to a query), (ii) the size of the OLAP cube, and (iii) the time required to build the OLAP cube. Therefore, in this paper we (i) propose a benchmark to help Big Data OLAP designers choose the most suitable cube design for their goals, (ii) identify and describe the main requirements and trade-offs for effectively designing a Big Data OLAP cube taking advantage of data pre-aggregation techniques, and (iii) validate our benchmark in a case study.

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/10/23/8674/pdf

Reference 25 articles.

1. The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling;Kimball,2013

2. Apache Kylin: Extreme OLAP Engine for Big Data; http://kylin.apache.org/

3. The vertica analytic database

4. Elasticsearch: A Distributed, RESTful Search and Analytics Engine; https://www.elastic.co/es/elasticsearch/

Cited by 12 articles.

1. Ocean-DC: An analysis ready data cube framework for environmental and climate change monitoring over the port areas;Proceedings of the 17th International Conference on PErvasive Technologies Related to Assistive Environments;2024-06-26

2. MetaPortal: Business Intelligence and Machine Learning Approach for VR Data;2023 Innovations in Intelligent Systems and Applications Conference (ASYU);2023-10-11

3. Multidimensional Data Analysis of Ambient Air Quality Based on Apache Kylin;Journal of Physics: Conference Series;2023-07-01

4. Research on Big Data Ad Hoc Query Technology Based on an Accident Insurance Campaign;IEEE ICEIB 2023;2023-06-19

5. Research on the Learning Performance and Communication Networking of Online Analytical Processing Courses;IEEE ICEIB 2023;2023-06-19