Bottom-up computation of sparse and Iceberg CUBE

Published: 1999-06 Issue: 2 Volume: 28 Page: 359-370
ISSN: 0163-5808
Container-title: ACM SIGMOD Record
Language: en
Short-container-title: SIGMOD Rec.

Author:

Beyer Kevin¹, Ramakrishnan Raghu¹

Affiliation:

1. Computer Sciences Department, University of Wisconsin, Madison

Abstract

We introduce the Iceberg-CUBE problem as a reformulation of the datacube (CUBE) problem. The Iceberg-CUBE problem is to compute only those group-by partitions with an aggregate value (e.g., count) above some minimum support threshold. The result of Iceberg-CUBE can be used (1) to answer group-by queries with a clause such as HAVING COUNT(*) >= X, where X is greater than the threshold, (2) for mining multidimensional association rules, and (3) to complement existing strategies for identifying interesting subsets of the CUBE for precomputation. We present a new algorithm (BUC) for Iceberg-CUBE computation. BUC builds the CUBE bottom-up; i.e., it builds the CUBE by starting from a group-by on a single attribute, then a group-by on a pair of attributes, then a group-by on three attributes, and so on. This is the opposite of all techniques proposed earlier for computing the CUBE, and has an important practical advantage: BUC avoids computing the larger group-bys that do not meet minimum support. The pruning in BUC is similar to the pruning in the Apriori algorithm for association rules, except that BUC trades some pruning for locality of reference and reduced memory requirements. BUC uses the same pruning strategy when computing sparse, complete CUBEs. We present a thorough performance evaluation over a broad range of workloads. Our evaluation demonstrates that (in contrast to earlier assumptions) minimizing the aggregations or the number of sorts is not the most important aspect of the sparse CUBE problem. The pruning in BUC, combined with an efficient sort method, enables BUC to outperform all previous algorithms for sparse CUBEs, even for computing entire CUBEs, and to dramatically improve Iceberg-CUBE computation.
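The bottom-up recursion and pruning described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `buc_iceberg_cube` and its parameters are hypothetical, only COUNT is supported, and rows are partitioned with a Python dictionary rather than the sort-based partitioning the abstract mentions. A cell is written as a tuple in which `None` stands for a dimension that has been aggregated away.

```python
from collections import defaultdict


def buc_iceberg_cube(rows, num_dims, min_support):
    """Return {cell: count} for every group-by cell with count >= min_support.

    Hypothetical helper for illustration only. A cell is a tuple of length
    num_dims; None marks a dimension that is aggregated away ("ALL").
    """
    result = {}

    def recurse(partition, first_dim, cell):
        # The partition passed in already meets min_support: emit its count.
        result[tuple(cell)] = len(partition)
        for d in range(first_dim, num_dims):
            # Partition the current rows on dimension d (dict-based here,
            # an assumption; the paper pairs pruning with a sort method).
            groups = defaultdict(list)
            for row in partition:
                groups[row[d]].append(row)
            for value, group in groups.items():
                # Pruning: a group below min_support cannot contain any
                # finer cell that meets it, so it is never refined.
                if len(group) >= min_support:
                    cell[d] = value
                    recurse(group, d + 1, cell)
            cell[d] = None  # restore "ALL" before moving to the next dimension

    if len(rows) >= min_support:
        recurse(list(rows), 0, [None] * num_dims)
    return result


# Toy input: four rows over two dimensions, minimum support of 2.
rows = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "x")]
cube = buc_iceberg_cube(rows, num_dims=2, min_support=2)
```

On this toy input, the groups `("b", *)`, `(*, "y")` and `("a", "y")` each cover a single row, so they are dropped and never refined further; this is the saving the abstract attributes to avoiding the larger group-bys that do not meet minimum support.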

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems, Software

Link

https://dl.acm.org/doi/pdf/10.1145/304181.304214

References: 16 articles.

Cited by 164 articles.

1. Mining Interesting Aggregate Tuples; Lecture Notes in Networks and Systems; 2024

2. Visual Analytics of Co-Occurrences to Discover Subspaces in Structured Data; ACM Transactions on Interactive Intelligent Systems; 2023-06-19

3. Accelerating Columnar Storage Based on Asynchronous Skipping Strategy; Big Data Research; 2023-02

4. Deterministic, Fast and Accurate Solution of the Heavy Hitters q-Tail Latencies Problem; IEEE Access; 2022

5. 3iCubing: An Interval Inverted Index Approach to Data Cubes; IEEE Access; 2022