Affiliation:
1. University of Athens, Athens, Greece
2. Intel Research, Berkeley, CA
3. University of Maryland, College Park, MD
Abstract
Several studies have demonstrated the effectiveness of the Haar wavelet decomposition as a tool for reducing large amounts of data down to compact wavelet synopses that can be used to obtain fast, accurate approximate answers to user queries. Although originally designed for minimizing the overall mean-squared (i.e., L2-norm) error in the data approximation, recently proposed methods also enable the use of Haar wavelets in minimizing other error metrics, such as the relative error in data value reconstruction, which is arguably the most important metric for approximate query answers. Relatively little attention, however, has been paid to the problem of using wavelet synopses as an approximate query answering tool over complex tabular datasets containing multiple measures, such as those typically found in real-life OLAP applications. Existing decomposition approaches either operate on each measure individually or treat all measures as a vector of values and process them simultaneously. As we demonstrate in this article, these existing individual or combined storage approaches for the wavelet coefficients of different measures can easily lead to suboptimal storage utilization, resulting in drastically reduced accuracy for approximate query answers. To address this problem, we introduce the notion of an extended wavelet coefficient as a flexible, efficient storage method for wavelet coefficients over multimeasure data. We also propose novel algorithms for constructing effective (optimal or near-optimal) extended wavelet-coefficient synopses under a given storage constraint, for both sum-squared error and relative-error norms. Experimental results with both real-life and synthetic datasets validate our approach, demonstrating that our techniques consistently obtain significant gains in approximation accuracy compared to existing solutions.
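The abstract mentions that Haar synopses were originally designed to minimize sum-squared (L2) error. The Haar basis has a well-known property for this case. First, scale each coefficient to the orthonormal basis by multiplying it by the square root of the number of data values it covers. Keeping the B coefficients with the largest scaled magnitude then minimizes the sum-squared reconstruction error. The minimal Python sketch below illustrates this for a single measure. The function names are illustrative assumptions, and the code does not implement the article's multi-measure or relative-error algorithms.

```python
import math


def haar_decompose(data):
    """Unnormalized 1-D Haar decomposition by pairwise averaging and differencing.

    Returns [overall average, detail coefficients...] ordered from the coarsest
    to the finest resolution level. len(data) must be a power of two.
    """
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("input length must be a power of two")
    current = [float(x) for x in data]
    details = []
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        # Prepend this level's details so coarser levels end up first.
        details = [(a - b) / 2 for a, b in pairs] + details
        current = [(a + b) / 2 for a, b in pairs]
    return current + details


def haar_reconstruct(coeffs):
    """Invert haar_decompose: each (average, detail) pair expands to (a + d, a - d)."""
    current = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        level_details = coeffs[pos:pos + len(current)]
        current = [v for a, d in zip(current, level_details) for v in (a + d, a - d)]
        pos += len(level_details)
    return current


def support_size(i, n):
    """Number of data values that coefficient i contributes to."""
    if i == 0:
        return n
    return n >> (i.bit_length() - 1)  # level = floor(log2(i))


def l2_synopsis(coeffs, budget):
    """Keep the `budget` coefficients with the largest scaled magnitude
    |c_i| * sqrt(support_i) and zero out the rest."""
    n = len(coeffs)
    ranked = sorted(range(n),
                    key=lambda i: abs(coeffs[i]) * math.sqrt(support_size(i, n)),
                    reverse=True)
    keep = set(ranked[:budget])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


def sum_squared_error(xs, ys):
    return sum((x - y) ** 2 for x, y in zip(xs, ys))


data = [2, 2, 0, 2, 3, 5, 4, 4]
coeffs = haar_decompose(data)            # [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]
approx = haar_reconstruct(l2_synopsis(coeffs, 3))
print(approx)                            # [1.5, 1.5, 0.5, 2.5, 4.0, 4.0, 4.0, 4.0]
print(sum_squared_error(data, approx))   # 3.0
```

The resulting error equals the energy of the dropped coefficients: 0.5² · 4 + 1² · 2 = 3. This is why ranking by the scaled magnitude, rather than the raw one, is the right greedy choice for the L2 norm. Relative-error metrics do not decompose this way, which is one reason they need different selection algorithms.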
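The abstract does not spell out the layout of an extended wavelet coefficient. To show why individual and combined storage can waste space, the Python sketch below assumes one plausible flexible design: a shared coordinate header, a per-measure bitmap, and only the values actually retained. The byte sizes `COORD_SIZE` and `VALUE_SIZE`, the function names, and the `(coord, bitmap, values)` encoding are all hypothetical illustrations, not the article's definitions.

```python
# Hypothetical byte sizes, chosen only for illustration.
COORD_SIZE = 8   # space for one coefficient's coordinates
VALUE_SIZE = 4   # space for one measure's coefficient value


def individual_cost(selected):
    """Individual storage: every retained (coordinate, measure) value
    repeats the coordinates."""
    return sum(len(ms) * (COORD_SIZE + VALUE_SIZE) for ms in selected.values())


def combined_cost(selected, num_measures):
    """Combined storage: each retained coordinate stores a value for
    every measure, whether or not it was worth keeping."""
    return len(selected) * (COORD_SIZE + num_measures * VALUE_SIZE)


def extended_cost(selected, num_measures):
    """Assumed flexible storage: shared coordinates, an M-bit bitmap,
    and only the values actually retained."""
    bitmap_bytes = (num_measures + 7) // 8
    return sum(COORD_SIZE + bitmap_bytes + len(ms) * VALUE_SIZE
               for ms in selected.values())


def encode_extended(coord, values_by_measure, num_measures):
    """Pack one coordinate's retained values as (coord, bitmap, values);
    bit m is set when measure m has a stored value."""
    bitmap, values = 0, []
    for m in range(num_measures):
        if m in values_by_measure:
            bitmap |= 1 << m
            values.append(values_by_measure[m])
    return coord, bitmap, values


# Coordinate -> set of measures whose coefficient is worth keeping (M = 4).
selected = {0: {0, 1, 2, 3}, 5: {2}, 9: {0, 3}}
print(individual_cost(selected))                 # 84
print(combined_cost(selected, 4))                # 72
print(extended_cost(selected, 4))                # 55
print(encode_extended(9, {0: 1.5, 3: -2.0}, 4))  # (9, 9, [1.5, -2.0])
```

Under these assumed sizes the flexible layout is smallest for this selection, but it is not uniformly better. If every retained coordinate kept all M measures, the bitmap would make it slightly larger than combined storage. Deciding which values to keep under a space budget is therefore an optimization problem in its own right.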
Publisher
Association for Computing Machinery (ACM)
Cited by 14 articles.