Affiliation:
1. Intel Research Berkeley, Berkeley, CA
2. Indian Institute of Technology, New Delhi, India
Abstract
Several studies have demonstrated the effectiveness of the wavelet decomposition as a tool for reducing large amounts of data down to compact wavelet synopses that can be used to obtain fast, accurate approximate query answers. Conventional wavelet synopses that greedily minimize the overall root-mean-squared (i.e., L_2-norm) error in the data approximation can suffer from important problems, including severe bias and wide variance in the quality of the data reconstruction, and lack of nontrivial guarantees for individual approximate answers. Thus, probabilistic thresholding schemes have been recently proposed as a means of building wavelet synopses that try to probabilistically control maximum approximation-error metrics (e.g., maximum relative error).
A key open problem is whether it is possible to design efficient deterministic wavelet-thresholding algorithms for minimizing general, non-L_2 error metrics that are relevant to approximate query processing systems, such as maximum relative or maximum absolute error. Obviously, such algorithms can guarantee better maximum-error wavelet synopses and avoid the pitfalls of probabilistic techniques (e.g., “bad” coin-flip sequences) leading to poor solutions; in addition, they can be used to directly optimize the synopsis construction process for other useful error metrics, such as the mean relative error in data-value reconstruction. In this article, we propose novel, computationally efficient schemes for deterministic wavelet thresholding with the objective of optimizing general approximation-error metrics. We first consider the problem of constructing wavelet synopses optimized for maximum error, and introduce an optimal low polynomial-time algorithm for one-dimensional wavelet thresholding; our algorithm is based on a new Dynamic-Programming (DP) formulation, and can be employed to minimize the maximum relative or absolute error in the data reconstruction. Unfortunately, directly extending our one-dimensional DP algorithm to multidimensional wavelets results in a super-exponential increase in time complexity with the data dimensionality. Thus, we also introduce novel, polynomial-time approximation schemes (with tunable approximation guarantees) for deterministic wavelet thresholding in multiple dimensions. We then demonstrate how our optimal and approximate thresholding algorithms for maximum error can be extended to handle a broad, natural class of distributive error metrics, which includes several important error measures, such as mean weighted relative error and weighted L_p-norm error. Experimental results on real-world and synthetic data sets evaluate our novel optimization algorithms, and demonstrate their effectiveness against earlier wavelet-thresholding schemes.
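As a rough illustration of the contrast the abstract draws, the sketch below builds a one-dimensional Haar decomposition, forms a conventional greedy L_2 synopsis, and compares its maximum relative error against the best synopsis found by exhaustive search. This is not the article's dynamic-programming algorithm: the exhaustive search is a stand-in that is only feasible for tiny inputs, and all function names, the unnormalized Haar convention, and the `sanity` bound in the relative-error denominator are assumptions made for the example.

```python
import math
from itertools import combinations


def haar_decompose(data):
    """Unnormalized Haar transform: pairwise averages and half-differences.

    Assumes len(data) is a power of two. Returns [overall average,
    coarsest detail, ..., finest details].
    """
    coeffs = []
    current = list(data)
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        details = [(current[i] - current[i + 1]) / 2 for i in pairs]
        coeffs = details + coeffs  # coarser levels end up in front
        current = averages
    return current + coeffs


def haar_reconstruct(coeffs):
    """Invert haar_decompose, level by level from coarse to fine."""
    current = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        pos += len(current)
        finer = []
        for avg, det in zip(current, details):
            finer.extend([avg + det, avg - det])
        current = finer
    return current


def support_size(index, n):
    """Number of data values influenced by the coefficient at `index`."""
    if index == 0:
        return n
    return n >> (index.bit_length() - 1)


def greedy_l2_synopsis(coeffs, budget):
    """Keep the `budget` coefficients with largest normalized magnitude."""
    n = len(coeffs)
    ranked = sorted(
        range(n),
        key=lambda i: abs(coeffs[i]) * math.sqrt(support_size(i, n)),
        reverse=True,
    )
    keep = set(ranked[:budget])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


def max_relative_error(data, approx, sanity=1.0):
    """Maximum of |d - d_hat| / max(|d|, sanity) over all data values."""
    return max(abs(d - a) / max(abs(d), sanity) for d, a in zip(data, approx))


def best_max_rel_synopsis(data, coeffs, budget, sanity=1.0):
    """Exhaustive search over all subsets of at most `budget` coefficients.

    Exponential in the input size; a brute-force stand-in, not the
    polynomial-time DP described in the abstract.
    """
    n = len(coeffs)
    best, best_err = None, math.inf
    for k in range(budget + 1):
        for keep in combinations(range(n), k):
            chosen = set(keep)
            synopsis = [c if i in chosen else 0.0 for i, c in enumerate(coeffs)]
            err = max_relative_error(data, haar_reconstruct(synopsis), sanity)
            if err < best_err:
                best, best_err = synopsis, err
    return best


if __name__ == "__main__":
    data = [2, 2, 0, 2, 3, 5, 4, 4]
    coeffs = haar_decompose(data)
    for name, synopsis in [
        ("greedy L2", greedy_l2_synopsis(coeffs, 3)),
        ("exhaustive max-rel", best_max_rel_synopsis(data, coeffs, 3)),
    ]:
        err = max_relative_error(data, haar_reconstruct(synopsis))
        print(name, synopsis, err)
```

Because the greedy L_2 choice is itself one of the subsets the exhaustive search considers, the exhaustive result can never have a larger maximum relative error; the point of a deterministic max-error algorithm is to reach that optimum without enumerating every subset.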
Publisher
Association for Computing Machinery (ACM)
Cited by
49 articles.