Affiliation:
1. École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, VD, Switzerland
2. University of Washington, Seattle, WA, USA
Abstract
We investigate an approximation algorithm for various aggregate queries on partially materialized data cubes. Data cubes are interpreted as probability distributions, and the cuboids of a partial materialization populate the terms of a series expansion of the target query distribution. Unknown terms in the expansion are set to 0 to recover an approximate query result. We identify this method as a variant of related approaches from other fields of science, namely the Bahadur representation and, more generally, (biased) Fourier expansions of Boolean functions. The existing literature indicates a rich but intricate theoretical landscape. Focusing on the data cube application, we start by investigating worst-case error bounds. We build upon prior work to obtain provably optimal materialization strategies with respect to query workloads. In addition, we propose a new heuristic method governing materialization decisions. Finally, we show that well-approximated queries are guaranteed to have well-approximated roll-ups.
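As a rough illustration of the expansion idea described in the abstract, the sketch below uses the plain (unbiased) Fourier expansion over binary attributes: each materialized cuboid determines the coefficients of all subsets of its attributes, and every other coefficient is set to 0. This is not the authors' implementation. The function names (`marginal`, `coefficients`, `approximate`), the restriction to binary attributes, the uniform (unbiased) basis, and the toy cube are all assumptions made for illustration.

```python
from itertools import combinations, product


def marginal(cube, attrs):
    """Roll up a cube (dict: bit-tuple -> probability) onto the given attributes."""
    out = {}
    for x, p in cube.items():
        key = tuple(x[i] for i in attrs)
        out[key] = out.get(key, 0.0) + p
    return out


def coefficients(cuboids, n):
    """Fourier coefficients recoverable from the materialized cuboids.

    cuboids maps a tuple of attribute indices to the marginal over them.
    The coefficient of a subset S needs only a marginal that covers S.
    """
    coeffs = {}
    for attrs, marg in cuboids.items():
        for k in range(len(attrs) + 1):
            for pos in combinations(range(len(attrs)), k):
                subset = tuple(attrs[j] for j in pos)
                if subset in coeffs:
                    continue
                # Sum of p(x_S) * chi_S(x_S), with chi_S(x) = (-1)^(sum of bits in S).
                total = sum(p * (-1) ** sum(key[j] for j in pos)
                            for key, p in marg.items())
                coeffs[subset] = total / 2 ** n
    return coeffs


def approximate(coeffs, x):
    """Evaluate the truncated expansion at cell x; unknown terms count as 0."""
    return sum(c * (-1) ** sum(x[i] for i in s) for s, c in coeffs.items())


# Toy cube over three binary attributes (hypothetical data, sums to 1).
cube = {
    (0, 0, 0): 0.25, (0, 0, 1): 0.125, (0, 1, 0): 0.0625, (0, 1, 1): 0.0625,
    (1, 0, 0): 0.125, (1, 0, 1): 0.125, (1, 1, 0): 0.125, (1, 1, 1): 0.125,
}

# Partial materialization: only the cuboids over {0, 1} and {2} are stored.
cuboids = {(0, 1): marginal(cube, (0, 1)), (2,): marginal(cube, (2,))}
coeffs = coefficients(cuboids, 3)
approx = {x: approximate(coeffs, x) for x in product((0, 1), repeat=3)}
```

In this sketch, rolling `approx` up onto a materialized attribute set reproduces the stored cuboid exactly, because all coefficients needed for that marginal are known. Individual cells can still be off, and may even be negative, since the dropped terms are simply treated as 0.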
Publisher
Association for Computing Machinery (ACM)