Affiliation:
1. SAP France, France
2. LIP6–Sorbonne Université, CNRS, France
3. SAP France, LIP6–Sorbonne Université, CNRS, France
Abstract
We present a comprehensive set of conditions and rules to control the correctness of aggregation queries within an interactive data analysis session. The goal is to extend self-service data preparation and Business Intelligence (BI) tools to automatically detect semantically incorrect aggregate queries on analytic tables and views built using the common analytic operations, including filter, project, join, aggregate, union, difference, and pivot. We introduce aggregable properties to describe, for any attribute of an analytic table, which aggregation functions correctly aggregate the attribute along which sets of dimension attributes. These properties can also be used to formally identify attributes that are summarizable with respect to some aggregation function along a given set of dimension attributes. This is particularly helpful to detect incorrect aggregations of measures obtained through the use of non-distributive aggregation functions like average and count. We extend the notion of summarizability by introducing a new generalized summarizability condition to control the aggregation of attributes after any analytic operation. Finally, we define propagation rules that transform aggregable properties of the query input tables into new aggregable properties for the result tables, preserving summarizability and generalized summarizability.
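The kind of incorrect aggregation the abstract refers to can be illustrated with a small sketch. It is not the paper's formalism or rule set: the table `SALES`, its values, and the helper `aggregate` are hypothetical and chosen only for illustration. The sketch rolls a store-level measure up to city level and compares the result with aggregating the base rows directly. A sum of sums matches the direct sum, whereas an average of averages does not.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical fact table: (city, store, amount). Names and values are illustrative.
SALES = [
    ("Paris", "S1", 10.0),
    ("Paris", "S1", 20.0),
    ("Paris", "S1", 30.0),
    ("Paris", "S2", 100.0),
]


def aggregate(rows, key, value, func):
    """Group rows by key(row) and apply func to the list of value(row)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(value(row))
    return {k: func(v) for k, v in groups.items()}


# First-level aggregation: one result per (city, store).
sum_by_store = aggregate(SALES, lambda r: (r[0], r[1]), lambda r: r[2], sum)
avg_by_store = aggregate(SALES, lambda r: (r[0], r[1]), lambda r: r[2], mean)

# Second-level aggregation: roll the store-level results up to the city.
sum_of_sums = aggregate(sum_by_store.items(), lambda kv: kv[0][0], lambda kv: kv[1], sum)
avg_of_avgs = aggregate(avg_by_store.items(), lambda kv: kv[0][0], lambda kv: kv[1], mean)

# Reference results computed directly on the base rows.
true_sum = aggregate(SALES, lambda r: r[0], lambda r: r[2], sum)
true_avg = aggregate(SALES, lambda r: r[0], lambda r: r[2], mean)

print("sum of sums:", sum_of_sums, "direct sum:", true_sum)
print("avg of avgs:", avg_of_avgs, "direct avg:", true_avg)
```

In this toy data the two stores contribute different numbers of rows, so the average of the store averages weights each store equally and drifts away from the average over all sales. A tool that tracks which functions may aggregate an attribute along which dimensions could flag the second roll-up while allowing the first.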
Publisher
Association for Computing Machinery (ACM)
Subject
Information Systems and Management, Information Systems