Affiliation:
1. University of Glasgow, Scotland, United Kingdom
Abstract
Fundamental to many predictive analytics tasks is the ability to estimate the cardinality (number of data items) of multi-dimensional data subspaces, defined by query selections over datasets. This is crucial for data analysts dealing with, e.g., interactive data subspace explorations, data subspace visualizations, and query processing optimization. However, in many modern data systems, predictive analytics may be (i) too costly, e.g., in clouds, (ii) unreliable, e.g., in modern Big Data query engines, where accurate statistics are difficult to obtain or maintain, or (iii) infeasible, e.g., because of privacy constraints. We contribute a novel, query-driven, function estimation model of analyst-defined data subspace cardinality. The proposed estimation model is highly accurate in its predictions and accommodates two well-known types of selection query: multi-dimensional range queries and distance-nearest neighbors (radius) queries. Our function estimation model: (i) quantizes the vectorial query space by learning the analysts’ access patterns over a data space, (ii) associates query vectors with the corresponding cardinalities of the analyst-defined data subspaces, (iii) abstracts and employs query vectorial similarity to predict the cardinality of an unseen/unexplored data subspace, and (iv) identifies and adapts to possible changes of the query subspaces based on the theory of optimal stopping. The proposed model is decentralized, facilitating the scaling-out of such predictive analytics queries. The research significance of the model lies in that (i) it is an attractive solution when data-driven statistical techniques are undesirable or infeasible, (ii) it offers a scale-out, decentralized training solution, (iii) it is applicable to different selection query types, and (iv) it offers performance superior to that of data-driven approaches.
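The first three steps of the abstract (quantize the query space, attach cardinalities to the quantized queries, predict by query similarity) can be illustrated with a minimal Python sketch. This is not the authors' model: the class name `QueryDrivenCardinalityEstimator`, the distance threshold `new_prototype_dist`, the neighbor count `k`, the inverse-distance weighting, and the synthetic dataset are all assumptions made for illustration, and the optimal-stopping adaptation of step (iv) is left out entirely. A radius query is assumed to be encoded as the vector (center x, center y, radius).

```python
import math
import random


class QueryDrivenCardinalityEstimator:
    """Illustrative sketch only; hypothetical names, not the paper's algorithm."""

    def __init__(self, new_prototype_dist=0.2, k=3):
        self.new_prototype_dist = new_prototype_dist  # assumed merge threshold
        self.k = k                                    # assumed neighbor count
        self.prototypes = []  # each entry: [query_vector, mean_cardinality, count]

    def train(self, query, cardinality):
        """Quantize the query space online and attach cardinalities to prototypes."""
        if self.prototypes:
            nearest = min(self.prototypes, key=lambda p: math.dist(p[0], query))
            if math.dist(nearest[0], query) <= self.new_prototype_dist:
                # Fold the query into the nearest prototype as a running mean.
                nearest[2] += 1
                rate = 1.0 / nearest[2]
                nearest[0] = [w + rate * (q - w) for w, q in zip(nearest[0], query)]
                nearest[1] += rate * (cardinality - nearest[1])
                return
        # No prototype is close enough: this query starts a new one.
        self.prototypes.append([list(query), float(cardinality), 1])

    def predict(self, query):
        """Inverse-distance-weighted mean over the k most similar prototypes."""
        if not self.prototypes:
            raise ValueError("no training queries seen yet")
        ranked = sorted(self.prototypes, key=lambda p: math.dist(p[0], query))
        nearest = ranked[: self.k]
        weights = [1.0 / (math.dist(p[0], query) + 1e-9) for p in nearest]
        return sum(w * p[1] for w, p in zip(weights, nearest)) / sum(weights)


# Synthetic data: 2000 uniform points in the unit square (an assumption).
random.seed(0)
data = [(random.random(), random.random()) for _ in range(2000)]


def true_count(q):
    """Exact cardinality of a radius query, obtained by scanning the data."""
    cx, cy, r = q
    return sum(1 for x, y in data if math.dist((x, y), (cx, cy)) <= r)


def random_query():
    return (random.uniform(0.2, 0.8), random.uniform(0.2, 0.8),
            random.uniform(0.05, 0.2))


model = QueryDrivenCardinalityEstimator(new_prototype_dist=0.05, k=3)
for _ in range(500):
    q = random_query()
    model.train(q, true_count(q))  # training needs only (query, answer) pairs

unseen = (0.5, 0.5, 0.1)
estimate = model.predict(unseen)   # answered without touching `data`
actual = true_count(unseen)
print(f"estimated cardinality: {estimate:.1f}, actual: {actual}")
```

In this sketch the data is scanned only to produce training answers; once trained, `predict` relies solely on previously seen query-answer pairs, which is the sense in which such an estimator is query-driven rather than data-driven.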
Publisher
Association for Computing Machinery (ACM)
References
51 articles.
1. Milton Abramowitz. 1974. Handbook of Mathematical Functions: With Formulas, Graphs, and Mathematical Tables. Dover Publications Incorporated.
2. The Stratosphere platform for big data analytics
3. AsterixDB
4. Learning Set Cardinality in Distance Nearest Neighbours
5. Learning to accurately COUNT with query-driven predictive analytics
Cited by
20 articles.
1. Learning From User-Specified Optimizer Hints in Database Systems;Foundations of Computing and Decision Sciences;2024-05-01
2. A new framework based on features modeling and ensemble learning to predict query performance;PLOS ONE;2021-10-18
3. Towards instance-optimized data systems;Proceedings of the VLDB Endowment;2021-07
4. XLJoins;Proceedings of the 2021 International Conference on Management of Data;2021-06-09
5. Consistent and Flexible Selectivity Estimation for High-Dimensional Data;Proceedings of the 2021 International Conference on Management of Data;2021-06-09