Affiliation:
1. Technische Universität Berlin
2. Snowflake Computing
3. Technische Universität Berlin and German Research Center for Artificial Intelligence (DFKI)
Abstract
Accurately predicting the cardinality of intermediate plan operations is an essential part of any modern relational query optimizer. The accuracy of said estimates has a strong and direct impact on the quality of the generated plans, and incorrect estimates can have a negative impact on query performance. One of the biggest challenges in this field is to predict the result size of join operations.
Kernel Density Estimation (KDE) is a statistical method to estimate multivariate probability distributions from a data sample. Previously, we introduced a modern, self-tuning selectivity estimator for range scans based on KDE that outperforms state-of-the-art multidimensional histograms and is efficient to evaluate on graphics cards. In this paper, we extend these bandwidth-optimized KDE models to estimate the result size of single and multiple joins. In particular, we propose two approaches: (1) building a KDE model from a sample drawn from the join result, and (2) efficiently combining the information from base-table KDE models.
We evaluated our KDE-based join estimators on a variety of synthetic and real-world datasets, demonstrating that they are superior to state-of-the-art join estimators based on sketching or sampling.
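To make the first approach concrete, the following is a minimal sketch of how a KDE model built on a sample can estimate the selectivity of a range predicate. It is not the authors' implementation. It assumes a Gaussian product kernel, for which the probability mass inside a box has a closed form in terms of the normal CDF. The function names, the toy sample, and the hand-picked bandwidth are hypothetical; the paper optimizes the bandwidth rather than fixing it.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kde_range_selectivity(sample, bandwidth, lower, upper):
    """Estimate the fraction of tuples that fall inside the box
    [lower, upper], using a Gaussian product-kernel KDE over `sample`.

    Each sample point contributes the probability mass that its own
    kernel places inside the box; the estimate is the mean contribution.
    """
    total = 0.0
    for point in sample:
        mass = 1.0
        for x, h, lo, hi in zip(point, bandwidth, lower, upper):
            # Mass of a 1-D Gaussian centred at x with std. dev. h in [lo, hi].
            mass *= normal_cdf((hi - x) / h) - normal_cdf((lo - x) / h)
        total += mass
    return total / len(sample)


# Hypothetical sample of join-result tuples with two numeric attributes.
join_sample = [(1.0, 10.0), (2.0, 12.0), (2.5, 11.0), (8.0, 30.0)]
# Assumption: bandwidth chosen by hand, one value per attribute.
bandwidth = (0.5, 1.0)

selectivity = kde_range_selectivity(
    join_sample, bandwidth, lower=(0.0, 8.0), upper=(4.0, 14.0)
)
print(f"estimated selectivity: {selectivity:.3f}")
```

Under these assumptions, multiplying the estimated selectivity by the (known or separately estimated) size of the join result gives a cardinality estimate for the range predicate over the join.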
Subject
General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development
Cited by
43 articles.
1. Learned Query Optimizer: What is New and What is Next;Companion of the 2024 International Conference on Management of Data;2024-06-09
2. QardEst: Using Quantum Machine Learning for Cardinality Estimation of Join Queries;Workshop on Quantum Computing and Quantum-Inspired Technology for Data-Intensive Systems and Applications;2024-06-09
3. Automating localized learning for cardinality estimation based on XGBoost;Knowledge and Information Systems;2024-06-01
4. Towards Exploratory Query Optimization for Template-Based SQL Workloads;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13
5. Sub-optimal Join Order Identification with L1-error;Proceedings of the ACM on Management of Data;2024-03-12