Robust Query Driven Cardinality Estimation under Changing Workloads

Published:2023-02 Issue:6 Volume:16 Page:1520-1533
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
language:en
Short-container-title:Proc. VLDB Endow.

Author:

Negi Parimarjan¹,Wu Ziniu¹,Kipf Andreas¹,Tatbul Nesime²,Marcus Ryan²,Madden Sam¹,Kraska Tim¹,Alizadeh Mohammad¹

Affiliation:

1. MIT CSAIL

2. MIT CSAIL, Intel Labs

Abstract

Query-driven cardinality estimation models learn from a historical log of queries. They are lightweight, with low storage requirements and fast inference and training, and they adapt easily to any kind of query. Unfortunately, such models can suffer unpredictably bad performance under workload drift, i.e., when the query pattern or the data changes. This makes them unreliable and hard to deploy. We analyze why models become unpredictable under workload drift, and introduce modifications to the query representation and neural network training techniques that make query-driven models robust to its effects. First, we emulate workload drift in queries involving unseen tables or columns by randomly masking out some table or column features during training. This forces the model to make predictions with missing query information, relying more on robust features based on up-to-date DBMS statistics that remain useful even when query or data drift happens. Second, we introduce join bitmaps, which extend sampling-based features to be consistent across joins using ideas from sideways information passing. Finally, we show how both of these ideas can be adapted to handle data updates. We show significantly greater generalization than past works across different workloads and databases. For instance, a model trained with our techniques on a simple workload (JOBLight-train), with 40k synthetically generated queries of at most 3 tables each, is able to generalize to the much more complex Join Order Benchmark, which includes queries with up to 16 tables, and improves query runtimes by 2× over PostgreSQL. We show similar robustness results with data updates, and across other workloads. We discuss the situations where we expect, and see, improvements, as well as more challenging workload drift scenarios where these techniques do not improve much over PostgreSQL. However, even in the most challenging scenarios, our models never perform worse than PostgreSQL, while standard query-driven models can get much worse than PostgreSQL.
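
The first idea in the abstract, randomly masking table or column features during training, can be illustrated with a minimal sketch. This is not the authors' implementation: the flat feature vector, the `groups` layout, the function name `mask_features`, and the per-group masking probability are all assumptions made for illustration. The sketch only shows the general shape of the technique, in which selected feature groups are zeroed for a training example while statistics-based features are left untouched.

```python
import random


def mask_features(features, groups, mask_prob, rng):
    """Return a copy of `features` with some feature groups zeroed out.

    features  : list of floats, a flat query feature vector (assumed layout)
    groups    : dict mapping a group name (e.g. "tables") to the indices
                in `features` that encode that group
    mask_prob : probability of masking each group, drawn independently
    rng       : random.Random instance, so runs are reproducible
    """
    masked = list(features)  # copy, so the caller's vector is not modified
    for indices in groups.values():
        if rng.random() < mask_prob:
            for i in indices:
                masked[i] = 0.0
    return masked


# Hypothetical layout: indices 0-2 one-hot encode tables, 3-4 encode columns,
# and index 5 is a statistics-based feature that is never masked because it
# belongs to no group.
features = [1.0, 0.0, 1.0, 0.3, 0.7, 0.05]
groups = {"tables": [0, 1, 2], "columns": [3, 4]}

rng = random.Random(0)
training_input = mask_features(features, groups, mask_prob=0.5, rng=rng)
```

In a training loop, a call like this would be applied to each example before it is fed to the model, so that the model sometimes has to predict from the unmasked features alone. Masking would be disabled at inference time.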

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.14778/3583140.3583164

Reference: 46 articles.

1. 2018. SQL Server's Join Cardinality Estimation. https://www.sqlshack.com/join-estimation-internals/ [Online].

2. The Aqua approximate query answering system

3. Pierre Baldi and Peter J. Sadowski. 2013. Understanding dropout. Advances in Neural Information Processing Systems 26 (2013).

4. Pessimistic Cardinality Estimation

5. Asoke Datta, Yesdaulet Izenov, Brian Tsan, and Florin Rusu. 2021. Simpli-Squared: A Very Simple Yet Unexpectedly Powerful Join Ordering Algorithm Without Cardinality Estimates. arXiv preprint arXiv:2111.00163 (2021).

Cited by 21 articles.

1. Learning complex predicates for cardinality estimation using recursive neural networks;Information Systems;2024-09

2. Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRAD;Proceedings of the VLDB Endowment;2024-07

3. The Holon Approach for Simultaneously Tuning Multiple Components in a Self-Driving Database Management System with Machine Learning via Synthesized Proto-Actions;Proceedings of the VLDB Endowment;2024-07

4. Machine Learning for Databases: Foundations, Paradigms, and Open problems;Companion of the 2024 International Conference on Management of Data;2024-06-09

5. Learned Query Optimizer: What is New and What is Next;Companion of the 2024 International Conference on Management of Data;2024-06-09