Affiliation:
1. TU Berlin
2. SAP
3. Google
4. Nga Tran, InfluxData
5. Snowflake
6. Hasso Plattner Institute, University of Potsdam
7. INRIA, Ecole Polytechnique
Abstract
Join ordering and query optimization are crucial for query performance but remain challenging due to unknown or changing characteristics of query intermediates, especially for complex queries with many joins. Over the past two decades, a spectrum of techniques for adaptive query processing (AQP), including inter- and intra-operator adaptivity and tuple routing, has been proposed to address these challenges. However, commercial database systems in practice do not implement holistic AQP techniques because they increase system complexity (e.g., intertwined planning and execution) and thus complicate debugging and testing. Additionally, existing approaches may incur large overheads, leading to problematic performance regressions. In this paper, we introduce POLAR, a simple yet very effective technique for the self-regulating selection of alternative join orders with bounded overhead. We enhance left-deep join pipelines with alternative join orders, perform regret-bounded tuple routing to find and validate "plans of least resistance", and then process the majority of tuple batches through these plans. We study different join order selection techniques, different routing strategies, and a variety of workload characteristics. Our experiments with a POLAR prototype in DuckDB show runtime improvements of up to 9x and less than 7% overhead for all benchmark queries, while outperforming state-of-the-art AQP systems by up to 15x.
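The idea of routing tuple batches through alternative join orders with bounded exploration overhead can be illustrated with a small simulation. This is a hypothetical sketch, not POLAR's routing strategy or its DuckDB implementation. It assumes that each join in a left-deep pipeline has a fixed, unknown fanout (output tuples per input tuple), that the work of a batch is the number of hash-table probes it triggers, and that an untried join order is explored only while the exploration work so far is at most a fraction `budget` of the total work. The names `pipeline_cost`, `route`, `fanouts`, and `budget` are illustrative assumptions.

```python
from itertools import permutations


def pipeline_cost(order, fanouts, batch_size):
    """Probe work for one batch pushed through a left-deep join pipeline.

    Assumption: every tuple entering a join probes its hash table once, and
    the join emits fanouts[j] tuples per input tuple (fixed, unknown to the router).
    """
    cost, rows = 0.0, float(batch_size)
    for j in order:
        cost += rows          # one probe per incoming tuple
        rows *= fanouts[j]    # tuples passed on to the next join
    return cost


def route(batches, orders, fanouts, budget=0.1):
    """Route batches over alternative join orders (hypothetical strategy).

    The first batch uses orders[0] (the optimizer's plan). An untried order is
    explored only while exploration work so far is at most `budget` times the
    total work; otherwise the batch goes to the order with the lowest observed
    cost per tuple.
    """
    observed = {}                       # order -> observed cost per tuple
    counts = {o: 0 for o in orders}     # batches routed to each order
    total = explore = 0.0
    for size in batches:
        untried = [o for o in orders if o not in observed]
        if not observed:
            order, exploring = orders[0], False
        elif untried and explore <= budget * total:
            order, exploring = untried[0], True
        else:
            order, exploring = min(observed, key=observed.get), False
        cost = pipeline_cost(order, fanouts, size)  # stands in for measured runtime
        observed[order] = cost / size
        counts[order] += 1
        total += cost
        if exploring:
            explore += cost
    return counts, total


if __name__ == "__main__":
    fanouts = {"A": 10.0, "B": 0.01, "C": 1.0}
    orders = list(permutations("ABC"))  # orders[0] == ('A', 'B', 'C')
    counts, total = route([1000] * 100, orders, fanouts)
    print(max(counts, key=counts.get), counts)
    print("total probes:", total,
          "static plan:", 100 * pipeline_cost(orders[0], fanouts, 1000))
```

With these assumed fanouts and the initial order (A, B, C), most of the 100 batches end up in the cheapest order (B, C, A), while the budget rule limits how much work is spent trying expensive alternatives such as (A, C, B). A real system would measure the cost of each batch instead of computing it from known fanouts.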
Publisher: Association for Computing Machinery (ACM)