A Cardinality Estimator in Complex Database Systems Based on TreeLSTM
Authors:
Qi Kaiyang 1, Yu Jiong 1,2, He Zhenzhen 2
Affiliation:
1. School of Software, Xinjiang University, Urumqi 830091, China
2. College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
Abstract
Cardinality estimation is critical to query optimization in database management systems (DBMSs) because it guides the query optimizer in choosing the best execution plan. However, traditional cardinality estimation methods produce inaccurate estimates because they cannot capture the correlations between multiple tables. Several recent studies have shown that learning-based cardinality estimation methods can address these shortcomings and provide more accurate estimates. Even so, learning-based methods still incur large errors when an SQL query involves multiple tables or is very complex. To address this problem, we propose a sampling-based tree long short-term memory (TreeLSTM) neural network to model queries. The proposed model addresses the weakness of traditional methods when no sampled tuples match the predicates, and it considers both the join relationships between multiple tables and the conjunction and disjunction operations between predicates. We construct subexpressions as trees according to the operator types between predicates and improve the performance and accuracy of cardinality estimation by capturing the join-crossing correlations between tables and the order dependencies between predicates. In addition, we construct a new loss function to overcome the drawback that Q-error cannot distinguish between large and small cardinalities. Extensive experiments on real-world datasets show that the proposed model improves estimation quality and outperforms traditional cardinality estimation methods and the other compared deep learning methods on three evaluation metrics: Q-error, mean absolute error (MAE), and symmetric mean absolute percentage error (SMAPE).
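The three evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It uses the common textbook definitions, which may differ in detail from the ones used in the paper: the clamping of cardinalities to 1 in `q_error` and the particular SMAPE variant (mean of the two magnitudes in the denominator, reported in percent) are assumptions, and all function names are hypothetical. The sketch does not reproduce the paper's new loss function, which the abstract does not specify.

```python
from typing import Sequence


def q_error(true_card: float, est_card: float) -> float:
    """Q-error: the factor by which an estimate is off (always >= 1)."""
    # Assumption: clamp both values to 1 to avoid division by zero.
    t = max(true_card, 1.0)
    e = max(est_card, 1.0)
    return max(t / e, e / t)


def mae(true_cards: Sequence[float], est_cards: Sequence[float]) -> float:
    """Mean absolute error over a set of queries."""
    return sum(abs(t - e) for t, e in zip(true_cards, est_cards)) / len(true_cards)


def smape(true_cards: Sequence[float], est_cards: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error, in percent.

    Assumption: the variant whose denominator is (|t| + |e|) / 2.
    """
    total = 0.0
    for t, e in zip(true_cards, est_cards):
        denom = (abs(t) + abs(e)) / 2
        # A pair where both values are zero contributes no error.
        total += abs(t - e) / denom if denom else 0.0
    return 100.0 * total / len(true_cards)


# Two estimates that are each off by a factor of two.
small = q_error(10, 20)
large = q_error(1_000_000, 2_000_000)
```

Here `small` and `large` are both 2.0, although the absolute errors are 10 and 1,000,000 tuples respectively. This is the sense in which Q-error alone cannot distinguish between large and small cardinalities, and why it is reported alongside MAE and SMAPE.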
Funder
National Natural Science Foundation of China; Key R&D projects in Xinjiang Uygur Autonomous Region; Natural Science Foundation of Xinjiang Uygur Autonomous Region of China
Subject
Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry