Authors:
Gao Jintao, Li Zhanhuai, Liu Wenjie
Abstract
Cardinality estimation is an important component of query optimization, and its accuracy and efficiency directly determine the effectiveness of query optimization. Traditional cardinality estimation strategies collect statistics from the original table or from a sample, then infer cardinalities from the collected statistics. This approach has three problems. First, it is inefficient when handling big data. Second, the statistics lag behind updates and are obtained by inference, so their correctness cannot be guaranteed. Third, some strategies obtain the actual cardinality by executing subqueries, but they do not keep the results, so fetching statistics remains inefficient. To address these problems, this paper proposes a novel cardinality estimation strategy called cardinality estimation based on query result (CEQR). To keep cardinalities correct, CEQR obtains statistics directly from query results, a process that is independent of data size. We build a cardinality table that stores the statistics of base tables and intermediate results under specific predicates. The cardinality table serves cardinalities to subsequent queries, and we define a set of rules to maintain it. To improve the efficiency of fetching statistics, we introduce a source-aware strategy, which hashes each cardinality item to an appropriate cache. This paper analyzes the applicability and deviation of CEQR and shows experimentally that CEQR is more efficient than traditional cardinality estimation strategies.
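The cardinality table and source-aware caching described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the class name `CardinalityTable`, the key normalization, the CRC32-based cache choice, and the invalidate-on-update rule are all assumptions made for illustration, since the abstract does not specify these details.

```python
import zlib
from dataclasses import dataclass, field


def make_key(tables, predicate):
    """Build a hashable key from source tables and a predicate string.

    Assumption: naive normalization (sorted table names, lowercased and
    whitespace-collapsed predicate). A real system would normalize the
    parsed predicate rather than its text.
    """
    return (tuple(sorted(tables)), " ".join(predicate.lower().split()))


@dataclass
class CardinalityTable:
    """Hypothetical store of exact cardinalities observed in query results."""

    num_caches: int = 4
    caches: list = field(init=False)

    def __post_init__(self):
        self.caches = [dict() for _ in range(self.num_caches)]

    def _cache_for(self, sources):
        # Assumed "source-aware" placement: the cache is chosen from the
        # source tables only, so items over the same sources share a cache.
        digest = zlib.crc32(",".join(sources).encode("utf-8"))
        return self.caches[digest % self.num_caches]

    def record(self, tables, predicate, row_count):
        """Store the row count observed when a (sub)query finished."""
        key = make_key(tables, predicate)
        self._cache_for(key[0])[key] = row_count

    def lookup(self, tables, predicate):
        """Return a stored cardinality, or None if none is available."""
        key = make_key(tables, predicate)
        return self._cache_for(key[0]).get(key)

    def invalidate(self, table):
        """Assumed maintenance rule: drop items that depend on an updated table."""
        for cache in self.caches:
            stale = [key for key in cache if table in key[0]]
            for key in stale:
                del cache[key]


ct = CardinalityTable()
ct.record(["orders"], "amount > 100", 42)
ct.record(["orders", "customers"], "orders.cid = customers.id", 7)
ct.record(["customers"], "region = 'eu'", 10)
```

In this sketch an optimizer would call `lookup` before falling back to inferred statistics, and `invalidate` stands in for the maintenance rules the abstract mentions without detailing.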