Affiliation:
1. University of Wisconsin-Madison
2. IBM Almaden Research Center
Abstract
Many commercial database systems maintain histograms to summarize the contents of relations and permit efficient estimation of query result sizes and access plan costs. Although several types of histograms have been proposed in the past, there has never been a systematic study of all histogram aspects, the available choices for each aspect, and the impact of such choices on histogram effectiveness. In this paper, we provide a taxonomy of histograms that captures all previously proposed histogram types and indicates many new possibilities. We introduce novel choices for several of the taxonomy dimensions, and derive new histogram types by combining choices in effective ways. We also show how sampling techniques can be used to reduce the cost of histogram construction. Finally, we present results from an empirical study of the proposed histogram types used in selectivity estimation of range predicates and identify the histogram types that have the best overall performance.
Publisher
Association for Computing Machinery (ACM)
Subject
Information Systems, Software
Cited by
91 articles.
1. HomeRun: A Cardinality Estimation Advisor for Graph Databases;Proceedings of the 7th Joint Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA);2024-06-09
2. A Generic Machine Learning Model for Spatial Query Optimization based on Spatial Embeddings;ACM Transactions on Spatial Algorithms and Systems;2024-04-13
3. Selectivity Estimation for Queries Containing Predicates over Set-Valued Attributes;Proceedings of the ACM on Management of Data;2023-12-08
4. On distributed data aggregation and the precision of approximate histograms;Journal of Parallel and Distributed Computing;2023-10
5. An error-bounded median filter for correcting ECG baseline wander;Health Information Science and Systems;2023-09-26