Affiliation:
1. Politecnico di Milano
2. University of Maryland
Abstract
The problem of selectivity estimation for queries of nontraditional databases is still an open issue. In this article, we examine the problem of selectivity estimation for some types of spatial queries in databases containing real data. We have shown earlier [Faloutsos and Kamel 1994] that real point sets typically have a nonuniform distribution, consistently violating the uniformity and independence assumptions. Moreover, we demonstrated that the theory of fractals can help to describe real point sets. In this article we show how the concept of fractal dimension, i.e., (noninteger) dimension, can lead to the solution for the selectivity estimation problem in spatial databases. Among the infinite family of fractal dimensions, we consider here the Hausdorff fractal dimension $D_0$ and the “Correlation” fractal dimension $D_2$. Specifically, we show that (a) the average number of neighbors for a given point set follows a power law, with $D_2$ as exponent, and (b) the average number of nonempty range queries follows a power law with $E - D_0$ as exponent ($E$ is the dimension of the embedding space). We present the formulas to estimate the selectivity for “biased” range queries, for self-spatial joins, and for the average number of nonempty range queries. The results of some experiments on real and synthetic point sets are shown. Our formulas achieve very low relative errors, typically about 10%, versus 40%–100% for the formulas that are based on the uniformity and independence assumptions.
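Claim (a) above can be illustrated with a small experiment: if the number of point pairs within distance r grows as r raised to the power $D_2$, then the slope of log(pair count) against log(r) approximates $D_2$. The following is a minimal sketch under stated assumptions, not the procedure used in the article. The function names (`pair_counts`, `fit_slope`, `estimate_d2`), the L-infinity distance, the radii, and the synthetic point sets are all hypothetical choices made for illustration.

```python
import math
import random


def pair_counts(points, radii):
    """For each radius r, count unordered point pairs whose
    L-infinity distance is at most r (brute force, O(N^2))."""
    counts = [0] * len(radii)
    for i in range(len(points)):
        xi, yi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj = points[j]
            d = max(abs(xi - xj), abs(yi - yj))
            for k, r in enumerate(radii):
                if d <= r:
                    counts[k] += 1
    return counts


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def estimate_d2(points, radii):
    """Estimate the correlation dimension as the log-log slope of
    pair count versus radius. Assumes every count is positive."""
    counts = pair_counts(points, radii)
    log_r = [math.log(r) for r in radii]
    log_c = [math.log(c) for c in counts]
    return fit_slope(log_r, log_c)


# Illustrative synthetic data (an assumption, not data from the article):
# points on a diagonal line segment, and points spread over the unit square.
rng = random.Random(0)
radii = [0.01, 0.02, 0.04, 0.08]
line_points = [(t, t) for t in (rng.random() for _ in range(500))]
square_points = [(rng.random(), rng.random()) for _ in range(500)]

d2_line = estimate_d2(line_points, radii)
d2_square = estimate_d2(square_points, radii)
print("estimated D2, line:", round(d2_line, 2))
print("estimated D2, square:", round(d2_square, 2))
```

On this synthetic data the line should give an estimate near 1 and the square an estimate near 2, both slightly low because of boundary effects. Once the exponent is fitted, the same power law can be extrapolated to predict the pair count (and hence a self-join selectivity) at radii that were not measured directly.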
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Science Applications, General Business, Management and Accounting, Information Systems
Reference 34 articles.
1. Mining association rules between sets of items in large databases
2. QBISM: A prototype 3-D medical imaging database system;ARYA M.;IEEE Data Eng. Tech. Bull.,1993
3. A better way to compress images;BARNSLEY M. F.;BYTE,1988
Cited by
24 articles.
1. A Generic Machine Learning Model for Spatial Query Optimization based on Spatial Embeddings;ACM Transactions on Spatial Algorithms and Systems;2024-04-13
2. A learning-based framework for spatial join processing: estimation, optimization and tuning;The VLDB Journal;2024-02-13
3. Identification of 4FGL Uncertain Sources at Higher Resolutions with Inverse Discrete Wavelet Transform;The Astrophysical Journal;2024-01-01
4. Spatial embedding;Proceedings of the 30th International Conference on Advances in Geographic Information Systems;2022-11
5. A Learned Query Optimizer for Spatial Join;Proceedings of the 29th International Conference on Advances in Geographic Information Systems;2021-11-02