Affiliation:
1. The University of Edinburgh, United Kingdom
2. Nagoya University, Japan
Abstract
The pigeonhole principle states that if $n$ items are contained in $m$ boxes, then at least one box has no more than $n/m$ items. It is used to solve many data management problems, especially thresholded similarity searches. Despite the many pigeonhole principle-based solutions proposed over the last few decades, the condition stated by the principle is weak: it only constrains the number of items in a single box. By organizing the boxes in a ring, we propose a new principle, called the pigeonring principle, which constrains the number of items in multiple boxes and yields stronger conditions.
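The abstract does not spell out the exact form of the pigeonring condition. The Python sketch below assumes one natural reading: with the m boxes placed on a ring, there is a starting box such that, for every l from 1 to m, the l consecutive boxes beginning there hold at most l*n/m items. Taking l = 1 recovers the pigeonhole bound. Such a start always exists: it is the box right after the largest running excess over the mean (a cycle-lemma argument). The function names are hypothetical, and this is not presented as the paper's own formulation.

```python
from typing import List, Optional

def pigeonhole_box(counts: List[int]) -> Optional[int]:
    """Index of a box holding at most n/m items (n = total items, m = boxes)."""
    n, m = sum(counts), len(counts)
    for i, c in enumerate(counts):
        if c * m <= n:  # c <= n/m, kept in integer arithmetic
            return i
    return None  # only reached for an empty list: the minimum never exceeds the mean

def chains_ok(counts: List[int], start: int) -> bool:
    """Assumed ring condition: for every l in 1..m, the l consecutive boxes
    starting at `start` (wrapping around the ring) hold at most l*n/m items."""
    n, m = sum(counts), len(counts)
    total = 0
    for l in range(1, m + 1):
        total += counts[(start + l - 1) % m]
        if total * m > l * n:
            return False
    return True

def pigeonring_box(counts: List[int]) -> int:
    """A start box satisfying chains_ok, found in one pass (non-empty counts)."""
    n, m = sum(counts), len(counts)
    best_prefix, best_end, prefix = 0, 0, 0
    for t, c in enumerate(counts):
        prefix += m * c - n  # running excess over the mean, scaled by m
        if prefix > best_prefix:
            best_prefix, best_end = prefix, t + 1
    # Every chain starting right after the largest prefix has non-positive excess.
    return best_end % m

counts = [1, 5, 0, 2]  # n = 8 items in m = 4 boxes, mean 2
print(pigeonhole_box(counts), chains_ok(counts, 0))  # 0 False
print(pigeonring_box(counts), chains_ok(counts, 2))  # 2 True
```

Box 0 meets the pigeonhole bound on its own (1 <= 2) but fails the ring condition as soon as box 1 joins the chain (6 > 4), which illustrates why a ring-based condition can be stronger.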
To utilize the new principle, we focus on problems defined in the form of identifying data objects whose similarities or distances to the query are constrained by a threshold. Many solutions to these problems use the pigeonhole principle to find candidates that satisfy a filtering condition. With the new principle, stronger filtering conditions can be established. We show that the pigeonhole principle is a special case of the new principle, which suggests that any pigeonhole principle-based solution can potentially be accelerated by it. A universal filtering framework is introduced to encompass the solutions to these problems based on the new principle. We also discuss how to quickly find the candidates specified by the new principle. The implementation requires only minor modifications to existing pigeonhole principle-based algorithms. Experimental results on real datasets demonstrate the applicability of the new principle as well as the superior performance of the algorithms based on it.
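To make the filtering idea concrete, the sketch below assumes Hamming distance over bit vectors that are already split into m equal parts, each part stored as a Python int, with an integer threshold tau. None of these choices comes from the abstract, and all function names are hypothetical. If the total distance is at most tau, the per-part distances act as items in boxes, so both the pigeonhole filter (some part within tau/m) and the assumed ring filter (some start whose chains of l parts stay within l*tau/m) keep every true result. Because the ring filter includes the l = 1 case, it prunes at least as much. This version checks every start in O(m^2) time per object and does not model how the paper generates candidates from an index.

```python
from typing import List

def part_distances(x: List[int], q: List[int]) -> List[int]:
    """Hamming distance of each part; x and q are split into the same m parts."""
    return [bin(a ^ b).count("1") for a, b in zip(x, q)]

def pigeonhole_filter(dists: List[int], tau: int) -> bool:
    """Candidate if some part's distance is within tau/m."""
    m = len(dists)
    return any(d * m <= tau for d in dists)

def pigeonring_filter(dists: List[int], tau: int) -> bool:
    """Candidate if some start has every chain of l parts within l*tau/m."""
    m = len(dists)
    for start in range(m):
        total = 0
        for l in range(1, m + 1):
            total += dists[(start + l - 1) % m]
            if total * m > l * tau:
                break  # this start fails; try the next one
        else:
            return True  # every chain length passed for this start
    return False

q = [0b0, 0b0, 0b0, 0b0]          # query split into m = 4 parts
x = [0b1, 0b11111, 0b0, 0b11]     # per-part distances 1, 5, 0, 2 (total 8)
d = part_distances(x, q)
tau = 4
print(d, pigeonhole_filter(d, tau), pigeonring_filter(d, tau))
# [1, 5, 0, 2] True False
```

Here x is a false positive for the pigeonhole filter (its total distance 8 exceeds tau = 4), and the ring filter already discards it before any full verification.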
Subject
General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development
Cited by
16 articles.
1. A Two-Level Signature Scheme for Stable Set Similarity Joins;Proceedings of the VLDB Endowment;2023-07
2. Improving Hamming-Distance Computation for Adaptive Similarity Search Approach;International Journal of Intelligent Information Technologies;2022-04-08
3. Learned Cardinality Estimation for Similarity Queries;Proceedings of the 2021 International Conference on Management of Data;2021-06-09
4. Allign: Aligning All-Pair Near-Duplicate Passages in Long Texts;Proceedings of the 2021 International Conference on Management of Data;2021-06-09
5. Finding a Summary for All Maximal Cliques;2021 IEEE 37th International Conference on Data Engineering (ICDE);2021-04