Efficient Approximate Nearest Neighbor Search in Multi-dimensional Databases

Published:2023-05-26 Issue:1 Volume:1 Page:1-27
ISSN:2836-6573
Container-title:Proceedings of the ACM on Management of Data
language:en
Short-container-title:Proc. ACM Manag. Data

Author:

Peng Yun¹, Choi Byron², Chan Tsz Nam², Yang Jianye³, Xu Jianliang²

Affiliation:

1. Guangzhou University & Hong Kong Baptist University, Guangzhou & Hong Kong, China

2. Hong Kong Baptist University, Hong Kong, China

3. Guangzhou University, Guangzhou, China

Abstract

Approximate nearest neighbor (ANN) search is a fundamental search in multi-dimensional databases, which has numerous real-world applications, such as image retrieval, recommendation, entity resolution, and sequence matching. Proximity graph (PG) has been the state-of-the-art index for ANN search. However, the search on existing PGs either suffers from a high time complexity or has no performance guarantee on the search result. In this paper, we propose a novel τ-monotonic graph (τ-MG) to address the limitations. The novelty of τ-MG lies in a τ-monotonic property. Based on this property, we prove that if the distance between a query q and its nearest neighbor is less than a constant τ, the search on τ-MG guarantees to find the exact nearest neighbor of q and the time complexity of the search is smaller than all existing PG-based methods. For index construction efficiency, we propose an approximate variant of τ-MG, namely τ-monotonic neighborhood graph (τ-MNG), which only requires the neighborhood of each node to be τ-monotonic. We further propose an optimization to reduce the number of distance computations in search. Our extensive experiments show that our techniques outperform all existing methods on well-known real-world datasets.
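
For readers unfamiliar with proximity-graph search, the following is a minimal sketch of generic greedy routing on a graph index, the kind of search the abstract refers to. It is not the τ-MG or τ-MNG algorithm from the paper: the abstract does not define the τ-monotonic property, so none of it is modeled here. The function name `greedy_search`, the adjacency-dict layout, and the toy data are all assumptions made for illustration.

```python
import math


def greedy_search(points, neighbors, query, start):
    """Generic greedy routing on a proximity graph (illustrative only).

    points    : list of coordinate tuples, indexed by node id (hypothetical layout)
    neighbors : dict mapping node id -> list of adjacent node ids
    query     : coordinate tuple of the query
    start     : node id where the search begins
    Returns (node_id, distance) of the local minimum that was reached.
    """
    current = start
    current_dist = math.dist(points[current], query)
    while True:
        best, best_dist = current, current_dist
        # Scan the out-neighbors and remember the one closest to the query.
        for nb in neighbors[current]:
            d = math.dist(points[nb], query)
            if d < best_dist:
                best, best_dist = nb, d
        # Stop when no neighbor is closer than the current node.
        if best == current:
            return current, current_dist
        current, current_dist = best, best_dist


# Toy example: four 2-D points connected as a path graph.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
result = greedy_search(points, neighbors, (2.2, 0.1), start=0)
```

On this toy graph the search walks from node 0 through node 1 and stops at node 2, which happens to be the true nearest neighbor. On an arbitrary graph, greedy routing can stop at a local minimum that is not the nearest neighbor; this is the missing performance guarantee the abstract mentions for some existing PGs.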

Funder

Hong Kong RGC

NSF of Hunan Province

NSF of Guangdong Province

NSFC

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3588908

Cited by 6 articles.

1. Rank-based Hashing for Effective and Efficient Nearest Neighbor Search for Image Retrieval;ACM Transactions on Multimedia Computing, Communications, and Applications;2024-09-12

2. RoarGraph: A Projected Bipartite Graph for Efficient Cross-Modal Approximate Nearest Neighbor Search;Proceedings of the VLDB Endowment;2024-07

3. RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search;Proceedings of the ACM on Management of Data;2024-05-29

4. ChatGraph: Chat with Your Graphs;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13

5. Efficient Reverse $k$ Approximate Nearest Neighbor Search Over High-Dimensional Vectors;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13