Neighbor-sensitive hashing

Published:2015-11 Issue:3 Volume:9 Page:144-155
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
language:en
Short-container-title:Proc. VLDB Endow.

Author:

Park Yongjoo¹, Cafarella Michael¹, Mozafari Barzan¹

Affiliation:

1. University of Michigan, Ann Arbor, MI

Abstract

Approximate kNN (k-nearest neighbor) techniques using binary hash functions are among the most commonly used approaches for overcoming the prohibitive cost of performing exact kNN queries. However, the success of these techniques largely depends on their hash functions' ability to distinguish kNN items; that is, the kNN items retrieved based on data items' hashcodes should include as many true kNN items as possible. A widely-adopted principle for this process is to ensure that similar items are assigned to the same hashcode so that the items with the hashcodes similar to a query's hashcode are likely to be true neighbors. In this work, we abandon this heavily-utilized principle and pursue the opposite direction for generating more effective hash functions for kNN tasks. That is, we aim to increase the distance between similar items in the hashcode space, instead of reducing it. Our contribution begins by providing theoretical analysis on why this revolutionary and seemingly counter-intuitive approach leads to a more accurate identification of kNN items. Our analysis is followed by a proposal for a hashing algorithm that embeds this novel principle. Our empirical studies confirm that a hashing algorithm based on this counter-intuitive idea significantly improves the efficiency and accuracy of state-of-the-art techniques.
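
For readers unfamiliar with the setting, the conventional hash-based pipeline that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration only: it uses random-hyperplane hashing as a stand-in for "binary hash functions" and is not the Neighbor-Sensitive Hashing algorithm proposed in the paper. All function names and parameters (`make_hyperplanes`, `hashcode`, `approx_knn`, `n_candidates`, and so on) are hypothetical and chosen for this example.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (assumed stand-in for learned hash functions)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hashcode(vec, hyperplanes):
    """Map a vector to an integer whose bits are the signs of its projections."""
    code = 0
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, vec))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two hashcodes."""
    return bin(a ^ b).count("1")


def sq_dist(u, v):
    """Squared Euclidean distance in the original space."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def exact_knn(query, items, k):
    """Brute-force kNN, used here only as ground truth."""
    return sorted(range(len(items)), key=lambda i: sq_dist(items[i], query))[:k]


def approx_knn(query, items, codes, hyperplanes, k, n_candidates):
    """Pick candidates by Hamming distance between hashcodes, then rerank exactly."""
    q = hashcode(query, hyperplanes)
    candidates = sorted(range(len(items)), key=lambda i: hamming(q, codes[i]))
    candidates = candidates[:n_candidates]
    return sorted(candidates, key=lambda i: sq_dist(items[i], query))[:k]


def recall(approx, exact):
    """Fraction of true kNN items that the approximate search retrieved."""
    return len(set(approx) & set(exact)) / len(exact)
```

A typical use is to hash every data item once (`codes = [hashcode(v, planes) for v in items]`), answer queries with `approx_knn`, and compare against `exact_knn` via `recall`. The recall achieved for a given code length and candidate budget is the kind of quantity the abstract refers to when it says retrieved items "should include as many true kNN items as possible".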

Publisher

VLDB Endowment

Subject

General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development

Link

https://dl.acm.org/doi/pdf/10.14778/2850583.2850589

Cited by 19 articles.

1. ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and Structured Data; Proceedings of the ACM on Management of Data; 2024-05-29

2. CLIMBER: Pivot-Based Approximate Similarity Search Over Big Data Series; 2024 IEEE 40th International Conference on Data Engineering (ICDE); 2024-05-13

3. InferDB: In-Database Machine Learning Inference Using Indexes; Proceedings of the VLDB Endowment; 2024-04

4. A Step Toward Deep Online Aggregation; Proceedings of the ACM on Management of Data; 2023-06-13

5. Filtered-DiskANN: Graph Algorithms for Approximate Nearest Neighbor Search with Filters; Proceedings of the ACM Web Conference 2023; 2023-04-30