Affiliation:
1. Sun Yat-sen University, Guangzhou, China
2. South China University of Technology, Guangzhou, China
3. Hong Kong University of Science and Technology, Hong Kong, China
Abstract
Locality-Sensitive Hashing (LSH) and its variants are well-known indexing schemes for the c-Approximate Nearest Neighbor (c-ANN) search problem in high-dimensional Euclidean space. Traditionally, LSH functions are constructed in a query-oblivious manner, in the sense that buckets are partitioned before any query arrives. As a result, objects close to a query may be partitioned into different buckets, which is undesirable. Because of this query-oblivious bucket partition, the state-of-the-art LSH schemes for external memory, namely C2LSH and LSB-Forest, only work with an integer approximation ratio c ≥ 2.
In this paper, we introduce a novel concept of query-aware bucket partition, which uses a given query as the "anchor" for bucket partition. Accordingly, a query-aware LSH function is a random projection coupled with query-aware bucket partition, which removes the random shift required by traditional query-oblivious LSH functions. Notably, query-aware bucket partition can be easily implemented so that query performance is guaranteed. We propose a novel query-aware LSH scheme named QALSH for c-ANN search over external memory. Our theoretical studies show that QALSH enjoys a guarantee on query quality. The use of query-aware LSH functions enables QALSH to work with any approximation ratio c > 1. Extensive experiments show that QALSH outperforms C2LSH and LSB-Forest, especially in high-dimensional space. Specifically, by using a ratio c < 2, QALSH can achieve much better query quality.
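The contrast between the two kinds of bucket partition can be illustrated with a minimal sketch. It is not the QALSH implementation: the bucket width `w`, the floor-based query-oblivious hash with shift `b`, the rule that an object collides with the query when its projection lies within `w / 2` of the query's projection, and all function names are assumptions made for illustration, since the abstract does not specify them.

```python
import math


def project(a, point):
    """Random projection: dot product of the point with projection vector a."""
    return sum(ai * xi for ai, xi in zip(a, point))


def oblivious_bucket(a, b, w, point):
    """Query-oblivious partition (assumed form): the bucket id is fixed
    before any query arrives, using a shift b and bucket width w."""
    return math.floor((project(a, point) + b) / w)


def query_aware_collides(a, w, point, query):
    """Query-aware partition (assumed form): the query's projection is the
    anchor, i.e. the center of a bucket of width w; no shift is needed."""
    return abs(project(a, point) - project(a, query)) <= w / 2.0


# Hypothetical fixed values chosen so the effect is easy to see by hand.
a = [1.0, 0.0]   # projection vector (would normally be drawn at random)
b = 0.0          # shift used only by the query-oblivious hash
w = 1.0          # bucket width
query = [0.95, 0.0]
near = [1.05, 0.0]   # very close to the query
far = [3.0, 0.0]     # far from the query

# Query-oblivious: the bucket boundary at 1.0 separates the two close points.
same_oblivious_bucket = (
    oblivious_bucket(a, b, w, query) == oblivious_bucket(a, b, w, near)
)

# Query-aware: the bucket is centered on the query, so the close point collides.
near_collides = query_aware_collides(a, w, near, query)
far_collides = query_aware_collides(a, w, far, query)
```

In this toy setting the query and its close neighbor fall on opposite sides of a pre-drawn bucket boundary, while a bucket anchored at the query keeps them together and still excludes the distant point.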
Subject
General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development
Cited by
73 articles.