Affiliation:
1. Chungbuk National University
Abstract
Efficiently finding similar pairs of objects is challenging when the number of objects is huge. Locality-sensitive hashing techniques have been developed to address this issue. They employ hash functions to map objects into buckets such that similar objects have a high chance of falling into the same bucket. This paper is concerned with one locality-sensitive hashing technique, the projection-based method, which applies to the problem of identifying similar pairs under the Euclidean distance. It proposes an extended method that allows an object to be hashed to more than one bucket by introducing additional hash functions. Experimental studies show that the proposed method can provide better performance than the projection-based method.
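The idea can be illustrated with a minimal sketch. The baseline below uses the standard projection hash for Euclidean distance, h(v) = floor((a · v + b) / w), where a is a random Gaussian direction, b a random offset, and w the bucket width. The abstract does not say how the additional hash functions are built, so the multi-bucket variant here is an assumption made only for illustration: each object is also placed in a second bucket obtained from the same projection with the grid shifted by w/2. The names ProjectionHash, candidate_pairs, and multi_bucket are hypothetical and do not come from the paper.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


class ProjectionHash:
    """h(v) = floor((a . v + b) / w) with a random Gaussian direction a."""

    def __init__(self, dim, width, rng):
        self.a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        self.b = rng.uniform(0.0, width)
        self.width = width

    def project(self, v):
        # Scalar projection of v onto the random direction a.
        return sum(ai * vi for ai, vi in zip(self.a, v))

    def bucket(self, v, shift=0.0):
        # Bucket index on a grid of cell size `width`, optionally shifted.
        return math.floor((self.project(v) + self.b + shift) / self.width)


def candidate_pairs(points, hashers, multi_bucket=False):
    """Return index pairs (i, j), i < j, that share at least one bucket."""
    buckets = defaultdict(list)
    for idx, v in enumerate(points):
        for t, h in enumerate(hashers):
            # Baseline: one bucket per hash function.
            buckets[(t, 0, h.bucket(v))].append(idx)
            if multi_bucket:
                # Assumed extension (not the paper's definition): a second
                # bucket from the same projection on a grid shifted by w/2.
                buckets[(t, 1, h.bucket(v, shift=h.width / 2))].append(idx)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(members, 2))
    return pairs


rng = random.Random(0)
points = [[rng.uniform(0, 10) for _ in range(8)] for _ in range(200)]
hashers = [ProjectionHash(8, width=4.0, rng=rng) for _ in range(5)]

base = candidate_pairs(points, hashers)
extended = candidate_pairs(points, hashers, multi_bucket=True)

# Candidates are only a filter; confirm each one with the exact distance.
threshold = 5.0
similar = {(i, j) for i, j in extended
           if math.dist(points[i], points[j]) <= threshold}
print(len(base), len(extended), len(similar))
```

In this sketch the extended candidate set always contains the baseline set, so fewer similar pairs are missed, at the cost of more exact distance checks. Whether that trade-off matches the performance gain reported in the paper depends on the authors' actual construction, which the abstract does not give.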
Publisher
Trans Tech Publications, Ltd.