Affiliation:
1. Chungbuk National University
Abstract
Measured data may contain various types of attributes, such as continuous, categorical, and set-valued attributes. Several locality-sensitive hashing techniques, which find similar pairs of data quickly and approximately, have been developed for data with either numeric or set-valued attributes. This paper introduces a new locality-sensitive hashing technique applicable to data with categorical attributes.
Publisher
Trans Tech Publications, Ltd.
Cited by
3 articles.