Affiliation:
1. Beihang University & The Hong Kong University of Science and Technology, Beijing, Hong Kong SAR, China
2. Beihang University, Beijing, China
3. The Hong Kong University of Science and Technology, Hong Kong SAR, China
Abstract
Similarity search is increasingly useful in real applications. This paper focuses on in-memory similarity search, i.e., the range query and the k nearest neighbor (kNN) query, under arbitrary metric spaces, where the only available information is the distance function that measures the similarity between two objects. Although this problem has been studied extensively, the query efficiency of existing solutions remains unsatisfactory. To further improve query efficiency, we draw inspiration from tree embeddings, which map each object to a unique leaf of a well-structured tree based solely on the distances. Unlike existing embedding techniques for similarity search (e.g., Lipschitz embeddings and pivot mapping), which require an extra multi-dimensional index over the embedding space (e.g., Lp metrics), we use this tree directly to answer similarity queries. This is promising, but tailoring tree embeddings for efficient similarity search is challenging. We present a novel index called LiteHST, which builds on the most popular tree embedding (the HST) and is heavily customized for similarity search in its node structure and storage scheme. We propose a new construction algorithm with lower time complexity than existing methods and prove the optimality of LiteHST with respect to the distance bound. On top of this index, we also design optimization techniques that substantially reduce the number of distance computations and hence the running time. Finally, extensive experiments demonstrate that our solution outperforms the state of the art in query efficiency by a large margin.
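To make the abstract's idea of answering similarity queries directly on a tree embedding concrete, below is a minimal, self-contained Python sketch. It is not the paper's LiteHST or its construction algorithm; it only illustrates the generic HST flavor: build_hst recursively partitions the dataset into balls of geometrically shrinking radius, and range_query prunes entire subtrees with the triangle inequality using a per-node covering-radius bound. All function and variable names here are our own illustration.

import math
import random

def build_hst(points, dist, level=None, rng=None):
    # Toy HST-style hierarchy: at each level, points are grouped into balls of
    # radius 2**(level - 1) around arbitrarily chosen centers, and every ball
    # is partitioned recursively at the next lower level.
    rng = rng or random.Random(0)
    if level is None:
        diam = max((dist(a, b) for a in points for b in points), default=0.0)
        level = max(1, math.ceil(math.log2(diam))) if diam > 1 else 1
    if len(points) == 1 or level <= 0:
        # Points closer than distance 1 may share a leaf in this toy version.
        return {"leaf": list(points)}
    radius = 2.0 ** (level - 1)
    remaining = list(points)
    rng.shuffle(remaining)
    children = []
    while remaining:
        center = remaining[0]
        ball = [p for p in remaining if dist(center, p) <= radius]
        remaining = [p for p in remaining if dist(center, p) > radius]
        children.append({"center": center,
                         "subtree": build_hst(ball, dist, level - 1, rng)})
    return {"level": level, "children": children}

def range_query(node, q, r, dist, out=None):
    # Collect every point within distance r of q. A child subtree is skipped
    # when its center is provably too far: all of its points lie within
    # 2**level - 1 of that center, so dist(q, center) > r + 2**level cannot
    # contain any result (triangle inequality).
    out = [] if out is None else out
    if "leaf" in node:
        out.extend(p for p in node["leaf"] if dist(q, p) <= r)
        return out
    bound = 2.0 ** node["level"]
    for child in node["children"]:
        if dist(q, child["center"]) <= r + bound:
            range_query(child["subtree"], q, r, dist, out)
    return out

# Tiny usage example with Euclidean distance; any metric (e.g., edit distance) works,
# since only the distance function is consulted.
pts = [(0.0, 0.0), (1.0, 0.5), (5.0, 5.0), (5.5, 4.5), (9.0, 1.0)]
tree = build_hst(pts, math.dist)
print(range_query(tree, (5.2, 4.8), 1.0, math.dist))  # -> [(5.0, 5.0), (5.5, 4.5)] (order may vary)

The sketch only shows why no separate multi-dimensional index is needed: the tree itself supplies the distance bounds used for pruning. The paper's contributions (the customized node structure and storage scheme, the lower-complexity construction, and the optimality of the distance bound) are not reflected here.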
Funder
the Hong Kong RGC RIF Project
China NSFC
National Science Foundation of China (NSFC) under Grant
the Hong Kong RGC GRF Project
Hong Kong ITC ITF grants
the Hong Kong RGC CRF Project
HKUST Global Strategic Partnership Fund
the Beihang University Basic Research Funding
Guangdong Basic and Applied Basic Research Foundation
the National Science Foundation of China (NSFC) under Grant
WeBank Scholars Program
National Key Research and Development Program of China Grant
the Hong Kong RGC AOE Project
Microsoft Research Asia Collaborative Research Grant
HKUST-Webank joint research lab grant
the Hong Kong RGC Theme-based project
Publisher
Association for Computing Machinery (ACM)
References: 83 articles.
Cited by
4 articles.