LiteHST: A Tree Embedding based Method for Similarity Search

Authors:

Zeng Yuxiang (1), Tong Yongxin (2), Chen Lei (3)

Affiliations:

1. Beihang University & The Hong Kong University of Science and Technology, Beijing, Hong Kong SAR, China

2. Beihang University, Beijing, China

3. The Hong Kong University of Science and Technology, Hong Kong SAR, China

Abstract

Similarity search is increasingly useful in real applications. This paper focuses on in-memory similarity search, i.e., range queries and k nearest neighbor (kNN) queries, in arbitrary metric spaces, where the only known information is the distance function that measures the similarity between two objects. Although this problem has been studied extensively, the query efficiency of existing solutions is still unsatisfactory. To further improve query efficiency, we draw inspiration from tree embeddings, which map each object to a unique leaf of a well-structured tree based solely on the distances. Unlike existing embedding techniques for similarity search (e.g., Lipschitz embeddings and pivot mapping), which need an extra multi-dimensional index over the embedding space (e.g., an Lp metric space), we use the tree directly to answer similarity queries. This idea is promising, but tailoring tree embeddings for efficient similarity search is challenging. To this end, we present a novel index called LiteHST, which is based on the most popular tree embedding, the hierarchically well-separated tree (HST), and is heavily customized for similarity search in its node structure and storage scheme. We propose a new construction algorithm with lower time complexity than existing methods and prove that LiteHST is optimal with respect to the distance bound. Based on this new index, we also design optimization techniques that substantially reduce the number of distance computations and hence the running time. Finally, extensive experiments demonstrate that our solution outperforms the state of the art in query efficiency by a large margin.
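The abstract does not spell out LiteHST's node structure, construction algorithm, or query procedures. As a rough illustration of the general idea of answering queries directly on a distance-based hierarchy (rather than on an Lp embedding), the Python sketch below builds a simple HST-like tree in which each level halves the cluster radius, then answers range and kNN queries using only the distance function and triangle-inequality pruning. Everything here is an assumption for illustration: the greedy deterministic partition (HSTs are usually built with a randomized construction), the hypothetical names `Node`, `build_tree`, `range_query`, and `knn_query`, and the pruning rules are not LiteHST's actual algorithms.

```python
import heapq
import itertools
from typing import Any, Callable, List

Dist = Callable[[Any, Any], float]


class Node:
    """A ball: every object stored under this node lies within `radius` of `center`."""

    def __init__(self, center: Any, radius: float) -> None:
        self.center = center
        self.radius = radius
        self.children: List["Node"] = []
        self.points: List[Any] = []  # only leaves store objects


def _split(points: List[Any], dist: Dist, radius: float, min_radius: float) -> Node:
    # points[0] is the center; callers guarantee all points lie within `radius` of it.
    node = Node(points[0], radius)
    if len(points) == 1 or radius <= min_radius:
        node.points = list(points)
        return node
    child_radius = radius / 2  # halve the radius per level, mimicking a 2-HST's levels
    remaining = list(points)
    while remaining:
        center = remaining[0]  # hypothetical greedy choice of the next child center
        ball, rest = [], []
        for p in remaining:
            (ball if dist(center, p) <= child_radius else rest).append(p)
        remaining = rest
        node.children.append(_split(ball, dist, child_radius, min_radius))
    return node


def build_tree(points: List[Any], dist: Dist, min_radius: float = 1e-9) -> Node:
    """Build the hierarchy; the root ball is centered at points[0]."""
    root_radius = max(dist(points[0], p) for p in points)
    return _split(list(points), dist, root_radius, min_radius)


def range_query(node: Node, q: Any, eps: float, dist: Dist) -> List[Any]:
    """Return every stored object o with dist(q, o) <= eps."""
    # Triangle inequality: dist(q, o) >= dist(q, center) - radius, so skip the whole ball.
    if dist(q, node.center) > node.radius + eps:
        return []
    if not node.children:
        return [p for p in node.points if dist(q, p) <= eps]
    out: List[Any] = []
    for child in node.children:
        out.extend(range_query(child, q, eps, dist))
    return out


def knn_query(root: Node, q: Any, k: int, dist: Dist) -> List[Any]:
    """Best-first kNN search; returns up to k objects, nearest first."""
    if k <= 0:
        return []
    tie = itertools.count()  # tie-breaker so heapq never compares Node objects
    frontier = [(max(0.0, dist(q, root.center) - root.radius), next(tie), root)]
    best: List[tuple] = []  # max-heap of current answers stored as (-distance, tie, object)
    while frontier:
        lower_bound, _, node = heapq.heappop(frontier)
        if len(best) == k and lower_bound >= -best[0][0]:
            break  # no remaining ball can contain an object closer than the k-th answer
        if not node.children:
            for p in node.points:
                d = dist(q, p)
                if len(best) < k:
                    heapq.heappush(best, (-d, next(tie), p))
                elif d < -best[0][0]:
                    heapq.heapreplace(best, (-d, next(tie), p))
        else:
            for child in node.children:
                bound = max(0.0, dist(q, child.center) - child.radius)
                heapq.heappush(frontier, (bound, next(tie), child))
    return [p for _, _, p in sorted(best, reverse=True)]


# Usage: integers on a line, with absolute difference as the metric.
def abs_dist(a: float, b: float) -> float:
    return abs(a - b)


root = build_tree(list(range(20)), abs_dist)
print(sorted(range_query(root, 5, 2, abs_dist)))  # [3, 4, 5, 6, 7]
print(knn_query(root, 10.4, 3, abs_dist))  # [10, 11, 9]
```

Because every object under a node lies within `radius` of `center`, the quantity dist(q, center) - radius is a valid lower bound on the distance from q to anything in that subtree. Both queries rely only on this bound, so whole subtrees can be skipped without computing the distances to their objects, which is the kind of saving in distance computations the abstract targets.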

Funder

the Hong Kong RGC RIF Project

China NSFC

National Science Foundation of China (NSFC) under Grant

the Hong Kong RGC GRF Project

Hong Kong ITC ITF grants

the Hong Kong RGC CRF Project

HKUST Global Strategic Partnership Fund

the Beihang University Basic Research Funding

Guangdong Basic and Applied Basic Research Foundation

WeBank Scholars Program

National Key Research and Development Program of China Grant

the Hong Kong RGC AOE Project

Microsoft Research Asia Collaborative Research Grant

HKUST-Webank joint research lab grant

the Hong Kong RGC Theme-based project

the Funding

Publisher

Association for Computing Machinery (ACM)

Cited by 4 articles.

1. Dimensionality Reduction for Partial Label Learning: A Unified and Adaptive Approach. IEEE Transactions on Knowledge and Data Engineering, 2024-08.

2. GTS: GPU-based Tree Index for Fast Similarity Search. Proceedings of the ACM on Management of Data, 2024-05-29.

3. HJG: An Effective Hierarchical Joint Graph for ANNS in Multi-Metric Spaces. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 2024-05-13.

4. Graph-decomposed k-NN searching algorithm on road network. Frontiers of Computer Science, 2024-02-08.