ELPIS: Graph-Based Similarity Search for Scalable Data Science

Published:2023-02 Issue:6 Volume:16 Page:1548-1559
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
language:en
Short-container-title:Proc. VLDB Endow.

Author:

Azizi Ilias¹,Echihabi Karima²,Palpanas Themis³

Affiliation:

1. UM6P, Université Paris Cité

2. UM6P

3. Université Paris Cité & IUF

Abstract

The recent popularity of learned embeddings has fueled the growth of massive collections of high-dimensional (high-d) vectors that model complex data. Finding similar vectors in these collections is at the core of many important and practical data science applications. The data series community has developed tree-based similarity search techniques that outperform state-of-the-art methods on large collections of both data series and generic high-d vectors, on all scenarios except for no-guarantees ng-approximate search, where graph-based approaches designed by the high-d vector community achieve the best performance. However, building graph-based indexes is extremely expensive both in time and space. In this paper, we bring these two worlds together, study the corresponding solutions and their performance behavior, and propose ELPIS, a new strong baseline that takes advantage of the best features of both to achieve a superior performance in terms of indexing and ng-approximate search in-memory. ELPIS builds the index 3x-8x faster than competitors, using 40% less memory. It also achieves a high recall of 0.99, up to 2x faster than the state-of-the-art methods, and answers 1-NN queries up to one order of magnitude faster.
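
The abstract reports results in terms of recall for ng-approximate k-NN search. As a minimal sketch of how such a recall figure is typically measured, the Python snippet below compares an approximate answer against an exact brute-force scan. It is not the ELPIS algorithm: `sampled_knn` is a hypothetical stand-in for any approximate method, and the data, dimensionality, and sample size are illustrative assumptions.

```python
import heapq
import math
import random


def exact_knn(data, query, k):
    """Indices of the k vectors in `data` closest to `query` (Euclidean, full scan)."""
    return heapq.nsmallest(k, range(len(data)), key=lambda i: math.dist(data[i], query))


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true k nearest neighbors present in the approximate answer."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def sampled_knn(data, query, k, sample_size, rng):
    """Hypothetical approximate search: scan only a random sample of the collection.

    It offers no guarantee on answer quality, which is what "ng-approximate" means.
    """
    candidates = rng.sample(range(len(data)), sample_size)
    return heapq.nsmallest(k, candidates, key=lambda i: math.dist(data[i], query))


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed toy collection: 1000 random 16-dimensional vectors.
    data = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(1000)]
    query = [rng.gauss(0, 1) for _ in range(16)]

    truth = exact_knn(data, query, k=10)
    approx = sampled_knn(data, query, k=10, sample_size=500, rng=rng)
    print(f"recall@10 = {recall_at_k(approx, truth):.2f}")
```

A recall of 0.99, as quoted in the abstract, would mean that on average 99% of the true nearest neighbors appear in the approximate answers; setting k=1 corresponds to the 1-NN queries mentioned there.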

Publisher

Association for Computing Machinery (ACM)

Subject

General Earth and Planetary Sciences,Water Science and Technology,Geography, Planning and Development

Link

https://dl.acm.org/doi/pdf/10.14778/3583140.3583166

References: 128 articles.

1. Elpis Archive. http://www.mi.parisdescartes.fr/~themisp/elpis/, 2022.

2. R. Agrawal, C. Faloutsos, and A. Swami. Efficient similarity search in sequence databases. pages 69--84, 1993.

3. U. Alon, M. Zilberstein, O. Levy, and E. Yahav. Code2vec: Learning distributed representations of code. 3(POPL), 2019.

4. A. Arora et al. HD-Index: Pushing the scalability-accuracy boundary for approximate kNN search in high-dimensional spaces. PVLDB, 2018.

5. M. Aumüller, E. Bernhardsson, and A. Faithfull. Ann-benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. In International Conference on Similarity Search and Applications, pages 34--49. Springer, 2017.

Cited by 12 articles.

1. HPS: A novel heuristic hierarchical pruning strategy for dynamic top-k trajectory similarity query;Information Processing & Management;2024-11

2. DumpyOS: A data-adaptive multi-ary index for scalable data series similarity search;The VLDB Journal;2024-08-21

3. Survey of vector database management systems;The VLDB Journal;2024-07-15

4. Enabling Window-Based Monotonic Graph Analytics with Reusable Transitional Results for Pattern-Consistent Queries;Proceedings of the VLDB Endowment;2024-07

5. DET-LSH: A Locality-Sensitive Hashing Scheme with Dynamic Encoding Tree for Approximate Nearest Neighbor Search;Proceedings of the VLDB Endowment;2024-05