High-Throughput Vector Similarity Search in Knowledge Graphs

Author:

Mohoney Jason (1), Pacaci Anil (2), Chowdhury Shihabur Rahman (2), Mousavi Ali (2), Ilyas Ihab F. (2), Minhas Umar Farooq (2), Pound Jeffrey (2), Rekatsinas Theodoros (2)

Affiliation:

1. University of Wisconsin-Madison, Madison, WI, USA

2. Apple, Seattle, WA, USA

Abstract

There is an increasing adoption of machine learning for encoding data into vectors to serve online recommendation and search use cases. As a result, recent data management systems propose augmenting query processing with online vector similarity search. In this work, we explore vector similarity search in the context of Knowledge Graphs (KGs). Motivated by the tasks of finding related KG queries and entities for past KG query workloads, we focus on hybrid vector similarity search (hybrid queries for short) where part of the query corresponds to vector similarity search and part of the query corresponds to predicates over relational attributes associated with the underlying data vectors. For example, given past KG queries for a song entity, we want to construct new queries for new song entities whose vector representations are close to the vector representation of the entity in the past KG query. But entities in a KG also have non-vector attributes such as a song associated with an artist, a genre, and a release date. Therefore, suggested entities must also satisfy query predicates over non-vector attributes beyond a vector-based similarity predicate. While these tasks are central to KGs, our contributions are generally applicable to hybrid queries. In contrast to prior works that optimize online queries, we focus on enabling efficient batch processing of past hybrid query workloads. We present our system, HQI, for high-throughput batch processing of hybrid queries. We introduce a workload-aware vector data partitioning scheme to tailor the vector index layout to the given workload and describe a multi-query optimization technique to reduce the overhead of vector similarity computations. We evaluate our methods on industrial workloads and demonstrate that HQI yields a 31× improvement in throughput for finding related KG queries compared to existing hybrid query processing approaches.
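To make the notion of a hybrid query concrete, the sketch below answers one with a brute-force scan: it first keeps only the entities that satisfy the predicate over non-vector attributes, then ranks the survivors by vector distance. It is an illustration under stated assumptions, not HQI's method; the `Song` record, the `hybrid_query` helper, the toy two-dimensional embeddings, and the choice of Euclidean distance are all hypothetical and do not come from the paper.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass
class Song:
    """Hypothetical KG entity: relational attributes plus an embedding."""
    name: str
    genre: str
    year: int
    vector: tuple


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hybrid_query(entities, query_vector, predicate, k):
    """Return the k entities closest to query_vector among those
    that satisfy the attribute predicate (filter first, then rank)."""
    candidates = (e for e in entities if predicate(e))
    return heapq.nsmallest(k, candidates, key=lambda e: l2(e.vector, query_vector))


# Toy data (assumed for illustration only).
SONGS = [
    Song("A", "jazz", 1959, (0.0, 0.0)),
    Song("B", "jazz", 1964, (0.1, 0.0)),
    Song("C", "rock", 1971, (0.05, 0.0)),  # nearest vector, but wrong genre
    Song("D", "jazz", 1999, (0.9, 0.9)),   # right genre, but too recent
]

result = hybrid_query(
    SONGS,
    query_vector=(0.06, 0.0),
    predicate=lambda s: s.genre == "jazz" and s.year < 1990,
    k=2,
)
names = [s.name for s in result]
print(names)
```

Song "C" has the closest vector to the query but is excluded by the genre predicate, which is the behavior that separates a hybrid query from plain vector similarity search. A full scan like this costs one distance computation per matching entity per query, which is the kind of overhead the abstract says the partitioning and multi-query optimization techniques aim to reduce.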

Publisher

Association for Computing Machinery (ACM)

Cited by 5 articles.

1. RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search;Proceedings of the ACM on Management of Data;2024-05-29

2. Applications and Challenges for Large Language Models: From Data Management Perspective;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13

3. Reinforcement Learning Infused MAC for Adaptive Connectivity;2024 IEEE Wireless Communications and Networking Conference (WCNC);2024-04-21

4. Enhancing Retrieval-Augmented Generation Models with Knowledge Graphs: Innovative Practices Through a Dual-Pathway Approach;Lecture Notes in Computer Science;2024

5. Optimizing Resource Utilization Using Vector Databases in Green Internet of Things;2023 IEEE Globecom Workshops (GC Wkshps);2023-12-04
