Affiliation:
1. Stanford University, Stanford, USA
2. DBOS, Inc., Mountain View, USA
3. UC Berkeley, Berkeley, USA
Abstract
Applications increasingly leverage mixed-modality data, and must jointly search over vector data, such as embedded images, text and video, as well as structured data, such as attributes and keywords. Proposed methods for this hybrid search setting either suffer from poor performance or support a severely restricted set of search predicates (e.g., only small sets of equality predicates), making them impractical for many applications. To address this, we present ACORN, an approach for performant and predicate-agnostic hybrid search. ACORN builds on Hierarchical Navigable Small Worlds (HNSW), a state-of-the-art graph-based approximate nearest neighbor index, and can be implemented efficiently by extending existing HNSW libraries. ACORN introduces the idea of predicate subgraph traversal to emulate a theoretically ideal, but impractical, hybrid search strategy. ACORN's predicate-agnostic construction algorithm is designed to enable this effective search strategy, while supporting a wide array of predicate sets and query semantics. We systematically evaluate ACORN on both prior benchmark datasets, with simple, low-cardinality predicate sets, and complex multi-modal datasets not supported by prior methods. We show that ACORN achieves state-of-the-art performance on all datasets, outperforming prior methods with 2–1,000× higher throughput at a fixed recall. Our code is available at: https://github.com/stanford-futuredata/ACORN.
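To make the hybrid search setting concrete, the following is a minimal sketch of an exact, brute-force baseline: apply an arbitrary predicate to the structured attributes, then rank the passing records by vector distance. This is not ACORN's algorithm and uses no graph index; the names `hybrid_search_exact`, `records`, and the toy attributes `year` and `tags` are hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical record layout: (embedding vector, structured attributes).
Record = Tuple[Sequence[float], Dict[str, object]]


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hybrid_search_exact(
    records: List[Record],
    query: Sequence[float],
    predicate: Callable[[Dict[str, object]], bool],
    k: int,
) -> List[int]:
    """Return indices of the k nearest records that satisfy the predicate.

    Exact pre-filtering baseline: scans every record, so the cost grows
    linearly with the dataset size. The predicate is an arbitrary
    callable, which is what "predicate-agnostic" means in this sketch.
    """
    passing = [i for i, (_, attrs) in enumerate(records) if predicate(attrs)]
    passing.sort(key=lambda i: l2(records[i][0], query))
    return passing[:k]


# Toy data (assumed for illustration only).
records = [
    ([0.0, 0.0], {"year": 2020, "tags": {"cat"}}),
    ([0.1, 0.0], {"year": 2023, "tags": {"dog"}}),
    ([1.0, 1.0], {"year": 2023, "tags": {"cat", "dog"}}),
    ([0.2, 0.1], {"year": 2023, "tags": {"cat"}}),
]

# Predicate combining an equality test with a keyword-containment test.
result = hybrid_search_exact(
    records,
    query=[0.0, 0.0],
    predicate=lambda a: a["year"] == 2023 and "cat" in a["tags"],
    k=2,
)
print(result)
```

A linear scan like this returns exact results for any predicate but does not scale to large collections, which is the gap that an approximate, index-based approach to hybrid search aims to close.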
Publisher
Association for Computing Machinery (ACM)