Affiliation:
1. University of Maryland at College Park
Abstract
The rise of social media and other forms of user-generated content has created the demand for real-time search: against a high-velocity stream of incoming documents, users desire a list of relevant results at the time the query is issued. In the context of real-time search on tweets, this work explores candidate generation in a two-stage retrieval architecture where an initial list of results is processed by a second-stage rescorer to produce the final output. We introduce Bloom filter chains, a novel extension of Bloom filters that can dynamically expand to efficiently represent an arbitrarily long and growing list of monotonically increasing integers with a constant false positive rate. Using a collection of Bloom filter chains, a novel approximate candidate generation algorithm called BWand is able to perform both conjunctive and disjunctive retrieval. Experiments show that our algorithm is many times faster than competitive baselines and that this increased performance does not require sacrificing end-to-end effectiveness. Our results empirically characterize the trade-off space defined by output quality, query evaluation speed, and memory footprint for this particular search architecture.
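As a rough illustration of the idea of a Bloom filter that grows with a stream of increasing integers, the sketch below chains fixed-capacity filters and starts a new one whenever the current one fills up. It is not the paper's implementation: the class names `BloomFilter` and `BloomFilterChain`, the parameters `capacity`, `bits_per_item`, and `num_hashes`, the SHA-256 double-hashing scheme, and the range-based lookup are all assumptions made for this example.

```python
import hashlib
from bisect import bisect_right


class BloomFilter:
    """A plain fixed-size Bloom filter (illustrative, not from the paper)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value):
        # Double hashing: derive num_hashes bit positions from one digest.
        digest = hashlib.sha256(str(value).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, value):
        for p in self._positions(value):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, value):
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(value))


class BloomFilterChain:
    """Hypothetical chain of equal-sized filters over increasing integers."""

    def __init__(self, capacity=64, bits_per_item=16, num_hashes=4):
        self.capacity = capacity          # items per link (assumed parameter)
        self.bits_per_item = bits_per_item
        self.num_hashes = num_hashes
        self.links = []                   # one BloomFilter per link
        self.starts = []                  # first value stored in each link
        self.count_in_last = 0
        self.last = None

    def append(self, value):
        if self.last is not None and value <= self.last:
            raise ValueError("values must be strictly increasing")
        if not self.links or self.count_in_last == self.capacity:
            # Current link is full: extend the chain with a fresh filter.
            self.links.append(BloomFilter(self.capacity * self.bits_per_item,
                                          self.num_hashes))
            self.starts.append(value)
            self.count_in_last = 0
        self.links[-1].add(value)
        self.count_in_last += 1
        self.last = value

    def __contains__(self, value):
        if self.last is None or value > self.last:
            return False
        # Values are sorted, so at most one link can hold this value.
        i = bisect_right(self.starts, value) - 1
        return i >= 0 and value in self.links[i]


chain = BloomFilterChain(capacity=64)
for doc_id in range(0, 2000, 2):
    chain.append(doc_id)
print(len(chain.links), 1000 in chain)
```

Because every link holds at most `capacity` items in the same number of bits, and a lookup probes only the one link whose range covers the queried value, the false positive rate in this sketch does not grow as the list gets longer. Whether the paper bounds the rate in the same way is not stated in the abstract.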
Funder
Division of Information and Intelligent Systems
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Science Applications; General Business, Management and Accounting; Information Systems
Cited by
24 articles.
1. Efficient Approximate Maximum Inner Product Search Over Sparse Vectors;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13
2. An Approximate Algorithm for Maximum Inner Product Search over Streaming Sparse Vectors;ACM Transactions on Information Systems;2023-11-08
3. ReNeuIR at SIGIR 2023: The Second Workshop on Reaching Efficiency in Neural Information Retrieval;Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval;2023-07-18
4. ReNeuIR: Reaching Efficiency in Neural Information Retrieval;Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval;2022-07-06
5. Exploiting Intel optane persistent memory for full text search;Proceedings of the 2021 ACM SIGPLAN International Symposium on Memory Management;2021-06-22