Affiliation:
1. University of Pisa, Pisa, Italy
2. University of Glasgow, Glasgow, UK
Abstract
Predicting the latency of a query in a search engine has important benefits: for instance, it allows the search engine to adjust its configuration to address long-running queries without unnecessarily sacrificing effectiveness. However, for the dynamic pruning techniques that underlie many commercial search engines, accurately predicting query latencies is difficult. We propose the use of index synopses, which are stochastic samples of the full index, for attaining accurate timing predictions. We experiment using the TREC ClueWeb09 collection and a large set of real user queries, and find that small index synopses make it possible to estimate properties of the larger index very accurately, including the sizes of posting list unions and intersections. We then demonstrate that index synopses facilitate two key use cases. First, for query efficiency prediction, we show that index synopses can accurately predict query latencies on the full index and classify long-running queries. Second, for query performance prediction, we show that the effectiveness of queries can be estimated more accurately using a post-retrieval predictor applied to a synopsis index than using a pre-retrieval predictor. Overall, our experiments demonstrate the value of such a stochastic sample for predicting the properties of the larger index.
Funder
Italian Ministry of Education and Research (MIUR) in the framework of the CrossLab project
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Science Applications; General Business, Management and Accounting; Information Systems
Cited by
5 articles.
1. A Geometric Framework for Query Performance Prediction in Conversational Search;Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval;2023-07-18
2. Anytime Ranking on Document-Ordered Indexes;ACM Transactions on Information Systems;2022-01-31
3. An NVM SSD-based High Performance Query Processing Framework for Search Engines;IEEE Transactions on Knowledge and Data Engineering;2022
4. Machine Translation of British and American Literature Based on Parallel Corpus;Application of Intelligent Systems in Multi-modal Information Analytics;2022
5. A DFT-Based Running Time Prediction Algorithm for Web Queries;Future Internet;2021-08-04