Selective Query Processing: A Risk-Sensitive Selection of Search Configurations

Published:2023-08-21 Issue:1 Volume:42 Page:1-35
ISSN:1046-8188
Container-title:ACM Transactions on Information Systems
language:en
Short-container-title:ACM Trans. Inf. Syst.

Author:

Mothe Josiane¹, Ullah Md. Zia²

Affiliation:

1. Université de Toulouse, UT2J, INSPE, France

2. Centre National de la Recherche Scientifique (CNRS) France, and Edinburgh Napier University, UK

Abstract

In information retrieval systems, search parameters are optimized to ensure high effectiveness based on a set of past searches, and these optimized parameters are then used as the search configuration for all subsequent queries. A better approach, however, would be to adapt the parameters to fit the query at hand. Selective query expansion is one such approach, in which the system decides automatically whether or not to expand the query, resulting in two possible search configurations. This approach was recently extended to include many other parameters, leading to many possible search configurations where the system automatically selects the best configuration on a per-query basis. One problem with this approach is the system training, which requires evaluation of each training query with every possible configuration. In real-world systems, so many parameters and possible values must be evaluated that this approach is impractical, especially when the system must be updated frequently, as is the case for commercial search engines. In general, the more configurations, the greater the effectiveness when configuration selection is appropriate but also the greater the risk of decreasing effectiveness in the case of an inappropriate configuration selection. To determine the ideal configurations to be used for each query in real-world systems, we have developed a method in which a limited number of possible configurations are pre-selected, then used in a meta-search engine that decides the best search configuration for each query. We define a risk-sensitive approach for configuration pre-selection that considers the risk-reward tradeoff between the number of configurations kept and system effectiveness. We define two alternative risk functions to apply to different goals. For final configuration selection, the decision is based on query feature similarities.
We compare two alternative risk functions on two query types (ad hoc and diversity) and compare these to more sophisticated machine learning-based methods. We find that a relatively small number of configurations (20) selected by our risk-sensitive model is sufficient to obtain results close to the best achievable results for each query. Effectiveness is increased by about 15% according to the P@10 and nDCG@10 evaluation metrics when compared to traditional grid search using a single configuration and by about 20% when compared to learning to rank documents. Our risk-sensitive approach works for both diversity- and ad hoc-oriented searches. Moreover, the similarity-based selection method outperforms the more sophisticated approaches. Thus, we demonstrate the feasibility of developing per-query information retrieval systems, which will guide future research in this direction.
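The two-stage idea in the abstract (pre-select a small set of configurations, then pick one per query by feature similarity) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not give the paper's risk functions, so the penalty used here (loss below a single-configuration baseline, weighted by a hypothetical `alpha`) is a stand-in, and the names `preselect`, `select_config`, and the toy scores and features are invented.

```python
import math

def preselect(scores, k, baseline, alpha=1.0):
    """Greedily keep k configurations (illustrative, not the paper's method).

    scores[c][q] is the effectiveness of configuration c on training query q.
    baseline[q] is the effectiveness of a single fixed configuration.
    Utility = best-per-query effectiveness once c is added (reward)
              minus alpha * total loss of c below the baseline (assumed risk).
    """
    selected = []
    best = [0.0] * len(baseline)  # best score per query among kept configs
    while len(selected) < min(k, len(scores)):
        def utility(c):
            reward = sum(max(b, s) for b, s in zip(best, scores[c]))
            risk = sum(max(0.0, base - s) for base, s in zip(baseline, scores[c]))
            return reward - alpha * risk
        choice = max((c for c in scores if c not in selected), key=utility)
        selected.append(choice)
        best = [max(b, s) for b, s in zip(best, scores[choice])]
    return selected

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def select_config(query_features, train_features, scores, selected):
    """Pick the kept configuration that did best on the most similar training query."""
    nearest = max(range(len(train_features)),
                  key=lambda i: cosine(query_features, train_features[i]))
    return max(selected, key=lambda c: scores[c][nearest])

# Toy data (invented): three configurations, three training queries.
scores = {"A": [0.9, 0.1, 0.5], "B": [0.3, 0.8, 0.5], "C": [0.5, 0.5, 0.5]}
baseline = [0.5, 0.5, 0.5]
train_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

kept = preselect(scores, k=2, baseline=baseline, alpha=1.0)
print(kept)
print(select_config([0.1, 0.9], train_features, scores, kept))
```

In this sketch, raising `alpha` favors configurations that rarely fall below the baseline, while `alpha=0` rewards only the best-case gain; this mirrors the risk-reward tradeoff the abstract describes without claiming to reproduce either of the paper's two risk functions.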

Funder

European Union’s Horizon Europe research and innovation programme

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications; General Business, Management and Accounting; Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/3608474

References: 73 articles.

1. Taily

2. Giambattista Amati. 2003. Probability Models for Information Retrieval Based on Divergence from Randomness. Ph.D. dissertation. University of Glasgow.

3. Giambattista Amati, Claudio Carpineto, and Giovanni Romano. 2004. Query difficulty, robustness, and selective application of query expansion. In Advances in Information Retrieval, Sharon McDonald and John Tait (Eds.). Springer, Berlin, Germany, 127–137.

4. A selective approach to index term weighting for robust information retrieval based on the frequency distributions of query terms

5. Efficiency trade-offs in two-tier web search systems

Cited by 2 articles.

1. The Surprising Effectiveness of Rankers trained on Expanded Queries;Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval;2024-07-10

2. Shaping the Future of Endangered and Low-Resource Languages - Our Role in the Age of LLMs: A Keynote at ECIR 2024;ACM SIGIR Forum;2024-06