Affiliation:
1. Université de Toulouse, UT2J, INSPE, France
2. Centre National de la Recherche Scientifique (CNRS) France, and Edinburgh Napier University, UK
Abstract
In information retrieval systems, search parameters are optimized to ensure high effectiveness based on a set of past searches, and these optimized parameters are then used as the search configuration for all subsequent queries. A better approach, however, would be to adapt the parameters to fit the query at hand. Selective query expansion is one such an approach, in which the system decides automatically whether or not to expand the query, resulting in two possible search configurations. This approach was extended recently to include many other parameters, leading to many possible search configurations where the system automatically selects the best configuration on a per-query basis. One problem with this approach is the system training, which requires evaluation of each training query with every possible configuration. In real-world systems, so many parameters and possible values must be evaluated that this approach is impractical, especially when the system must be updated frequently, as is the case for commercial search engines. In general, the more configurations, the greater the effectiveness when configuration selection is appropriate but also the greater the risk of decreasing effectiveness in the case of an inappropriate configuration selection. To determine the ideal configurations to be used for each query in real-world systems, we have developed a method in which a limited number of possible configurations are pre-selected, then used in a meta-search engine that decides the best search configuration for each query. We define a risk-sensitive approach for configuration pre-selection that considers the risk-reward tradeoff between the number of configurations kept and system effectiveness. We define two alternative risk functions to apply to different goals. For final configuration selection, the decision is based on query feature similarities. We compare two alternative risk functions on two query types (ad hoc and diversity) and compare these to more sophisticated machine learning based methods. We find that a relatively small number of configurations (20) selected by our risk-sensitive model is sufficient to obtain results close to the best achievable results for each query. Effectiveness is increased by about 15% according to the P@10 and nDCG@10 evaluation metrics when compared to traditional grid search using a single configuration and by about 20% when compared to learning to rank documents. Our risk-sensitive approach works for both diversity- and ad hoc oriented searches. Moreover, the similarity-based selection method outperforms the more sophisticated approaches. Thus, we demonstrate the feasibility of developing per-query information retrieval systems, which will guide future research in this direction.
Funder
European Union’s Horizon Europe research and innovation programme
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Science Applications,General Business, Management and Accounting,Information Systems
Cited by
2 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献