Selective Query Processing: A Risk-Sensitive Selection of Search Configurations

Author:

Mothe Josiane (1), Ullah Md. Zia (2)

Affiliation:

1. Université de Toulouse, UT2J, INSPE, France

2. Centre National de la Recherche Scientifique (CNRS), France, and Edinburgh Napier University, UK

Abstract

In information retrieval systems, search parameters are optimized to ensure high effectiveness based on a set of past searches, and these optimized parameters are then used as the search configuration for all subsequent queries. A better approach, however, would be to adapt the parameters to fit the query at hand. Selective query expansion is one such approach: the system decides automatically whether or not to expand the query, resulting in two possible search configurations. This approach was recently extended to include many other parameters, leading to many possible search configurations from which the system automatically selects the best one on a per-query basis. One problem with this approach is system training, which requires evaluating each training query with every possible configuration. In real-world systems, so many parameters and possible values must be evaluated that this approach is impractical, especially when the system must be updated frequently, as is the case for commercial search engines. In general, the more configurations, the greater the effectiveness when configuration selection is appropriate, but also the greater the risk of decreasing effectiveness when an inappropriate configuration is selected. To determine the ideal configurations to use for each query in real-world systems, we have developed a method in which a limited number of possible configurations are pre-selected and then used in a meta-search engine that decides the best search configuration for each query. We define a risk-sensitive approach for configuration pre-selection that considers the risk-reward tradeoff between the number of configurations kept and system effectiveness. We define two alternative risk functions suited to different goals. For final configuration selection, the decision is based on query feature similarities.
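The pre-selection step can be pictured as choosing columns of a query-by-configuration effectiveness matrix built from training queries. The abstract does not spell out the two risk functions, so the following is only a minimal sketch under an assumed objective: the reward is the mean, over training queries, of the best score among the kept configurations (the per-query best achievable within the pool), and a hypothetical linear penalty `risk_weight * len(kept)` stands in for the growing risk of an inappropriate selection as the pool grows. The names `oracle_mean`, `preselect`, `max_k` and `risk_weight` are illustrative and not taken from the paper.

```python
from typing import List, Sequence


def oracle_mean(scores: Sequence[Sequence[float]], kept: Sequence[int]) -> float:
    """Mean over queries of the best score among the kept configurations.

    scores[q][c] is the effectiveness (e.g. P@10) of configuration c on training query q.
    """
    if not kept:
        return 0.0
    return sum(max(row[c] for c in kept) for row in scores) / len(scores)


def preselect(scores: Sequence[Sequence[float]], max_k: int, risk_weight: float) -> List[int]:
    """Greedily add configurations while reward minus a risk penalty keeps improving.

    Hypothetical objective (not the paper's risk functions):
        oracle_mean(scores, kept) - risk_weight * len(kept)
    """
    n_configs = len(scores[0])
    kept: List[int] = []
    current = 0.0  # objective value of the empty pool
    while len(kept) < max_k:
        best_c, best_obj = None, current
        for c in range(n_configs):
            if c in kept:
                continue
            obj = oracle_mean(scores, kept + [c]) - risk_weight * (len(kept) + 1)
            if obj > best_obj:
                best_c, best_obj = c, obj
        if best_c is None:  # no remaining configuration is worth its extra risk
            break
        kept.append(best_c)
        current = best_obj
    return kept


if __name__ == "__main__":
    # Toy matrix: 3 training queries x 3 configurations.
    toy = [[0.2, 0.8, 0.5], [0.9, 0.1, 0.5], [0.3, 0.2, 0.6]]
    print(preselect(toy, max_k=3, risk_weight=0.05))
```

With an empty pool, the first configuration added is simply the one with the best mean score, i.e. what a single-configuration grid search would pick. Because the penalty is the same for every candidate at a given step, `risk_weight` does not change the order in which configurations are added; a larger value only stops the loop at the same point or earlier, so it never keeps more configurations.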
We compare the two alternative risk functions on two query types (ad hoc and diversity) and evaluate them against more sophisticated machine-learning-based methods. We find that a relatively small number of configurations (20) selected by our risk-sensitive model is sufficient to obtain results close to the best achievable results for each query. Effectiveness is increased by about 15% according to the P@10 and nDCG@10 evaluation metrics when compared to traditional grid search using a single configuration, and by about 20% when compared to learning to rank documents. Our risk-sensitive approach works for both diversity-oriented and ad hoc searches. Moreover, the similarity-based selection method outperforms the more sophisticated approaches. Thus, we demonstrate the feasibility of developing per-query information retrieval systems, which will guide future research in this direction.
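Final selection by query-feature similarity can be sketched as a nearest-neighbour rule. The abstract does not say which query features or which similarity measure are used, so this sketch assumes each query is a numeric feature vector, uses cosine similarity, and gives a new query the configuration that scored best on its most similar training query, restricted to a pre-selected pool `kept`. It also computes the two reference points discussed above: a single configuration chosen by grid search on the training queries, and the per-query best achievable score within the pool. All function names and the toy data are hypothetical.

```python
import math
from typing import Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_config(query_feats: Sequence[float],
                  train_feats: Sequence[Sequence[float]],
                  train_scores: Sequence[Sequence[float]],
                  kept: Sequence[int]) -> int:
    """Return the kept configuration that worked best on the most similar training query."""
    nearest = max(range(len(train_feats)),
                  key=lambda i: cosine(query_feats, train_feats[i]))
    row = train_scores[nearest]
    return max(kept, key=lambda c: row[c])


def compare(test_feats, test_scores, train_feats, train_scores, kept) -> Dict[str, float]:
    """Mean test effectiveness of one grid-search configuration, of similarity-based
    selection, and of the per-query best configuration within the kept pool."""
    n = len(test_scores)
    n_configs = len(train_scores[0])
    # Grid search: the single configuration with the best mean score on training queries.
    grid_best = max(range(n_configs), key=lambda c: sum(row[c] for row in train_scores))
    single = sum(row[grid_best] for row in test_scores) / n
    selected = sum(
        test_scores[q][select_config(test_feats[q], train_feats, train_scores, kept)]
        for q in range(n)
    ) / n
    # Upper bound: always picking the best kept configuration for each test query.
    oracle = sum(max(row[c] for c in kept) for row in test_scores) / n
    gain = (selected - single) / single if single else 0.0
    return {"single": single, "selected": selected, "oracle": oracle, "gain_vs_single": gain}


if __name__ == "__main__":
    # Toy data: rows are queries, columns are configurations (e.g. P@10 values).
    train_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    train_scores = [[0.2, 0.8, 0.5], [0.9, 0.1, 0.5], [0.3, 0.2, 0.6]]
    test_feats = [[0.9, 0.1], [0.1, 0.9]]
    test_scores = [[0.3, 0.7, 0.4], [0.8, 0.2, 0.5]]
    print(compare(test_feats, test_scores, train_feats, train_scores, kept=[0, 1, 2]))
```

Here `gain_vs_single` expresses the improvement over the single grid-search configuration as a relative gain, which is one common way to report percentages like those above; the paper's exact computation is not given in this abstract.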

Funder

European Union’s Horizon Europe research and innovation programme

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications; General Business, Management and Accounting; Information Systems

Cited by 2 articles.

1. The Surprising Effectiveness of Rankers trained on Expanded Queries. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024-07-10.

2. Shaping the Future of Endangered and Low-Resource Languages – Our Role in the Age of LLMs: A Keynote at ECIR 2024. ACM SIGIR Forum, 2024-06.
