How Am I Doing?: Evaluating Conversational Search Systems Offline

Authors:

Aldo Lipani (1), Ben Carterette (2), Emine Yilmaz (3)

Affiliations:

1. University College London, London, United Kingdom

2. Spotify, New York, United States

3. University College London & Amazon, London, United Kingdom

Abstract

As conversational agents like Siri and Alexa gain in popularity and use, conversation is becoming a more and more important mode of interaction for search. Conversational search shares some features with traditional search, but differs in some important respects: conversational search systems are less likely to return ranked lists of results (a SERP), more likely to involve iterated interactions, and more likely to feature longer, well-formed user queries in the form of natural language questions. Because of these differences, traditional methods for search evaluation (such as the Cranfield paradigm) do not translate easily to conversational search. In this work, we propose a framework for offline evaluation of conversational search, which includes a methodology for creating test collections with relevance judgments, an evaluation measure based on a user interaction model, and an approach to collecting user interaction data to train the model. The framework is based on the idea of “subtopics”, often used to model novelty and diversity in search and recommendation, and the user model is similar to the geometric browsing model introduced by RBP and used in ERR. As far as we know, this is the first work to combine these ideas into a comprehensive framework for offline evaluation of conversational search.
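The abstract anchors the proposed user model in the geometric browsing model introduced by Rank-Biased Precision (RBP). As a point of reference, the standard RBP formulation can be sketched as below; this is illustrative only, and does not reproduce the paper's conversational adaptation of the model (the function name and the example relevance lists are illustrative, not from the paper):

```python
# Sketch of RBP's geometric browsing model: after inspecting rank i,
# the user continues to rank i+1 with fixed probability `persistence`,
# so rank i contributes with weight persistence**(i-1).

def rbp(relevances, persistence=0.8):
    """Rank-Biased Precision over a list of binary (or graded)
    relevance values, one per rank, starting at rank 1."""
    return (1 - persistence) * sum(
        rel * persistence ** i for i, rel in enumerate(relevances)
    )

# A top-heavy run scores higher than one whose relevant results
# appear only at the bottom of the ranking.
print(rbp([1, 1, 1, 0, 0]))
print(rbp([0, 0, 1, 1, 1]))
```

The persistence parameter plays the same role as the continuation probability in ERR-style cascade models: lowering it concentrates nearly all of the score mass on the first few ranks, which is why the abstract's iterated, turn-by-turn interaction setting calls for rethinking how that continuation probability is estimated.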

Funder

EPSRC Fellowship titled “Task Based Information Retrieval”

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications; General Business, Management and Accounting; Information Systems


Cited by 33 articles.

1. Towards a Formal Characterization of User Simulation Objectives in Conversational Information Access;Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval;2024-08-02

2. An Analysis of Stopping Strategies in Conversational Search Systems;Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval;2024-08-02

3. Analysing Utterances in LLM-Based User Simulation for Conversational Search;ACM Transactions on Intelligent Systems and Technology;2024-05-17

4. Tutorial on User Simulation for Evaluating Information Access Systems on the Web;Companion Proceedings of the ACM Web Conference 2024;2024-05-13

5. Augmenting Keyword-based Search in Mobile Applications Using LLMs;Proceedings of the 17th ACM International Conference on Web Search and Data Mining;2024-03-04
