How Am I Doing?: Evaluating Conversational Search Systems Offline

Authors:

Aldo Lipani (1), Ben Carterette (2), Emine Yilmaz (3)

Affiliations:

1. University College London, London, United Kingdom

2. Spotify, New York, United States

3. University College London & Amazon, London, United Kingdom

Abstract

As conversational agents like Siri and Alexa gain in popularity and use, conversation is becoming a more and more important mode of interaction for search. Conversational search shares some features with traditional search, but differs in some important respects: conversational search systems are less likely to return ranked lists of results (a SERP), more likely to involve iterated interactions, and more likely to feature longer, well-formed user queries in the form of natural language questions. Because of these differences, traditional methods for search evaluation (such as the Cranfield paradigm) do not translate easily to conversational search. In this work, we propose a framework for offline evaluation of conversational search, which includes a methodology for creating test collections with relevance judgments, an evaluation measure based on a user interaction model, and an approach to collecting user interaction data to train the model. The framework is based on the idea of “subtopics”, often used to model novelty and diversity in search and recommendation, and the user model is similar to the geometric browsing model introduced by RBP and used in ERR. As far as we know, this is the first work to combine these ideas into a comprehensive framework for offline evaluation of conversational search.
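The abstract anchors the proposed user model in the geometric browsing model introduced by Rank-Biased Precision (RBP). As a point of reference, the standard RBP formulation can be sketched as below; this is illustrative only, and does not reproduce the paper's conversational adaptation of the model (the function name and the example relevance lists are illustrative, not from the paper):

```python
# Sketch of RBP's geometric browsing model: after inspecting rank i,
# the user continues to rank i+1 with fixed probability `persistence`,
# so rank i contributes with weight persistence**(i-1).

def rbp(relevances, persistence=0.8):
    """Rank-Biased Precision over a list of binary (or graded)
    relevance values, one per rank, starting at rank 1."""
    return (1 - persistence) * sum(
        rel * persistence ** i for i, rel in enumerate(relevances)
    )

# A top-heavy run scores higher than one whose relevant results
# appear only at the bottom of the ranking.
print(rbp([1, 1, 1, 0, 0]))
print(rbp([0, 0, 1, 1, 1]))
```

The persistence parameter plays the same role as the continuation probability in ERR-style cascade models: lowering it concentrates nearly all of the score mass on the first few ranks, which is why the abstract's iterated, turn-by-turn interaction setting calls for rethinking how that continuation probability is estimated.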

Funder

EPSRC Fellowship titled “Task Based Information Retrieval”

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications; General Business, Management and Accounting; Information Systems


Cited by 33 articles.

1. Towards a Formal Characterization of User Simulation Objectives in Conversational Information Access;Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval;2024-08-02

2. An Analysis of Stopping Strategies in Conversational Search Systems;Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval;2024-08-02

3. Analysing Utterances in LLM-Based User Simulation for Conversational Search;ACM Transactions on Intelligent Systems and Technology;2024-05-17

4. Tutorial on User Simulation for Evaluating Information Access Systems on the Web;Companion Proceedings of the ACM Web Conference 2024;2024-05-13

5. Augmenting Keyword-based Search in Mobile Applications Using LLMs;Proceedings of the 17th ACM International Conference on Web Search and Data Mining;2024-03-04
