Fidelity, Soundness, and Efficiency of Interleaved Comparison Methods

Published: 2013-11 Issue: 4 Volume: 31 Pages: 1-43
ISSN: 1046-8188
Container-title: ACM Transactions on Information Systems
Language: en
Short-container-title: ACM Trans. Inf. Syst.

Author:

Katja Hofmann¹, Shimon Whiteson¹, Maarten de Rijke¹

Affiliation:

1. University of Amsterdam

Abstract

Ranker evaluation is central to research on search engines, whether to compare rankers or to provide feedback for learning to rank. Traditional evaluation approaches do not scale well because they require explicit relevance judgments for document-query pairs, which are expensive to obtain. A promising alternative is the use of interleaved comparison methods, which compare rankers using click data obtained when their rankings are interleaved. In this article, we propose a framework for analyzing interleaved comparison methods. An interleaved comparison method has fidelity if the expected outcome of ranker comparisons properly corresponds to the true relevance of the ranked documents. It is sound if its estimates of that expected outcome are unbiased and consistent. It is efficient if those estimates are accurate with little data. We analyze existing interleaved comparison methods and find that, while sound, none meets our criteria for fidelity. We propose a probabilistic interleave method, which is sound and has fidelity. We show empirically that, by marginalizing out variables that are known, it is more efficient than existing interleaved comparison methods. Using importance sampling, we derive a sound extension that is able to reuse historical data collected in previous comparisons of other ranker pairs.
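The probabilistic interleave idea summarized above can be illustrated with a short Python sketch. It is not the authors' implementation: the softmax form 1/rank^τ with τ = 3, the fair coin flip per position, the outcome sign convention, and all function names (`doc_weights`, `pick_prob`, `interleave`, `outcome`, `marginal_outcome`) are assumptions made for illustration. What it is meant to show is the marginalization step: rather than crediting each click only to the ranker that was actually sampled for that slot, `marginal_outcome` averages the comparison outcome over every ranker assignment that could have produced the observed list, weighted by its probability.

```python
import itertools
import random
from typing import Dict, List, Sequence, Tuple

TAU = 3.0  # assumed softmax steepness; larger values concentrate mass on top ranks


def doc_weights(ranking: Sequence[str], tau: float = TAU) -> Dict[str, float]:
    """Unnormalized softmax weights 1 / rank**tau, with ranks starting at 1."""
    return {doc: 1.0 / (r + 1) ** tau for r, doc in enumerate(ranking)}


def pick_prob(weights: Dict[str, float], doc: str, placed: Sequence[str]) -> float:
    """P(ranker picks `doc`) after the already `placed` documents are removed."""
    total = sum(w for d, w in weights.items() if d not in placed)
    return weights[doc] / total


def interleave(r1: Sequence[str], r2: Sequence[str], length: int,
               rng: random.Random) -> Tuple[List[str], List[int]]:
    """Fill each slot by flipping a fair coin for a ranker (0 or 1) and sampling
    a not-yet-placed document from that ranker's softmax. Assumes both rankers
    rank the same documents and length <= len(r1)."""
    weights = (doc_weights(r1), doc_weights(r2))
    shown: List[str] = []
    assignment: List[int] = []
    for _ in range(length):
        a = rng.randrange(2)
        candidates = [d for d in weights[a] if d not in shown]
        doc = rng.choices(candidates, weights=[weights[a][d] for d in candidates])[0]
        shown.append(doc)
        assignment.append(a)
    return shown, assignment


def outcome(clicks: Sequence[bool], assignment: Sequence[int]) -> int:
    """-1 if ranker 0 is credited with more clicks, +1 if ranker 1 is, 0 on a tie."""
    c0 = sum(1 for c, a in zip(clicks, assignment) if c and a == 0)
    c1 = sum(1 for c, a in zip(clicks, assignment) if c and a == 1)
    return (c1 > c0) - (c1 < c0)


def marginal_outcome(r1: Sequence[str], r2: Sequence[str],
                     shown: Sequence[str], clicks: Sequence[bool]) -> float:
    """Expected outcome with the unobserved assignment marginalized out:
    sum over a of outcome(clicks, a) * P(a | shown). Enumerates 2**len(shown)
    assignments, so it is only practical for short lists."""
    weights = (doc_weights(r1), doc_weights(r2))
    num = den = 0.0
    for assignment in itertools.product((0, 1), repeat=len(shown)):
        # P(shown | a); the uniform prior P(a) = 0.5**k cancels when normalizing.
        p = 1.0
        for r, (doc, a) in enumerate(zip(shown, assignment)):
            p *= pick_prob(weights[a], doc, shown[:r])
        num += p * outcome(clicks, assignment)
        den += p
    return num / den
```

For example, with `r1 = ["a", "b", "c"]`, `r2 = ["c", "b", "a"]`, the list `["a", "b", "c"]` shown and a single click on `"a"`, `marginal_outcome` returns -13/14 (about -0.93): the clicked document is far more likely to have been contributed by the first ranker, so the comparison favors it. The importance-sampling extension for reusing historical data is not covered by this sketch.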

Funder

European Union's ICT Policy Support Programme as part of the Competitiveness and Innovation Framework Programme

Center for Creation, Content and Technology

CLARIN-nl program

CIP ICT-PSP

Nederlandse Organisatie voor Wetenschappelijk Onderzoek

European Social Fund

Dutch national program COMMIT

Seventh Framework Programme

Royal Netherlands Academy of Arts and Sciences

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications, General Business, Management and Accounting, Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/2536736.2536737

References (49 articles)

1. Improving web search ranking by incorporating user behavior information

2. Carterette B. and Jones R. 2008. Evaluating search engines by modeling the relationship between relevance and clicks. In Advances in Neural Information Processing Systems 20 (NIPS’07). J. Platt D. Koller Y. Singer and S. Roweis Eds. MIT Press Cambridge MA 217--224.

3. A dynamic bayesian network click model for web search ranking

4. Expected reciprocal rank for graded relevance

5. Large-scale validation and analysis of interleaved search evaluation

Cited by 26 articles.

1. Validating Synthetic Usage Data in Living Lab Environments;Journal of Data and Information Quality;2023-09-24

2. Interleaved Online Testing in Large-Scale Systems;Companion Proceedings of the ACM Web Conference 2023;2023-04-30

3. Theoretical Analysis on the Efficiency of Interleaved Comparisons;Lecture Notes in Computer Science;2023

4. Debiased Balanced Interleaving at Amazon Search;Proceedings of the 31st ACM International Conference on Information & Knowledge Management;2022-10-17

5. MergeDTS;ACM Transactions on Information Systems;2020-10-31