Fidelity, Soundness, and Efficiency of Interleaved Comparison Methods

Published: 2013-11
Volume: 31, Issue: 4
Pages: 1-43
ISSN: 1046-8188
Container-title: ACM Transactions on Information Systems
Short-container-title: ACM Trans. Inf. Syst.
Language: en

Authors:
Katja Hofmann (1), Shimon Whiteson (1), Maarten de Rijke (1)

Affiliation:
1. University of Amsterdam
Abstract

Ranker evaluation is central to research on search engines, be it to compare rankers or to provide feedback for learning to rank. Traditional evaluation approaches do not scale well because they require explicit relevance judgments of document-query pairs, which are expensive to obtain. A promising alternative is the use of interleaved comparison methods, which compare rankers using click data obtained when interleaving their rankings.

In this article, we propose a framework for analyzing interleaved comparison methods. An interleaved comparison method has fidelity if the expected outcome of ranker comparisons properly corresponds to the true relevance of the ranked documents. It is sound if its estimates of that expected outcome are unbiased and consistent. It is efficient if those estimates are accurate with little data.

We analyze existing interleaved comparison methods and find that, while sound, none meet our criteria for fidelity. We propose a probabilistic interleave method, which is sound and has fidelity. We show empirically that, by marginalizing out variables that are known, it is more efficient than existing interleaved comparison methods. Using importance sampling, we derive a sound extension that can reuse historical data collected in previous comparisons of other ranker pairs.
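The core mechanics described in the abstract can be illustrated with a minimal sketch: each ranker induces a rank-based softmax distribution over its documents, an interleaved list is built by repeatedly flipping a fair coin to pick a contributing ranker and sampling from its (renormalized) distribution, and clicks are credited to the ranker that contributed each clicked document. This is a simplified per-sample version for illustration only; all function names and the decay parameter `tau` are assumptions, and the full probabilistic interleave method additionally marginalizes over assignments, which is what yields the efficiency gains reported in the article.

```python
import random

def softmax_probs(ranking, tau=3.0):
    # Rank-based softmax: P(d) proportional to 1 / rank(d)^tau,
    # so higher-ranked documents are sampled more often.
    weights = [1.0 / (r + 1) ** tau for r in range(len(ranking))]
    z = sum(weights)
    return {doc: w / z for doc, w in zip(ranking, weights)}

def probabilistic_interleave(ranking_a, ranking_b, length, rng=None):
    """Build an interleaved list; record which ranker supplied each slot."""
    rng = rng or random.Random(0)
    probs = [softmax_probs(ranking_a), softmax_probs(ranking_b)]
    interleaved, assignments, used = [], [], set()
    while len(interleaved) < length:
        which = rng.randrange(2)  # fair coin: which ranker contributes this slot
        p = probs[which]
        remaining = [d for d in p if d not in used]
        total = sum(p[d] for d in remaining)
        r, acc = rng.random() * total, 0.0
        for d in remaining:  # sample from the softmax renormalized over unused docs
            acc += p[d]
            if acc >= r:
                interleaved.append(d)
                assignments.append(which)
                used.add(d)
                break
    return interleaved, assignments

def compare(assignments, clicks):
    # Naive per-sample credit: one point per clicked document, awarded to the
    # ranker that contributed it. (The full method marginalizes over all
    # possible assignments instead, reducing variance.)
    credit = [0, 0]
    for slot, clicked in zip(assignments, clicks):
        if clicked:
            credit[slot] += 1
    # Negative outcome: ranker A wins; positive: ranker B wins; zero: tie.
    return credit[1] - credit[0]
```

Averaging `compare` outcomes over many queries estimates the expected comparison outcome; soundness then corresponds to this estimator being unbiased and consistent for that expectation.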
Funder
European Union's ICT Policy Support Programme as part of the Competitiveness and Innovation Framework Programme
Center for Creation, Content and Technology
CLARIN-nl program
CIP ICT-PSP
Nederlandse Organisatie voor Wetenschappelijk Onderzoek
European Social Fund
Dutch national program COMMIT
Seventh Framework Programme
Royal Netherlands Academy of Arts and Sciences
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Science Applications; General Business, Management and Accounting; Information Systems
Cited by 26 articles.
1. Validating Synthetic Usage Data in Living Lab Environments;Journal of Data and Information Quality;2023-09-24
2. Interleaved Online Testing in Large-Scale Systems;Companion Proceedings of the ACM Web Conference 2023;2023-04-30
3. Theoretical Analysis on the Efficiency of Interleaved Comparisons;Lecture Notes in Computer Science;2023
4. Debiased Balanced Interleaving at Amazon Search;Proceedings of the 31st ACM International Conference on Information & Knowledge Management;2022-10-17
5. MergeDTS;ACM Transactions on Information Systems;2020-10-31