Affiliation:
1. University of Duisburg-Essen, Germany
Abstract
This paper points out some mistakes that can be frequently found in IR publications: MRR and ERR violate basic requirements for a metric, MAP is based on unrealistic assumptions, the numbers shown overstate the precision of the result, relative improvements of arithmetic means are inappropriate, the simple holdout method yields unreliable results, hypotheses are often formulated after the experiment, significance tests frequently ignore the multiple comparisons problem, effect sizes are ignored, reproducibility of the experiments might be nearly impossible, and sometimes authors claim proof by experimentation.
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Management Information Systems
Cited by
83 articles.
1. Predicting Representations of Information Needs from Digital Activity Context;ACM Transactions on Information Systems;2024-01-15
2. An Intrinsic Framework of Information Retrieval Evaluation Measures;Lecture Notes in Networks and Systems;2024
3. Information Retrieval Evaluation Measures Defined on Some Axiomatic Models of Preferences;ACM Transactions on Information Systems;2023-12-30
4. Perspectives on Large Language Models for Relevance Judgment;Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval;2023-08-09
5. MMEAD: MS MARCO Entity Annotations and Disambiguations;Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval;2023-07-18