Evaluating Evaluation Measure Stability-Reference-Cited by-同舟云学术

Evaluating Evaluation Measure Stability

Published:2017-08-02 Issue:2 Volume:51 Page:235-242
ISSN:0163-5840
Container-title:ACM SIGIR Forum
language:en
Short-container-title:SIGIR Forum

Author:

Buckley Chris¹,Voorhees Ellen M.²

Affiliation:

1. Sabir Research Inc., Gaithersburg, MD

2. National Institute of Standards and Technology, Gaithersburg, Maryland

Abstract

This paper presents a novel way of examining the accuracy of the evaluation measures commonly used in information retrieval experiments. It validates several of the rules-of-thumb experimenters use, such as the number of queries needed for a good experiment is at least 25 and 50 is better, while challenging other beliefs, such as the common evaluation measures are equally reliable. As an example, we show that Precision at 30 documents has about twice the average error rate as Average Precision has. These results can help information retrieval researchers design experiments that provide a desired level of confidence in their results. In particular, we suggest researchers using Web measures such as Precision at 10 documents will need to use many more than 50 queries or will have to require two methods to have a very large difference in evaluation scores before concluding that the two methods are actually different.

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture,Management Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/3130348.3130373

Reference28 articles.

1. James Allan Jamie Callan Fang-Fang Feng and Daniella Malin. INQUERY and TREC-8. In Voorhees and Harman [26]. James Allan Jamie Callan Fang-Fang Feng and Daniella Malin. INQUERY and TREC-8. In Voorhees and Harman [26].

2. Chris Buckley and Janet Walz. SMART in TREC 8. In Voorhees and Harman [26]. Chris Buckley and Janet Walz. SMART in TREC 8. In Voorhees and Harman [26].

3. Efficient construction of large test collections

Cited by 32 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Assessing Students’ Personality Traits: A Study of Virtual Reality-Based Educational Practices;Electronics;2024-08-23

2. A set of novel HTML document quality features for Web information retrieval: Including applications to learning to rank for information retrieval;Expert Systems with Applications;2024-07

3. The shortcomings of equal weights estimation and the composite equivalence index in PLS-SEM;European Journal of Marketing;2024-02-08

4. Predictive Reranking using Code Smells for Information Retrieval Fault Localization;2024 IEEE 22nd World Symposium on Applied Machine Intelligence and Informatics (SAMI);2024-01-25

5. An Intrinsic Framework of Information Retrieval Evaluation Measures;Lecture Notes in Networks and Systems;2024