Repeatable evaluation of search services in dynamic environments

Published: 2007-11 Issue: 1 Volume: 26 Page: 1
ISSN: 1046-8188
Container-title: ACM Transactions on Information Systems
Language: en
Short-container-title: ACM Trans. Inf. Syst.

Author:

Jensen Eric C.¹, Beitzel Steven M.², Chowdhury Abdur¹, Frieder Ophir³

Affiliation:

1. Summize, Inc.

2. Illinois Institute of Technology

3. Illinois Institute of Technology and Georgetown University

Abstract

In dynamic environments, such as the World Wide Web, a changing document collection, query population, and set of search services demands frequent repetition of search effectiveness (relevance) evaluations. Reconstructing static test collections, such as in TREC, requires considerable human effort, as large collection sizes demand judgments deep into retrieved pools. In practice it is common to perform shallow evaluations over small numbers of live engines (often pairwise, engine A vs. engine B) without system pooling. Although these evaluations are not intended to construct reusable test collections, their utility depends on conclusions generalizing to the query population as a whole. We leverage the bootstrap estimate of the reproducibility probability of hypothesis tests in determining the query sample sizes required to ensure this, finding they are much larger than those required for static collections. We propose a semiautomatic evaluation framework to reduce this effort. We validate this framework against a manual evaluation of the top ten results of ten Web search engines across 896 queries in navigational and informational tasks. Augmenting manual judgments with pseudo-relevance judgments mined from Web taxonomies reduces both the chances of missing a correct pairwise conclusion, and those of finding an errant conclusion, by approximately 50%.
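
The bootstrap estimate of reproducibility probability mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes per-query score differences between two engines (engine A minus engine B) are already available, swaps in a simple exact sign test as the hypothesis test, and uses the hypothetical function names `sign_test_p_value` and `bootstrap_reproducibility`. The idea shown is only the general one: resample queries with replacement, rerun the test on each resample, and report how often it rejects the null hypothesis.

```python
import math
import random


def sign_test_p_value(diffs):
    """Two-sided exact sign test on paired differences (ties are dropped)."""
    wins = sum(1 for d in diffs if d > 0)
    losses = sum(1 for d in diffs if d < 0)
    n = wins + losses
    if n == 0:
        return 1.0  # no informative pairs, so no evidence of a difference
    k = min(wins, losses)
    # Probability of k or fewer successes under Binomial(n, 0.5).
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def bootstrap_reproducibility(diffs, sample_size, alpha=0.05,
                              resamples=2000, seed=0):
    """Fraction of bootstrap query samples on which the test rejects at alpha.

    diffs       -- per-query score differences (assumed input, A minus B)
    sample_size -- number of queries drawn, with replacement, per resample
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(resamples):
        sample = rng.choices(diffs, k=sample_size)
        if sign_test_p_value(sample) < alpha:
            rejections += 1
    return rejections / resamples


if __name__ == "__main__":
    # Synthetic differences for illustration only; not data from the paper.
    demo_rng = random.Random(1)
    diffs = [demo_rng.gauss(0.05, 0.3) for _ in range(500)]
    for size in (25, 100, 400):
        print(size, bootstrap_reproducibility(diffs, size))
```

Under these assumptions, one way to choose a query sample size would be to increase `sample_size` until the estimate exceeds a chosen target; the target value and the choice of test are decisions for the evaluator, not something this record specifies.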

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications; General Business, Management and Accounting; Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/1292591.1292592

Reference: 64 articles.

1. Peer review of statistics in medical research: The other problem;Bacchetti P.;Brit. Med. J.,2002

Cited by 8 articles.

1. Evaluation of Temporal Change in IR Test Collections;Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval;2024-08-02

2. Replicability Measures for Longitudinal Information Retrieval Evaluation;Lecture Notes in Computer Science;2024

3. Towards the Evaluation of Information Retrieval Systems on Evolving Datasets with Pivot Systems;Lecture Notes in Computer Science;2021

4. Effectiveness Involving Multiple Queries;Encyclopedia of Database Systems;2018

5. Effectiveness Involving Multiple Queries;Encyclopedia of Database Systems;2017