A Comparison between Term-Independence Retrieval Models for Ad Hoc Retrieval

Published: 2022-07-31 Issue: 3 Volume: 40 Page: 1-37
ISSN: 1046-8188
Container-title: ACM Transactions on Information Systems
Language: en
Short-container-title: ACM Trans. Inf. Syst.

Author:

Dang Edward Kai Fung¹, Luk Robert Wing Pong¹, Allan James²

Affiliation:

1. The Hong Kong Polytechnic University, Hung Hom, Hong Kong

2. University of Massachusetts, Amherst, MA

Abstract

In Information Retrieval, numerous retrieval models or document ranking functions have been developed in the quest for better retrieval effectiveness. Apart from some formal retrieval models formulated on a theoretical basis, various recent works have applied heuristic constraints to guide the derivation of document ranking functions. While many recent methods are shown to improve over established and successful models, comparison among these new methods under a common environment is often missing. To address this issue, we perform an extensive and up-to-date comparison of leading term-independence retrieval models implemented in our own retrieval system. Our study focuses on the following questions: (RQ1) Is there a retrieval model that consistently outperforms all other models across multiple collections? (RQ2) What are the important features of an effective document ranking function? Our retrieval experiments, performed on several TREC test collections of a wide range of sizes (up to the terabyte-sized ClueWeb09 Category B), enable us to answer these research questions. This work also serves as a reproducibility study for leading retrieval models. While our experiments show that no single retrieval model outperforms all others across all tested collections, some recent retrieval models, such as MATF and MVD, consistently perform better than the common baselines.

Funder

HK PolyU

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications; General Business, Management and Accounting; Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/3483612
