Weighting Passages Enhances Accuracy

Published: 2021-04-30 Issue: 2 Volume: 39 Page: 1-11
ISSN: 1046-8188
Container-title: ACM Transactions on Information Systems
Language: en
Short-container-title: ACM Trans. Inf. Syst.

Author:

Muntean Cristina Ioana¹, Nardini Franco Maria¹, Perego Raffaele¹, Tonellotto Nicola², Frieder Ophir³

Affiliation:

1. ISTI-CNR, Italy

2. University of Pisa, Italy

3. Georgetown University, U.S.A.

Abstract

We observe that in curated documents the distribution of the occurrences of salient terms, e.g., terms with a high Inverse Document Frequency, is not uniform, and such terms are primarily concentrated towards the beginning and the end of the document. Exploiting this observation, we propose a novel version of the classical BM25 weighting model, called BM25 Passage (BM25P), which scores query results by computing a linear combination of term statistics in the different portions of the document. We study a multiplicity of partitioning schemes of document content into passages and compute the collection-dependent weights associated with them on the basis of the distribution of occurrences of salient terms in documents. Moreover, we tune BM25P hyperparameters and investigate their impact on ad hoc document retrieval through fully reproducible experiments conducted using four publicly available datasets. Our findings demonstrate that our BM25P weighting model markedly and consistently outperforms BM25 in terms of effectiveness by up to 17.44% in NDCG@5 and 85% in NDCG@1, and up to 21% in MRR.
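The scoring idea in the abstract, a linear combination of term statistics over document passages fed into BM25, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`split_passages`, `weighted_tf`, `bm25p_score`), the equal-length partitioning, the default `k1` and `b` values, and the assumption that the passage weights and IDF values are supplied by the caller are all choices made for illustration only.

```python
def split_passages(tokens, n_passages):
    """Split a token list into n_passages contiguous, near-equal chunks (assumed scheme)."""
    size, extra = divmod(len(tokens), n_passages)
    passages, start = [], 0
    for i in range(n_passages):
        end = start + size + (1 if i < extra else 0)
        passages.append(tokens[start:end])
        start = end
    return passages


def weighted_tf(term, passages, weights):
    """Linear combination of the term's per-passage frequencies."""
    return sum(w * p.count(term) for w, p in zip(weights, passages))


def bm25p_score(query, doc, weights, idf, avg_len, k1=1.2, b=0.75):
    """BM25-style score in which the raw term frequency is replaced by
    the passage-weighted frequency (hypothetical reading of the abstract)."""
    passages = split_passages(doc, len(weights))
    norm = k1 * (1 - b + b * len(doc) / avg_len)  # standard BM25 length normalization
    score = 0.0
    for term in query:
        tf = weighted_tf(term, passages, weights)
        score += idf.get(term, 0.0) * tf * (k1 + 1) / (tf + norm)
    return score


# Hypothetical usage: a salient term in the first passage counts more
# when the first and last passages carry larger weights.
doc = ["salient", "x", "y", "z", "w", "v"]
idf = {"salient": 1.0}
uniform = bm25p_score(["salient"], doc, [1, 1, 1], idf, avg_len=6)
edges = bm25p_score(["salient"], doc, [2, 1, 2], idf, avg_len=6)
```

With uniform weights the sketch reduces to plain BM25 term-frequency saturation, so any difference in ranking comes only from the passage weights.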

Funder

European Commission

Ministero dell'Istruzione, dell'Università e della Ricerca

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications; General Business, Management and Accounting; Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/3428687

References: 54 articles.

1. Frequentist and Bayesian Approach to Information Retrieval

2. Investigation of partial query proximity in web search

3. Utilizing passage-based language models for ad hoc document retrieval

Cited by 3 articles.

1. Building a Multilevel Inflection Handling Stemmer to Improve Search Effectiveness for Urdu Language; IEEE Access; 2024

2. Extractive Explanations for Interpretable Text Ranking; ACM Transactions on Information Systems; 2023-03-23

3. The Power of Selecting Key Blocks with Local Pre-ranking for Long Document Information Retrieval; ACM Transactions on Information Systems; 2023-02-07