Pivoted Document Length Normalization

Published:2017-08-02 Issue:2 Volume:51 Page:176-184
ISSN:0163-5840
Container-title:ACM SIGIR Forum
language:en
Short-container-title:SIGIR Forum

Author:

Singhal Amit¹, Buckley Chris¹, Mitra Mandar¹

Affiliation:

1. Cornell University, Ithaca, NY

Abstract

Automatic information retrieval systems have to deal with documents of varying lengths in a text collection. Document length normalization is used to fairly retrieve documents of all lengths. In this study, we observe that a normalization scheme that retrieves documents of all lengths with similar chances as their likelihood of relevance will outperform another scheme which retrieves documents with chances very different from their likelihood of relevance. We show that the retrieval probabilities for a particular normalization method deviate systematically from the relevance probabilities across different collections. We present pivoted normalization, a technique that can be used to modify any normalization function thereby reducing the gap between the relevance and the retrieval probabilities. Training pivoted normalization on one collection, we can successfully use it on other (new) text collections, yielding a robust, collection independent normalization technique. We use the idea of pivoting with the well-known cosine normalization function. We point out some shortcomings of the cosine function and present two new normalization functions: pivoted unique normalization and pivoted byte size normalization.
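
The abstract does not state the formula itself. As a minimal sketch, assuming the commonly cited form of pivoted normalization, (1 - slope) * pivot + slope * old_normalization, the idea can be illustrated in Python as follows. The function names, the use of the collection average as the pivot, and the slope value of 0.2 are illustrative assumptions, not details taken from this record.

```python
from typing import List


def pivoted_norm(old_norm: float, pivot: float, slope: float) -> float:
    """Tilt an existing normalization factor around a pivot point.

    Assumed form: (1 - slope) * pivot + slope * old_norm.
    With slope < 1, documents whose old factor is below the pivot get a
    larger factor (retrieved less readily), and documents above the pivot
    get a smaller factor (retrieved more readily).
    """
    if not 0.0 <= slope <= 1.0:
        raise ValueError("slope is expected to lie in [0, 1]")
    return (1.0 - slope) * pivot + slope * old_norm


def pivoted_unique_norms(unique_term_counts: List[int], slope: float = 0.2) -> List[float]:
    """Pivoted unique normalization over a toy collection.

    The old normalization factor is the number of unique terms in each
    document; the pivot is assumed to be the collection average.
    The default slope is an illustrative value only.
    """
    if not unique_term_counts:
        raise ValueError("collection must contain at least one document")
    pivot = sum(unique_term_counts) / len(unique_term_counts)
    return [pivoted_norm(float(u), pivot, slope) for u in unique_term_counts]


if __name__ == "__main__":
    # Three hypothetical documents with 50, 100 and 150 unique terms.
    for old, new in zip([50, 100, 150], pivoted_unique_norms([50, 100, 150])):
        print(f"unique terms = {old:3d} -> pivoted factor = {new:.1f}")
```

A term weight would then be divided by the pivoted factor instead of the original one. A document exactly at the pivot keeps its factor unchanged, which is why the slope can be tuned on one collection without rescaling every score.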

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture,Management Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/3130348.3130365

Reference: 12 articles.

1. The importance of proper weighting methods

2. Overview of the third text Retrieval conference (TREC-3)

3. Some Simple Effective Approximations to the 2-Poisson Model for Probabilistic Weighted Retrieval

Cited by 28 articles.

1. Comparison of design education documents and the disconnect between designer priorities, tools, and occupant assumptions;Architectural Science Review;2024-08-19

2. FABULA: Intelligence Report Generation Using Retrieval-Augmented Narrative Construction;Proceedings of the International Conference on Advances in Social Networks Analysis and Mining;2023-11-06

3. Identification of potential artificial groundwater recharge sites using GIS and the analytical hierarchy process: case study of Tamellalt plain, Morocco;Hydrogeology Journal;2023-09-09

4. Political ideology of nonprofit organizations;Social Science Quarterly;2023-09-04

5. Low-Latency Dimensional Expansion and Anomaly Detection Empowered Secure IoT Network;IEEE Transactions on Network and Service Management;2023-09