Probabilistic models of information retrieval based on measuring the divergence from randomness

Published:2002-10 Issue:4 Volume:20 Page:357-389
ISSN:1046-8188
Container-title:ACM Transactions on Information Systems
language:en
Short-container-title:ACM Trans. Inf. Syst.

Author:

Amati Gianni¹, Van Rijsbergen Cornelis Joost²

Affiliation:

1. University of Glasgow, Fondazione Ugo Bordoni, Roma, Italy

2. University of Glasgow, Glasgow, Scotland

Abstract

We introduce and create a framework for deriving probabilistic models of Information Retrieval. The models are nonparametric models of IR obtained in the language model approach. We derive term-weighting models by measuring the divergence of the actual term distribution from that obtained under a random process. Among the random processes we study the binomial distribution and Bose-Einstein statistics. We define two types of term frequency normalization for tuning term weights in the document-query matching process. The first normalization assumes that documents have the same length and measures the information gain with the observed term once it has been accepted as a good descriptor of the observed document. The second normalization is related to the document length and to other statistics. These two normalization methods are applied to the basic models in succession to obtain weighting formulae. Results show that our framework produces different nonparametric models forming baseline alternatives to the standard tf-idf model.
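
As a rough illustration of the pipeline the abstract describes, the sketch below computes one divergence-from-randomness weight in Python. It is not taken from this record. It assumes the commonly cited combination of a geometric approximation to Bose-Einstein statistics for the randomness model, a Laplace-style first normalization, and a logarithmic document-length normalization. The function name, argument names, and the tuning parameter `c` are all hypothetical choices made for this example.

```python
import math


def dfr_gl2_weight(tf, doc_len, avg_doc_len, coll_freq, num_docs, c=1.0):
    """Sketch of a divergence-from-randomness term weight (assumed form).

    tf          -- occurrences of the term in the document
    doc_len     -- length of the document in tokens
    avg_doc_len -- average document length in the collection
    coll_freq   -- total occurrences of the term in the collection
    num_docs    -- number of documents in the collection
    c           -- assumed tuning parameter for length normalization
    """
    if tf <= 0:
        return 0.0

    # Second normalization (assumed form): rescale tf by document length.
    tfn = tf * math.log2(1.0 + c * avg_doc_len / doc_len)

    # Mean number of occurrences of the term per document.
    lam = coll_freq / num_docs

    # Information content -log2 P(tfn) under the geometric approximation
    # to Bose-Einstein statistics: the randomness model.
    inf1 = math.log2(1.0 + lam) + tfn * math.log2((1.0 + lam) / lam)

    # First normalization (Laplace-style after-effect): information gain
    # once the term is accepted as a descriptor of the document.
    inf2 = 1.0 / (tfn + 1.0)

    return inf1 * inf2


if __name__ == "__main__":
    # A term that is rare in the collection scores higher than a common one.
    print(dfr_gl2_weight(tf=3, doc_len=120, avg_doc_len=100,
                         coll_freq=50, num_docs=10000))
    print(dfr_gl2_weight(tf=3, doc_len=120, avg_doc_len=100,
                         coll_freq=20000, num_docs=10000))
```

The two normalizations are applied in succession, as the abstract states: the length normalization first adjusts the raw term frequency, and the adjusted value then feeds both the randomness model and the information-gain factor.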

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications; General Business, Management and Accounting; Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/582415.582416

Reference: 38 articles.

1. Probabilistic models for automatic indexing;Bookstein A.;J. Am. Soc. Inf. Sci.,1974

2. Foundations of Probabilistic and Utility-Theoretic Indexing

Cited by 477 articles.

1. Dimension Importance Estimation for Dense Information Retrieval;Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval;2024-07-10

2. ESG performance and financial distress prediction of energy enterprises;Finance Research Letters;2024-07

3. A novel redistribution-based feature selection for text classification;Expert Systems with Applications;2024-07

4. A case study on decompounding in Indian language IR;Natural Language Processing;2024-06-03

5. Utilizing passage‐level relevance and kernel pooling for enhancing BERT‐based document reranking;Computational Intelligence;2024-06