Information Retrieval as Statistical Translation

Published:2017-08-02 Issue:2 Volume:51 Page:219-226
ISSN:0163-5840
Container-title:ACM SIGIR Forum
language:en
Short-container-title:SIGIR Forum

Author:

Berger Adam¹,Lafferty John¹

Affiliation:

1. Carnegie Mellon University, Pittsburgh, PA

Abstract

We propose a new probabilistic approach to information retrieval based upon the ideas and methods of statistical machine translation. The central ingredient in this approach is a statistical model of how a user might distill or "translate" a given document into a query. To assess the relevance of a document to a user's query, we estimate the probability that the query would have been generated as a translation of the document, and factor in the user's general preferences in the form of a prior distribution over documents. We propose a simple, well motivated model of the document-to-query translation process, and describe an algorithm for learning the parameters of this model in an unsupervised manner from a collection of documents. As we show, one can view this approach as a generalization and justification of the "language modeling" strategy recently proposed by Ponte and Croft. In a series of experiments on TREC data, a simple translation-based retrieval system performs well in comparison to conventional retrieval techniques. This prototype system only begins to tap the full potential of translation-based retrieval.
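The scoring idea in the abstract (the probability that the query was generated as a translation of the document, combined with a prior over documents) can be illustrated with a minimal sketch. The abstract does not state the model's form, so everything here is an assumption made for illustration: a word-by-word mixture in which each query word is translated from a document word drawn by relative frequency, a hypothetical translation table `t` keyed by `(query_word, document_word)`, a small floor `eps` to avoid `log(0)`, and toy probabilities and documents.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, t, eps=1e-9):
    """Return log p(query | doc) under an assumed word-by-word translation mixture.

    t maps (query_word, doc_word) -> translation probability (hypothetical table).
    """
    counts = Counter(doc)
    n = len(doc)
    total = 0.0
    for q in query:
        # p(q | doc) = sum over document words w of t(q | w) * relative frequency of w
        p = sum(t.get((q, w), 0.0) * c / n for w, c in counts.items())
        total += math.log(p + eps)  # eps is an illustrative floor, not from the paper
    return total


def rank(query, docs, t, prior=None):
    """Order document ids by log p(query | doc) + log prior(doc), best first."""
    scores = {}
    for doc_id, words in docs.items():
        log_prior = math.log(prior[doc_id]) if prior else 0.0  # uniform prior if omitted
        scores[doc_id] = query_log_likelihood(query, words, t) + log_prior
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: invented translation probabilities and two tiny documents.
t = {
    ("car", "car"): 0.7,
    ("car", "automobile"): 0.6,
    ("fast", "fast"): 0.8,
    ("fast", "quick"): 0.5,
}
docs = {"d1": ["quick", "automobile"], "d2": ["slow", "boat"]}
ranking = rank(["fast", "car"], docs, t)
```

In this toy setup, `d1` shares no literal term with the query "fast car" yet still receives a non-trivial score through the translation table, which is the behavior a translation-based retrieval model is meant to capture.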

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture,Management Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/3130348.3130371

References: 15 articles.

1. Probabilistic models for automatic indexing

2. A. Broder and M. Henzinger (1998). "Information retrieval on the web: Tools and algorithmic issues," Invited tutorial at Foundations of Computer Science (FOCS).

3. The mathematics of statistical machine translation: Parameter estimation;Brown P.;Computational Linguistics,1993

4. P. Brown, S. Della Pietra, V. Della Pietra, M. Goldsmith, J. Hajic, R. Mercer, and S. Mohanty (1993). "But dictionaries are data too," In Proceedings of the ARPA Human Language Technology Workshop, Plainsborough, New Jersey. 10.3115/1075671.1075716

Cited by 42 articles.

1. Flooddamagecast: Building Flood Damage Nowcasting with Machine Learning and Data Augmentation;2024

2. Esdnn: Efficient Smoothing-Based Deep Neural Network for Text Information Retrieval;2024

3. MCA-NER: Multi-Contextualized Adversarial-Based Attentional Deep Neural Network for Named Entity Recognition;International Journal of Pattern Recognition and Artificial Intelligence;2023-09-21

4. Deep neural ranking model using distributed smoothing;Expert Systems with Applications;2023-08

5. An Improved Sentence Embeddings based Information Retrieval Technique using Query Reformulation;2023 International Conference on Advancement in Computation & Computer Technologies (InCACCT);2023-05-05