Poisson mixtures

Published: 1995-06 Issue: 2 Volume: 1 Pages: 163-190
ISSN: 1351-3249
Container-title: Natural Language Engineering
Language: en
Short-container-title: Nat. Lang. Eng.

Authors:

Church, Kenneth W.; Gale, William A.

Abstract

Shannon (1948) showed that a wide range of practical problems can be reduced to the problem of estimating probability distributions of words and ngrams in text. It has become standard practice in text compression, speech recognition, information retrieval and many other applications of Shannon's theory to introduce a “bag-of-words” assumption. But obviously, word rates vary from genre to genre, author to author, topic to topic, document to document, section to section, and paragraph to paragraph. The proposed Poisson mixture captures much of this heterogeneous structure by allowing the Poisson parameter θ to vary over documents subject to a density function φ. φ is intended to capture dependencies on hidden variables such as genre, author, topic, etc. (The Negative Binomial is a well-known special case where φ is a Γ distribution.) Poisson mixtures fit the data better than standard Poissons, producing more accurate estimates of the variance over documents (σ²), entropy (H), inverse document frequency (IDF), and adaptation (Pr(x ≥ 2 | x ≥ 1)).
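
The quantities named in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the mean rate and the Γ shape below are hypothetical values chosen only for illustration, and the function names are made up for this example. The sketch compares a single Poisson with a Negative Binomial (a Poisson whose θ follows a Γ density) of the same mean, and computes IDF as -log2 Pr(x ≥ 1) and adaptation as Pr(x ≥ 2 | x ≥ 1).

```python
import math

def poisson_pmf(k, theta):
    """Pr(x = k) for a single Poisson with rate theta."""
    return math.exp(-theta) * theta**k / math.factorial(k)

def negbin_pmf(k, mean, shape):
    """Pr(x = k) when theta ~ Gamma(shape, scale = mean / shape).

    Integrating the Poisson over this Gamma density gives the
    Negative Binomial; logs are used for numerical stability.
    """
    scale = mean / shape
    log_coef = math.lgamma(k + shape) - math.lgamma(shape) - math.lgamma(k + 1)
    return math.exp(
        log_coef
        + k * math.log(scale / (1 + scale))
        - shape * math.log(1 + scale)
    )

def summarize(pmf):
    """Return (IDF, adaptation) implied by a count distribution."""
    p0, p1 = pmf(0), pmf(1)
    at_least_one = 1 - p0                            # Pr(x >= 1)
    idf = -math.log2(at_least_one)                   # -log2(df / D)
    adaptation = (at_least_one - p1) / at_least_one  # Pr(x >= 2 | x >= 1)
    return idf, adaptation

# Hypothetical parameters, not estimates from the paper.
mean_rate = 0.1   # average occurrences of a word per document
shape = 0.2       # Gamma shape; smaller means more variation across documents

poisson_idf, poisson_adapt = summarize(lambda k: poisson_pmf(k, mean_rate))
negbin_idf, negbin_adapt = summarize(lambda k: negbin_pmf(k, mean_rate, shape))

# Variance over documents: the Poisson forces variance == mean,
# while the Gamma mixture adds mean**2 / shape on top.
poisson_variance = mean_rate
negbin_variance = mean_rate + mean_rate**2 / shape

print(f"Poisson: IDF={poisson_idf:.3f} adaptation={poisson_adapt:.3f}")
print(f"NegBin:  IDF={negbin_idf:.3f} adaptation={negbin_adapt:.3f}")
```

With these illustrative parameters the mixture concentrates occurrences in fewer documents, so it yields a larger variance, a higher IDF, and a higher adaptation probability than a Poisson with the same mean.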

Publisher

Cambridge University Press (CUP)

Subject

Artificial Intelligence,Linguistics and Language,Language and Linguistics,Software

References: 16 articles.

1. Probabilistic models for automatic indexing

Cited by 161 articles.

1. Within arms reach: Physical proximity shapes mother-infant language exchanges in real-time;Developmental Cognitive Neuroscience;2023-12

2. Sentiment Analysis Using Smoothed Probabilistic-Based Models;2023 9th International Conference on Control, Decision and Information Technologies (CoDIT);2023-07-03

3. Stop Words for Processing Software Engineering Documents: Do they Matter?;2023 IEEE/ACM 2nd International Workshop on Natural Language-Based Software Engineering (NLBSE);2023-05

4. Evaluation of keyness metrics: performance and reliability;Corpus Linguistics and Linguistic Theory;2023-04-27

5. A Query-Based Weighted Document Partitioning Method for Load Balancing in Search Engines;Wireless Personal Communications;2023-03-20