Probabilistic Topic Modeling for Comparative Analysis of Document Collections

Published: 2020-03-07 Issue: 2 Volume: 14 Page: 1-27
ISSN: 1556-4681
Container-title: ACM Transactions on Knowledge Discovery from Data
Language: en
Short-container-title: ACM Trans. Knowl. Discov. Data

Author:

Hua Ting¹, Lu Chang-Tien¹, Choo Jaegul², Reddy Chandan K.¹

Affiliation:

1. Virginia Tech

2. Korea University, South Korea

Abstract

Probabilistic topic models, which can discover hidden patterns in documents, have been extensively studied. However, rather than learning from a single document collection, numerous real-world applications demand a comprehensive understanding of the relationships among various document sets. To address such needs, this article proposes a new model that can identify the common and discriminative aspects of multiple datasets. Specifically, our proposed method is a Bayesian approach that represents each document as a combination of common topics (shared across all document sets) and distinctive topics (distributions over words that are exclusive to a particular dataset). Through extensive experiments, we demonstrate the effectiveness of our method compared with state-of-the-art models. The proposed model can be useful for “comparative thinking” analysis in real-world document collections.
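The split between common and distinctive topics described in the abstract can be illustrated with a toy generative sketch. This is not the model proposed in the article: the vocabularies, the topic-word probabilities, the Dirichlet concentration of 0.5, and the names `COMMON_TOPICS`, `DISTINCTIVE_TOPICS`, and `generate_document` are all assumptions made for illustration. The only idea it takes from the abstract is that a document mixes topics shared by every collection with topics exclusive to its own collection.

```python
import random

# Hypothetical toy topics (assumed for illustration, not taken from the article).
# Each topic is a distribution over words.
COMMON_TOPICS = [{"model": 0.5, "data": 0.5}]          # shared by all collections
DISTINCTIVE_TOPICS = {                                   # exclusive to one collection
    "A": [{"gene": 0.6, "protein": 0.4}],
    "B": [{"market": 0.7, "stock": 0.3}],
}


def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(collection, common_topics, distinctive_topics, n_words, rng):
    """Generate a bag of words for one document of the given collection."""
    # A document may use every common topic, but only the distinctive
    # topics that belong to its own collection.
    topics = common_topics + distinctive_topics[collection]
    # Per-document topic proportions (assumed symmetric Dirichlet prior).
    theta = sample_dirichlet([0.5] * len(topics), rng)
    words = []
    for _ in range(n_words):
        # Pick a topic for this word, then a word from that topic.
        k = rng.choices(range(len(topics)), weights=theta)[0]
        vocab, probs = zip(*topics[k].items())
        words.append(rng.choices(vocab, weights=probs)[0])
    return words


if __name__ == "__main__":
    rng = random.Random(0)
    for name in ("A", "B"):
        doc = generate_document(name, COMMON_TOPICS, DISTINCTIVE_TOPICS, 10, rng)
        print(name, doc)
```

In this sketch, documents from collection A can never contain "market" or "stock", while "model" and "data" can appear in either collection. A comparative model of the kind the abstract describes would have to recover that separation from the observed words.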

Funder

National Science Foundation

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3369873

Reference: 54 articles.

1. Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm

2. Jordan Boyd-Graber, Yuening Hu, David Mimno, et al. 2017. Applications of topic models. Foundations and Trends® in Information Retrieval 11, 2–3, 60–62.

3. Non-negative Matrix Factorization on Manifold

Cited by 15 articles.

1. Parallel inference for cross-collection latent generalized Dirichlet allocation model and applications;Expert Systems with Applications;2024-03

2. A survey of topic models: From a whole-cycle perspective;Journal of Intelligent & Fuzzy Systems;2023-12-02

3. Cross-collection latent Beta-Liouville allocation model training with privacy protection and applications;Applied Intelligence;2023-01-13

4. Question Tags or Text for Topic Modeling: Which is better;Procedia Computer Science;2023

5. A Selective Supervised Latent Beta-Liouville Allocation for Document Classification;Advances and Trends in Artificial Intelligence. Theory and Applications;2023