Abstract
Comparing two sets of documents to identify new topics is useful in many applications, such as discovering trending topics in sets of scientific papers, detecting emerging topics in microblogs, and interpreting sentiment variations on Twitter. In this paper, we examine the main topic-modeling-based approaches to this task to identify their limitations and the enhancements they need. To overcome these limitations, we introduce two separate frameworks that discover emerging topics through a filtered latent Dirichlet allocation (filtered-LDA) model. The model acts as a filter that identifies old topics in a timestamped set of documents, removes all documents that focus on old topics, and keeps documents that discuss new topics. Filtered-LDA also reduces the chance of using keywords from old topics to represent emerging topics. The final stage of the filter uses multiple topic visualization formats to improve human interpretability of the filtered topics, and it presents the most representative document for each topic.
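The filtering step described above can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that a topic model has already produced a per-document topic distribution and that the indices of the old topics are known. The function name `filter_new_topic_docs`, the `threshold` parameter, and the toy data are all hypothetical, introduced only for illustration.

```python
from typing import Dict, List, Set


def filter_new_topic_docs(
    doc_topic: Dict[str, List[float]],
    old_topics: Set[int],
    threshold: float = 0.5,
) -> List[str]:
    """Return ids of documents that do not focus on old topics.

    Assumption (not from the paper): a document "focuses on old topics"
    when the probability mass it assigns to old topics reaches `threshold`.
    """
    kept = []
    for doc_id, dist in doc_topic.items():
        # Sum the probability mass this document assigns to old topics.
        old_mass = sum(p for topic, p in enumerate(dist) if topic in old_topics)
        if old_mass < threshold:
            kept.append(doc_id)
    return kept


# Toy document-topic distributions over three topics (hypothetical values).
doc_topic = {
    "d1": [0.8, 0.1, 0.1],  # mostly topic 0
    "d2": [0.1, 0.2, 0.7],  # mostly topic 2
    "d3": [0.4, 0.3, 0.3],  # spread over topics 0 and 1
}
old_topics = {0, 1}  # assumed to have been identified from the earlier time window

kept = filter_new_topic_docs(doc_topic, old_topics)
print(kept)  # → ['d2']
```

In this sketch only `d2` survives, because the other two documents place most of their probability mass on topics 0 and 1. How old topics are identified and how strictly documents are removed are design choices that the abstract does not specify.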
Cited by
4 articles.