Affiliation:
1. Jiangsu Key Laboratory of Big Data Security & Intelligent Processing School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, P. R. China
Abstract
Understanding contents in social networks by inferring high-quality latent topics from short texts is a significant task in social analysis, which is challenging because social network contents are usually extremely short, noisy and full of informal vocabularies. Due to the lack of sufficient word co-occurrence instances, well-known topic modeling methods such as LDA and LSA cannot uncover high-quality topic structures. Existing research works seek to pool short texts from social networks into pseudo documents or utilize the explicit relations among these short texts such as hashtags in tweets to make classic topic modeling methods work. In this paper, we explore this problem by proposing a topic model for noisy short texts with multiple relations called MRTM (Multiple Relational Topic Modeling). MRTM exploits both explicit and implicit relations by introducing a document-attribute distribution and a two-step random sampling strategy. Extensive experiments, compared with the state-of-the-art topic modeling approaches, demonstrate that MRTM can alleviate the word co-occurrence sparsity and uncover high-quality latent topics from noisy short texts.
Publisher
World Scientific Pub Co Pte Lt
Subject
Artificial Intelligence,Computer Graphics and Computer-Aided Design,Computer Networks and Communications,Software
Cited by
5 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献