Authors
Zhang Lei, Chen Hai Qiang, Li Wei Jie, Liu Yan Zhao, Wu Run Pu
Abstract
Text clustering is a popular research topic in text mining, and many text clustering methods now exist for different application requirements. Weibo data are currently acquired through the APIs provided by the major microblogging platforms. In this paper, we discuss algorithms that use text clustering to extract popular topics from Weibo posts after large-scale data collection. Because traditional text analysis may not be applicable to the short texts used on Weibo, multiple posts are combined into longer texts based on their features (forwards, comments, followers, etc.) before clustering. Either frequency-based or density-based short text clustering works in most cases: the former is suited to finding hot topics in large collections of Weibo short texts, and the latter to finding abnormal content. Both methods use semantic information to improve clustering accuracy and parallelism to improve clustering performance.
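The pooling step described in the abstract, followed by a frequency count, might look roughly like the sketch below. It is only an illustration under stated assumptions, not the authors' algorithm: the `thread` grouping key, the `pool_posts` and `top_terms` helpers, and the whitespace tokenizer are all hypothetical, and real Weibo text would need a proper Chinese word segmenter and the semantic weighting the abstract mentions.

```python
from collections import Counter, defaultdict


def pool_posts(posts, key):
    """Merge the tokens of all short posts that share the same value of `key`.

    `key` is a hypothetical grouping feature (e.g. a forward/comment thread id).
    Tokenization is a naive lowercase whitespace split, assumed for illustration.
    """
    pooled = defaultdict(list)
    for post in posts:
        pooled[post[key]].extend(post["text"].lower().split())
    return dict(pooled)


def top_terms(pooled, n=3):
    """Return the n most frequent tokens of each pooled (long) document."""
    return {
        group: [term for term, _ in Counter(tokens).most_common(n)]
        for group, tokens in pooled.items()
    }


if __name__ == "__main__":
    # Made-up sample posts, used only to exercise the functions.
    sample = [
        {"thread": "a", "text": "typhoon warning issued"},
        {"thread": "a", "text": "typhoon landfall tonight"},
        {"thread": "b", "text": "concert tickets on sale"},
        {"thread": "b", "text": "concert tickets sold out"},
    ]
    print(top_terms(pool_posts(sample, "thread"), n=2))
```

Pooling first gives each group enough tokens for term frequencies to be meaningful, which is the motivation the abstract gives for combining short posts into longer texts.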
Publisher
Trans Tech Publications, Ltd.
References (1 article)
1. Jure Leskovec, John Shawe-Taylor. Semantic Text Features from Small World Graphs. Subspace, Latent Structure and Feature Selection Techniques: Statistical and Optimization Perspectives Workshop (2005).
Cited by
2 articles.