Affiliation:
1. School of Economics and Management, Beijing Information Science and Technology University, Beijing 100192, China
2. Beijing Key Lab of Green Development Decision Based on Big Data, Beijing 100192, China
Abstract
User comments often express users' most practical requirements. Topic modeling of user comments makes it possible to classify text data and reduce its dimensionality, mine the information the comments contain, and understand users' requirements and preferences. However, user comments are usually short texts: they are sparse and lack rich word-frequency and contextual information, so traditional topic models cannot model and analyze them well. The biterm topic model (BTM) alleviates the sparsity problem but still suffers from limited accuracy and from noise. To eliminate information barriers and further ensure information symmetry, this paper proposes a new topic clustering model, the word-embedding similarity-based BTM (WES-BTM). The WES-BTM extends the BTM by converting the words of each word pair (biterm) into word vectors and computing their similarity to filter word pairs, which in turn improves clustering accuracy. Experimental results on real-world data show that the WES-BTM outperforms the BTM, LDA, and NMF models in terms of topic coherence, perplexity, and Jensen–Shannon divergence. These results verify that the WES-BTM effectively reduces noise and improves the quality of topic clustering, so that the information in user comments can be mined more effectively.
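The abstract describes the WES-BTM filtering step only at a high level. The sketch below illustrates the general idea under stated assumptions: biterms are extracted as all unordered word pairs in a short text (as in the original BTM), each pair is scored by cosine similarity of its word vectors, and pairs below a fixed threshold are discarded. The choice of cosine similarity, the threshold value, the handling of out-of-vocabulary words, and the toy vectors are all assumptions, not the paper's reported settings. A small Jensen–Shannon divergence helper is included because the abstract uses that metric for evaluation; higher divergence between topic-word distributions indicates more distinct topics. All function and variable names are hypothetical.

```python
import math
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) in one short text, as in BTM."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_biterms(biterms, embeddings, threshold):
    """Keep only biterms whose word vectors are at least `threshold` similar.

    Assumption: pairs containing a word with no embedding are dropped.
    """
    kept = []
    for w1, w2 in biterms:
        if w1 not in embeddings or w2 not in embeddings:
            continue
        if cosine(embeddings[w1], embeddings[w2]) >= threshold:
            kept.append((w1, w2))
    return kept


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2, so bounded in [0, 1]) of two distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical toy word vectors standing in for pre-trained embeddings.
toy_embeddings = {
    "battery": [0.9, 0.1, 0.0],
    "charge": [0.8, 0.2, 0.1],
    "screen": [0.1, 0.9, 0.2],
}
review_tokens = ["battery", "charge", "screen"]

kept = filter_biterms(extract_biterms(review_tokens), toy_embeddings, threshold=0.5)
print(kept)  # [('battery', 'charge')]
```

With these toy vectors, only the semantically related pair ("battery", "charge") survives (cosine about 0.98), while the pairs involving "screen" score about 0.21 and 0.36 and are discarded as likely noise. In a real pipeline, the filtered biterms would then be passed to BTM's Gibbs sampler in place of the full biterm set.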
Funder
National Key Research and Development Program of China
Subject
Physics and Astronomy (miscellaneous), General Mathematics, Chemistry (miscellaneous), Computer Science (miscellaneous)
Cited by
1 article.