Abstract
This study constructs a comprehensive index for judging the optimal number of topics in the LDA topic model. Based on the requirements for selecting the number of topics, the index combines four measures: perplexity, isolation, stability, and coincidence. A topic number selected this way has four advantages: (1) good predictive ability, (2) high isolation between topics, (3) no duplicate topics, and (4) repeatability. First, we use three general datasets to compare the proposed method with existing methods, and the results show that the proposed method selects the number of topics better than the existing methods. Then, we collect the patent policies of various provinces and cities in China (excluding Hong Kong, Macao, and Taiwan) as a dataset. Using the number of topics selected by the proposed method, the patent policies can be classified well.
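The idea of combining the four measures into one selection criterion can be sketched as follows. This is only an illustrative sketch, not the index defined in the paper: the abstract does not state the combination rule, so the min-max scaling and the equal-weight sum below are assumptions, and the measurements in `CANDIDATES` are made-up numbers. Only the direction of each measure is taken from the abstract (low perplexity for predictive ability, high isolation, low coincidence for no duplicate topics, high stability for repeatability).

```python
from typing import Dict, List

# Hypothetical measurements for three candidate topic numbers.
# Every value here is invented for illustration only.
CANDIDATES: Dict[int, Dict[str, float]] = {
    5:  {"perplexity": 950.0, "isolation": 0.40, "stability": 0.70, "coincidence": 0.30},
    10: {"perplexity": 820.0, "isolation": 0.65, "stability": 0.80, "coincidence": 0.10},
    15: {"perplexity": 800.0, "isolation": 0.60, "stability": 0.55, "coincidence": 0.25},
}

# Directions follow the four advantages listed in the abstract.
LOWER_IS_BETTER = ("perplexity", "coincidence")
HIGHER_IS_BETTER = ("isolation", "stability")


def min_max(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_scores(candidates: Dict[int, Dict[str, float]]) -> Dict[int, float]:
    """Sum one penalty per measure for each candidate (assumed rule; lower is better)."""
    ks = sorted(candidates)
    scores = {k: 0.0 for k in ks}
    for name in LOWER_IS_BETTER + HIGHER_IS_BETTER:
        scaled = min_max([candidates[k][name] for k in ks])
        for k, s in zip(ks, scaled):
            # Flip measures where a high value is desirable, so that
            # every term is a penalty in [0, 1].
            scores[k] += s if name in LOWER_IS_BETTER else 1.0 - s
    return scores


def select_num_topics(candidates: Dict[int, Dict[str, float]]) -> int:
    """Return the candidate topic number with the smallest composite penalty."""
    scores = composite_scores(candidates)
    return min(scores, key=scores.get)


if __name__ == "__main__":
    print(composite_scores(CANDIDATES))
    print(select_num_topics(CANDIDATES))
```

In practice the four measurements would come from LDA models fitted at each candidate topic number. Equal weighting is the simplest choice here, and a different weighting or a ratio-style index could rank the candidates differently.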
Subject
General Physics and Astronomy
Cited by
52 articles.