Research on automatic classification of Chinese papers based on LDA model and TF-IDF algorithm

Published: 2023-02-22

Author:

Qizheng Li¹, Weilin Hu¹, Hao Dai¹

Affiliation:

1. Zhejiang Sci-Tech University

Abstract

To automatically classify research directions in a designated field and to let machines recognize the natural-language semantics of research papers, this study proposes an automatic classification method based on topic probability. First, feature words and their topic probabilities are computed with the LDA topic model, and the weight of each feature word in each paper is computed with the TF-IDF algorithm. Finally, the topic probabilities of the feature words in each paper are weighted and normalized, which carries the topic probability distribution from words to papers. In unsupervised classification experiments on 53,780 papers in the field of clothing, the method reaches an accuracy of 92.4% and an F-score of 85.0% when compared with professional manual classification. The topic probability classification proposed here can therefore be applied directly to the automatic classification of papers and of research directions.
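
The weighting and normalization step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the word-to-topic probability table is a hand-written stand-in for LDA output, the TF-IDF variant (relative term frequency times log inverse document frequency) is one common choice that the abstract does not specify, and the function names `tf_idf` and `paper_topic_distribution` are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: weight} dict per tokenized document.

    Assumed variant: tf = count / doc length, idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


def paper_topic_distribution(word_weights, word_topic_prob, n_topics):
    """Weight each word's topic probabilities by its TF-IDF score, then normalize."""
    scores = [0.0] * n_topics
    for word, weight in word_weights.items():
        probs = word_topic_prob.get(word)
        if probs is None:  # word is not among the feature words
            continue
        for k in range(n_topics):
            scores[k] += weight * probs[k]
    total = sum(scores)
    if total == 0:  # no informative feature words: fall back to uniform
        return [1.0 / n_topics] * n_topics
    return [s / total for s in scores]


# Toy corpus of tokenized "papers" (made-up data for illustration only).
docs = [
    ["fabric", "dyeing", "fabric", "color"],
    ["pattern", "design", "fashion", "design"],
    ["fabric", "design"],
]

# Hypothetical topic probabilities per feature word, standing in for LDA output.
# Topic 0 is meant to look like "materials", topic 1 like "design".
word_topic_prob = {
    "fabric": [0.9, 0.1],
    "dyeing": [0.8, 0.2],
    "color": [0.7, 0.3],
    "pattern": [0.2, 0.8],
    "design": [0.1, 0.9],
    "fashion": [0.15, 0.85],
}

dists = [paper_topic_distribution(w, word_topic_prob, 2) for w in tf_idf(docs)]
labels = [max(range(2), key=lambda k: d[k]) for d in dists]
print(labels)
```

Each paper is assigned to the topic with the highest normalized score. In a real pipeline the probability table would come from a trained LDA model rather than being written by hand.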

Publisher

Research Square Platform LLC

Cited by 1 article.

1. LogTraceAD: Anomaly Detection from Both Logs and Traces with Graph Representation Learning;Proceedings of the 2023 2nd International Conference on Networks, Communications and Information Technology;2023-06-16