Analysis and Comparison of Chinese News Text Classification Methods Based on Deep Learning

Published: 2022-11-10 Volume: 16 Pages: 146-154
ISSN: 2791-0210
Container-title: Highlights in Science, Engineering and Technology
Short-container-title: HSET

Author:

Chen Jian, Feng Zekai, Jiang Wenxiao

Abstract

As people consume an increasing amount of information, the volume of Internet news is also growing rapidly. Faced with many different kinds of news, how to accurately distinguish between news types has become a research focus for many scholars. This article uses word clouds to represent the keywords used in different news domains. We also use two methods, TF-IDF and TextRank, to identify and analyze the keywords of different news fields. To understand the performance of various classification methods, we choose the THUCNews data set, which collects ten fields of news in the history of Weibo. We then evaluate nine machine learning algorithms: SVM, XGBoost, RandomForest, GBDT, GRU, LSTM, CNN, RNN, and MLP. Among these nine models, GRU reaches an accuracy of 96.93%, SVM 96.39%, CNN 94.72%, and RandomForest 92.97%, making each stand out among similar models. We used word-embedding vectorization for the neural network algorithms and TF-IDF vectorization for the others.
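The TF-IDF keyword scoring mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `tfidf_keywords`, the toy English documents, and the plain `tf * log(N / df)` weighting are all assumptions made for illustration. Real Chinese news text would first need word segmentation, which is not shown here.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each tokenized document.

    Hypothetical helper for illustration; uses the unsmoothed weighting
    tf(t, d) * log(N / df(t)).
    """
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


# Toy, already-tokenized documents (assumed data, not from THUCNews).
docs = [
    ["stock", "market", "rises", "stock", "news"],
    ["team", "wins", "match", "news"],
    ["market", "team", "news"],
]
keywords = tfidf_keywords(docs, top_k=2)
```

A term such as "news" that occurs in every document gets an IDF of log(1) = 0, so it never ranks as a keyword, which is the behavior that makes TF-IDF useful for separating news domains.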

Publisher

Darcy & Roy Press Co. Ltd.

References (14 articles)

1. Ogura H, Amano H, Kondo M. Comparison of metrics for feature selection in imbalanced text classification[J]. Expert Systems with Applications, 2011, 38(5): 4978-4989.

2. Chen K, Zhang Z, Long J, et al. Turning from TF-IDF to TF-IGM for term weighting in text classification[J]. Expert Systems with Applications an International Journal, 2016, 66(Dec.): 245-260.

3. Chen S. K-Nearest Neighbor Algorithm Optimization in Text Categorization[J]. IOP Conference Series Earth and Environmental Science, 2018, 108(5): 052074.

4. Selvi S T, Karthikeyan P, Vincent A, et al. Text categorization using Rocchio algorithm and random forest algorithm[C]// 2016 Eighth International Conference on Advanced Computing (ICoAC). IEEE, 2017.

5. Seyyedi, Seyyed, Hossein, et al. Enhancing Effectiveness of Dimension Reduction in Text Classification[J]. International Journal of Artificial Intelligence Tools: Architectures, Languages, Algorithms, 2017.

Cited by 1 article.

1. Text Stream Classification: Literature Review and Current Trends; 2023 International Conference on Computational Science and Computational Intelligence (CSCI); 2023-12-13