Authors:
Chen Jian, Feng Zekai, Jiang Wenxiao
Abstract
As people consume an increasing amount of information, the volume of Internet news is also growing rapidly. With so many kinds of news available, accurately distinguishing between news types has become a research focus for many scholars. This article uses word clouds to represent the keywords used in different news domains. We also use two methods, TF-IDF and TextRank, to identify and analyze the keywords of each news field. To compare the performance of various classification methods, we use the THUCNews data set, which collects ten fields of news from the history of Weibo. We evaluate nine machine learning algorithms: SVM, XGBoost, RandomForest, GBDT, GRU, LSTM, CNN, RNN, and MLP. Among these nine models, GRU achieves an accuracy of 96.93%, SVM 96.39%, CNN 94.72%, and RandomForest 92.97%, each standing out among similar models. We used word-embedding vectorization for the neural network algorithms and TF-IDF vectorization for the others.
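To make the TF-IDF keyword step concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which TF-IDF variant or tokenizer was used, so the function `tfidf_keywords`, the unsmoothed `log(N / df)` weighting, and the small English `toy_docs` corpus are all hypothetical. Real THUCNews text is Chinese and would need word segmentation before this step.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each tokenized document.

    Assumption: TF is relative frequency and IDF is log(N / df), one common
    textbook variant; the paper may use a different weighting.
    """
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append([term for term, _ in ranked[:top_k]])
    return results


# Hypothetical toy corpus standing in for three news domains.
toy_docs = [
    "the team won the match and the team celebrated".split(),
    "the market fell and the market recovered".split(),
    "the film won awards and the film premiered".split(),
]

keywords = tfidf_keywords(toy_docs, top_k=2)
for doc_keywords in keywords:
    print(doc_keywords)
```

Terms that appear in every document, such as "the", receive an IDF of zero and drop out of the ranking, while terms concentrated in a single document rise to the top. That is the property that makes TF-IDF useful for distinguishing news domains.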
Publisher
Darcy & Roy Press Co. Ltd.
Cited by
1 article.
1. Text Stream Classification: Literature Review and Current Trends; 2023 International Conference on Computational Science and Computational Intelligence (CSCI); 2023-12-13