Affiliation:
1. School of Communication, Soochow University, Suzhou 215123, China
Abstract
Technology is advancing at an accelerating pace across the globe. As a result, vast volumes of text data are generated every day by social media platforms, websites, companies, healthcare systems, and news outlets. Extracting interesting patterns such as opinions, summaries, and facts from text of varying length is a difficult task. To address the varying length of text data and the difficulty of extracting feature values from news, this paper proposes a news text classification method that combines several deep learning (DL) algorithms. Earlier text classification approaches express text information with a single word vector and consider only the relationships between words. They ignore the relationship between words and categories, which is an important factor in classifying news text. Following the idea of a customized combination of DL algorithms, namely CNN, LSTM, and MLP, this paper proposes a customized DCLSTM-MLP model for news text classification. The model represents text in parallel with word vectors and word dispersion. The word vector, which captures the relationships among words, is the input to the CNN module, and a discrete vector, which captures the relationship between words and categories, is the input to the MLP module. Together they allow the model to learn the spatial features, time-series features, and word-category relationships of news text. Multiple experiments were performed to evaluate the stability and performance of the proposed method.
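The parallel structure described above can be illustrated with a minimal, untrained forward-pass sketch in plain Python. The abstract does not give layer sizes, the exact definition of the discrete word-category vector, or how the two branches are fused, so several assumptions are made here: the CNN output is assumed to feed the LSTM (CNN for spatial features, LSTM for time-series features); the discrete vector is assumed to be the document average of P(category | word) estimated from per-category word counts; and fusion is assumed to be concatenation followed by a softmax layer. All names (`DCLSTMMLPSketch`, `word_category_vector`) and sizes are hypothetical, weights are random, and biases are omitted for brevity.

```python
import math
import random

def matvec(W, x):
    # Matrix (list of rows) times vector; biases are omitted in this sketch.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def relu(v):
    return [max(0.0, x) for x in v]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def word_category_vector(tokens, counts, num_classes):
    """Hypothetical 'discrete' feature: mean of P(category | word) over known words.
    `counts` maps a word to its occurrence counts per category (positive total)."""
    vec, known = [0.0] * num_classes, 0
    for t in tokens:
        if t in counts:
            total = sum(counts[t])
            vec = [v + n / total for v, n in zip(vec, counts[t])]
            known += 1
    return [v / known for v in vec] if known else vec

def conv1d(seq, W, width):
    """Valid 1-D convolution over word embeddings: each window of `width`
    vectors is flattened, projected by W, then passed through ReLU."""
    out = []
    for i in range(len(seq) - width + 1):
        window = [x for vec in seq[i:i + width] for x in vec]
        out.append(relu(matvec(W, window)))
    return out

def lstm_last_hidden(seq, params, hidden):
    """Standard LSTM cell unrolled over `seq`; returns the final hidden state."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in seq:
        z = x + h  # concatenate current input with previous hidden state
        i = sigmoid(matvec(params["i"], z))
        f = sigmoid(matvec(params["f"], z))
        o = sigmoid(matvec(params["o"], z))
        g = [math.tanh(v) for v in matvec(params["g"], z)]
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h

class DCLSTMMLPSketch:
    """Hypothetical two-branch model: CNN -> LSTM on word vectors, MLP on the
    discrete word-category vector, concatenated and classified with softmax."""

    def __init__(self, emb_dim, filters, width, hidden, mlp_hidden, num_classes, seed=0):
        rng = random.Random(seed)
        self.width, self.hidden = width, hidden
        self.conv = rand_matrix(filters, width * emb_dim, rng)
        self.lstm = {k: rand_matrix(hidden, filters + hidden, rng) for k in "ifog"}
        self.mlp = rand_matrix(mlp_hidden, num_classes, rng)
        self.out = rand_matrix(num_classes, hidden + mlp_hidden, rng)

    def forward(self, embeddings, discrete):
        spatial = conv1d(embeddings, self.conv, self.width)           # CNN branch
        temporal = lstm_last_hidden(spatial, self.lstm, self.hidden)  # LSTM on CNN output
        category = relu(matvec(self.mlp, discrete))                  # MLP branch
        return softmax(matvec(self.out, temporal + category))        # fuse and classify

if __name__ == "__main__":
    rng = random.Random(1)
    tokens = ["stocks", "rally", "as", "markets", "rise"]
    embeddings = [[rng.uniform(-1, 1) for _ in range(4)] for _ in tokens]
    counts = {"stocks": [8, 1, 1], "markets": [6, 2, 2], "rally": [3, 4, 3]}
    discrete = word_category_vector(tokens, counts, 3)
    model = DCLSTMMLPSketch(4, filters=5, width=3, hidden=6, mlp_hidden=4, num_classes=3)
    print([round(p, 3) for p in model.forward(embeddings, discrete)])
```

Because the weights are random, the printed probabilities carry no meaning; the sketch only shows how the word-vector branch and the word-category branch run in parallel and are joined before classification. In practice both branches would be trained jointly with a framework such as PyTorch or Keras.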
The experimental results show that the proposed method effectively addresses the varying length of news text and the difficulty of extracting its features, classifies news text effectively, and achieves better accuracy, recall, and comprehensive evaluation value than the compared models.
Subject
Computer Science Applications, Software
Cited by
11 articles.