Vietnamese Text Classification Algorithm using Long Short Term Memory and Word2Vec

Published: 2020-12-11 Issue: 6 Volume: 19 Page: 1255-1279
ISSN: 2713-3206
Container-title: Informatics and Automation
Short-container-title: IA

Author:

Phat Huu Nguyen, Anh Nguyen Thi Minh

Abstract

With the ongoing fourth industrial revolution and the rapid development of computer science, the amount of textual information has become huge. Before seemingly appropriate methodologies and techniques are applied to processing this data, its nature and characteristics should be thoroughly analyzed and understood. Automatic text processing incorporated into existing systems can facilitate many procedures. Text classification is one of the basic applications of natural language processing, covering tasks such as emotion analysis and subject labeling. In particular, recent advances in deep learning networks show that such methods suit document classification because of their efficiency; for instance, they have proved effective for classifying English texts. A thorough review, however, found that practically no research effort has addressed documents in the Vietnamese language. Deep learning models for document classification have demonstrated certain improvements for Vietnamese texts. Therefore, a long short-term memory (LSTM) network combined with Word2Vec is proposed for text classification, improving both performance and accuracy. Compared with traditional methods, the developed approach achieved somewhat better results in classifying Vietnamese texts. Evaluation on Vietnamese datasets shows an accuracy of over 90%, and the proposed approach looks promising for real applications.
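
The abstract does not specify the network architecture, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes word-segmented Vietnamese tokens, a small hand-written dictionary standing in for trained Word2Vec vectors, a single LSTM layer, and a softmax output. The weights are random and untrained, and all names (`LSTMClassifier`, `toy_vectors`, the layer sizes) are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMClassifier:
    """Forward pass only: embedding lookup -> LSTM -> softmax over classes."""

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, embeddings, hidden_size, num_classes, seed=0):
        rng = random.Random(seed)
        self.embeddings = embeddings  # token -> vector (stand-in for Word2Vec)
        self.embed_size = len(next(iter(embeddings.values())))
        self.hidden_size = hidden_size
        in_size = self.embed_size + hidden_size
        # One weight matrix and bias vector per gate (untrained, random).
        self.W = {name: rand_matrix(hidden_size, in_size, rng) for name in self.GATES}
        self.b = {name: [0.0] * hidden_size for name in self.GATES}
        self.W_out = rand_matrix(num_classes, hidden_size, rng)

    def step(self, x, h, c):
        z = x + h  # concatenate current input with previous hidden state
        pre = {
            name: [a + b for a, b in zip(matvec(self.W[name], z), self.b[name])]
            for name in self.GATES
        }
        i = [sigmoid(v) for v in pre["input"]]
        f = [sigmoid(v) for v in pre["forget"]]
        o = [sigmoid(v) for v in pre["output"]]
        cand = [math.tanh(v) for v in pre["candidate"]]
        # Cell state: keep part of the old state, add part of the candidate.
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

    def predict_proba(self, tokens):
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        unknown = [0.0] * self.embed_size  # zero vector for out-of-vocabulary tokens
        for tok in tokens:
            h, c = self.step(self.embeddings.get(tok, unknown), h, c)
        logits = matvec(self.W_out, h)
        top = max(logits)
        exps = [math.exp(v - top) for v in logits]  # numerically stable softmax
        total = sum(exps)
        return [e / total for e in exps]


# Toy 3-dimensional vectors; real Word2Vec vectors would be learned from a corpus.
toy_vectors = {
    "bóng_đá": [0.9, 0.1, 0.0],
    "đội_tuyển": [0.8, 0.2, 0.1],
    "kinh_tế": [0.1, 0.9, 0.2],
}
model = LSTMClassifier(toy_vectors, hidden_size=4, num_classes=2)
probs = model.predict_proba(["đội_tuyển", "bóng_đá"])
print(probs)
```

Because the weights are untrained, the printed probabilities carry no meaning; the sketch only shows how a token sequence flows through the embedding lookup, the recurrent layer, and the class distribution. In practice the parameters would be fitted on labeled Vietnamese documents with a deep learning framework.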

Publisher

SPIIRAS

Subject

Artificial Intelligence, Applied Mathematics, Computational Theory and Mathematics, Computational Mathematics, Computer Networks and Communications, Information Systems

Reference: 47 articles.

1. Hochreiter S., Schmidhuber J. Long short-term memory // Neural computation. 1997. vol. 9. pp. 1735–1780.

2. Sak H., Senior A., Beaufays F. Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition // arXiv preprint arXiv:1402.1128. 2014.

3. Phuong L.-H., Nguyen H., Roussanaly A., Ho T. A hybrid approach to word segmentation of Vietnamese texts // Lecture Notes in Computer Science. 2013. vol. 5196. pp. 240–249.

4. Hoang V.C.D., Dinh D., Nguyen N. le, Ngo H.Q. A comparative study on Vietnamese text classification methods // 2007 IEEE International Conference on Research, Innovation and Vision for the Future. 2007. pp. 267–273.

5. Ngo Q.H., Dien D., Winiwarter W. A hybrid method for word segmentation with English-Vietnamese bilingual text // 2013 International Conference on Control, Automation and Information Sciences (ICCAIS). 2013. pp. 48–52.

Cited by 5 articles.

1. A study on deep learning for Vietnamese text classification;Journal of Military Science and Technology;2024-05-20

2. Impact of word embedding models on text analytics in deep learning environment: a review;Artificial Intelligence Review;2023-02-22

3. User-Item Correlation in Hybrid Neighborhood-Based Recommendation System with Synthetic User Data;2022 IEEE Ninth International Conference on Communications and Electronics (ICCE);2022-07-27

4. Experimental Study of Language Models of "Transformer" in the Problem of Finding the Answer to a Question in a Russian-Language Text;Informatics and Automation;2022-05-06

5. Proposing Recommendation System Using Bag of Word and Multi-label Support Vector Machine Classification;Future Data and Security Engineering. Big Data, Security and Privacy, Smart City and Industry 4.0 Applications;2021