Vietnamese Text Classification Algorithm using Long Short Term Memory and Word2Vec

Author:

Phat Huu Nguyen, Anh Nguyen Thi Minh

Abstract

With the ongoing fourth industrial revolution and the rapid development of computer science, the amount of textual information has become huge. Before methods and techniques are applied to process such data, its nature and characteristics should be thoroughly analyzed and understood. Automatic text processing built into existing systems can facilitate many procedures. Text classification is one of the basic applications of natural language processing, covering tasks such as sentiment analysis and subject labeling. Recent advances in deep learning networks show that such methods are well suited to document classification and offer additional efficiency; for instance, they have proved effective for classifying texts in English. A review of the literature, however, revealed that little research effort has been devoted to documents in Vietnamese, although deep learning models for document classification have demonstrated certain improvements for Vietnamese texts. Therefore, a long short-term memory (LSTM) network combined with Word2vec is proposed for text classification, with the aim of improving both performance and accuracy. Compared with traditional methods, the developed approach demonstrated somewhat better results in classifying Vietnamese texts. Evaluation on Vietnamese datasets shows an accuracy of over 90%, and the proposed approach looks promising for real applications.
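The pipeline named in the abstract (word vectors fed into an LSTM, followed by a classifier) can be illustrated with a minimal, standard-library-only Python sketch. This is not the authors' implementation: the class `LSTMClassifier`, the toy vocabulary, the vector size, and the number of classes are all assumptions made for illustration. The weights are random and untrained, and the small embedding table merely stands in for vectors that would normally come from a trained Word2vec model. Tokens are assumed to be pre-segmented Vietnamese words with syllables joined by underscores.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMClassifier:
    """Hypothetical, untrained LSTM text classifier (forward pass only)."""

    def __init__(self, embeddings, hidden, n_classes, seed=0):
        rng = random.Random(seed)
        self.emb = embeddings  # stand-in for pretrained Word2vec vectors
        dim = len(next(iter(embeddings.values())))
        self.hidden = hidden
        # One weight matrix per gate (input i, forget f, output o, candidate g),
        # each acting on the concatenation [x; h].
        self.W = {gate: rand_matrix(hidden, dim + hidden, rng) for gate in "ifog"}
        self.b = {gate: [0.0] * hidden for gate in "ifog"}
        self.W_out = rand_matrix(n_classes, hidden, rng)

    def step(self, x, h, c):
        z = x + h  # list concatenation of input vector and previous hidden state
        pre = {
            gate: [a + b for a, b in zip(matvec(self.W[gate], z), self.b[gate])]
            for gate in "ifog"
        }
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        # Standard LSTM update: c_t = f * c_{t-1} + i * g, h_t = o * tanh(c_t)
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c

    def predict_proba(self, tokens):
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        for tok in tokens:
            if tok in self.emb:  # out-of-vocabulary tokens are skipped
                h, c = self.step(self.emb[tok], h, c)
        logits = matvec(self.W_out, h)  # linear layer on the last hidden state
        m = max(logits)
        exps = [math.exp(v - m) for v in logits]  # numerically stable softmax
        total = sum(exps)
        return [e / total for e in exps]


# Toy usage with made-up vectors and three hypothetical topic classes.
rng = random.Random(1)
vocab = ["bóng_đá", "đội_tuyển", "thắng", "kinh_tế", "tăng_trưởng"]
embeddings = {w: [rng.uniform(-1, 1) for _ in range(8)] for w in vocab}
model = LSTMClassifier(embeddings, hidden=16, n_classes=3)
probs = model.predict_proba(["đội_tuyển", "bóng_đá", "thắng"])
print(probs)
```

Because the weights are untrained, the printed probabilities carry no meaning; the sketch only shows how a sequence of word vectors is reduced to a single hidden state and then mapped to class probabilities. In practice the embedding table would be loaded from a Word2vec model trained on a Vietnamese corpus, and the weights would be learned with a deep learning framework.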

Publisher

SPIIRAS

Subject

Artificial Intelligence, Applied Mathematics, Computational Theory and Mathematics, Computational Mathematics, Computer Networks and Communications, Information Systems


