A Method of Sustainable Development for Three Chinese Short-Text Datasets Based on BERT-CAM

Author:

Pan Li (1), Lim Wei Hong (2), Gan Yong (1)

Affiliation:

1. Zhengzhou Institute of Engineering and Technology, Zhengzhou 450044, China

2. Faculty of Engineering, Technology and Built Environment, UCSI University, Cheras, Kuala Lumpur 56000, Malaysia

Abstract

Current short-text classification (TC) methods suffer from low accuracy and have difficulty predicting emotion effectively. To address this, a sustainable short TC (S-TC) method using deep learning (DL) in big data environments is proposed. First, the text is vectorized with a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model. BERT is pre-trained by masking a word in the text and predicting it from the words that precede and follow it, and the resulting context-aware representations improve TC accuracy. Then, a convolutional attention mechanism (CAM) model is proposed, which uses a convolutional neural network (CNN) to capture feature interactions in the time dimension and multiple convolutional kernels to obtain more comprehensive feature information, further improving TC accuracy. Finally, the BERT pre-training model and the CAM model are optimized and merged into a BERT-CAM classification model for S-TC. In simulation experiments on three datasets, the proposed S-TC method is compared with three other methods. The results show that the proposed method achieves the highest accuracy, precision, recall, F1 value, Ma_F and Mi_F, reaching 94.28%, 86.36%, 84.95%, 85.96%, 86.34% and 86.56%, respectively, outperforming the three comparison algorithms.
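The abstract reports Ma_F and Mi_F without defining them. The sketch below assumes they denote macro-averaged and micro-averaged F1, which is the usual reading but is not confirmed by this record. The function name `macro_micro_f1` and the toy labels are hypothetical and are not taken from the paper's datasets.

```python
from collections import Counter


def macro_micro_f1(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label, multi-class predictions.

    Assumption: Ma_F = macro-averaged F1, Mi_F = micro-averaged F1.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted p, but it was not p
            fn[t] += 1          # missed an instance of class t

    def f1(tp_, fp_, fn_):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class F1 scores, every class weighted equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, then compute a single F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical toy sentiment labels, for illustration only.
y_true = ["pos", "pos", "neg", "neu", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "neu", "pos", "pos"]
macro, micro = macro_micro_f1(y_true, y_pred)
print(f"Ma_F = {macro:.4f}, Mi_F = {micro:.4f}")
```

Macro averaging gives rare classes the same weight as frequent ones, while micro averaging is dominated by the frequent classes, so reporting both can indicate how a classifier behaves on imbalanced short-text data.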

Funder

2021 Key Scientific Research Project of colleges and universities in Henan Province

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering


Cited by 3 articles.

1. Hybrid Approach for Multi-Classification of News Documents Using Artificial Intelligence;2024 5th International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV);2024-03-11

2. An Automatic Sentiment Analysis Method for Short Texts Based on Transformer-BERT Hybrid Model;IEEE Access;2024

3. Automatic literature screening using the PAJO deep-learning model for clinical practice guidelines;BMC Medical Informatics and Decision Making;2023-11-03
