A Method of Sustainable Development for Three Chinese Short-Text Datasets Based on BERT-CAM

Published: 2023-03-24; Volume: 12; Issue: 7; Page: 1531
ISSN: 2079-9292
Journal: Electronics
Language: English

Author:

Pan Li¹, Lim Wei Hong², Gan Yong¹

Affiliation:

1. Zhengzhou Institute of Engineering and Technology, Zhengzhou 450044, China

2. Faculty of Engineering, Technology and Built Environment, UCSI University, Cheras, Kuala Lumpur 56000, Malaysia

Abstract

Current short text classification (TC) methods have low accuracy and struggle to predict emotion effectively. To address this, a sustainable short TC (S-TC) method using deep learning (DL) in big data environments is proposed. First, the text is vectorized with a pre-trained bidirectional encoder representations from transformers (BERT) model. BERT is pre-trained by masking words in the text and predicting each masked word from both the preceding and the following words, so its representations capture context in both directions, which improves TC accuracy. Then, a convolutional attention mechanism (CAM) model is proposed that uses a convolutional neural network (CNN) to capture feature interactions in the time dimension and multiple convolutional kernels to obtain more comprehensive feature information, further improving TC accuracy. Finally, by optimizing and merging the pre-trained BERT model and the CAM model, a corresponding BERT-CAM classification model for S-TC is proposed. In simulation experiments, the proposed S-TC method is compared with three other methods on three datasets. The proposed method achieves the highest accuracy, precision, recall, F1 value, macro-F1 (Ma_F) and micro-F1 (Mi_F), reaching 94.28%, 86.36%, 84.95%, 85.96%, 86.34% and 86.56%, respectively, outperforming the three comparison algorithms.
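The abstract does not spell out how its evaluation metrics are averaged. As a minimal sketch, assuming Ma_F and Mi_F are macro- and micro-averaged F1 and that each short text carries exactly one label, the reported metrics can be computed in plain Python as below. The helper `classification_metrics` and the toy labels are hypothetical illustrations, not the authors' code or data, and macro-F1 is taken here as the unweighted mean of per-class F1 scores (some papers instead compute F1 from macro precision and macro recall).

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro- and micro-averaged precision, recall and F1
    for single-label, multi-class predictions (hypothetical helper)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in zip(y_true, y_pred):
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1  # false positive for the class that was predicted
            fn[true] += 1  # false negative for the class that was missed

    def div(a, b):
        return a / b if b else 0.0

    def f1(p, r):
        return div(2 * p * r, p + r)

    # Macro averaging: score each class separately, then take the unweighted mean.
    prec = [div(tp[c], tp[c] + fp[c]) for c in labels]
    rec = [div(tp[c], tp[c] + fn[c]) for c in labels]
    f1s = [f1(p, r) for p, r in zip(prec, rec)]
    k = len(labels)

    # Micro averaging: pool the counts over all classes, then score once.
    tp_all, fp_all, fn_all = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro_p = div(tp_all, tp_all + fp_all)
    micro_r = div(tp_all, tp_all + fn_all)

    return {
        "accuracy": tp_all / len(y_true),
        "macro_precision": sum(prec) / k,
        "macro_recall": sum(rec) / k,
        "macro_f1": sum(f1s) / k,           # Ma_F under the stated assumption
        "micro_f1": f1(micro_p, micro_r),   # Mi_F under the stated assumption
    }


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the paper.
    y_true = ["sports", "finance", "sports", "tech", "finance", "tech"]
    y_pred = ["sports", "sports", "sports", "tech", "finance", "finance"]
    for name, value in classification_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.4f}")
```

Micro-averaging pools every decision, so with single-label data each error counts once as a false positive and once as a false negative, and micro-F1 reduces to accuracy. Macro-averaging gives every class equal weight, so a poorly classified rare class lowers it noticeably.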

Funder

2021 Key Scientific Research Project of Colleges and Universities in Henan Province

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/12/7/1531/pdf

References: 38 articles.

1. Borna (2019). Hierarchical LSTM network for text classification. SN Appl. Sci.

2. Ji (2019). Microsoft Concept Graph: Mining Semantic Concepts for Short Text Understanding. Data Intell.

3. Yu (2019). A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput.

4. Liu, Z., Kan, H., Zhang, T., and Li, Y. (2020). DUKMSVM: A Framework of Deep Uniform Kernel Mapping Support Vector Machine for Short Text Classification. Appl. Sci., 10.

5. Woo, S., Park, J., Lee, J.Y., and Kweon, I.S. (2018). Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018, Springer.

Cited by 3 articles.

1. Hybrid Approach for Multi-Classification of News Documents Using Artificial Intelligence. 2024 5th International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), 2024-03-11.

2. An Automatic Sentiment Analysis Method for Short Texts Based on Transformer-BERT Hybrid Model. IEEE Access, 2024.

3. Automatic literature screening using the PAJO deep-learning model for clinical practice guidelines. BMC Medical Informatics and Decision Making, 2023-11-03.