A Hybrid Deep Learning Emotion Classification System Using Multimodal Data

Published: 2023-11-22, Volume: 23, Issue: 23, Page: 9333
ISSN: 1424-8220
Container-title: Sensors
Language: en
Short-container-title: Sensors

Author:

Kim Dong-Hwi¹, Son Woo-Hyeok¹, Kwak Sung-Shin¹, Yun Tae-Hyeon¹, Park Ji-Hyeok¹, Lee Jae-Dong¹

Affiliation:

1. Department of Computer Science, Dankook University, 152 Jukjeon-ro Campus, Suji-gu, Yongin-si 16890, Republic of Korea

Abstract

This paper proposes a hybrid deep learning emotion classification system (HDECS), a multimodal deep learning system designed for emotion classification in a specific national language. Emotion classification matters in diverse fields, including tailored corporate services and AI advancement. Most sentiment classification techniques for speaking situations rely on a single modality, such as voice, conversational text, or vital signs. Analyzing each of these data types is challenging because vocal intonation and text structure vary and because external stimuli affect physiological signals. Korean poses additional challenges for natural language processing, including subject omission and spacing issues. To overcome these challenges and improve emotion classification performance, this paper presents a case study using Korean multimodal data. The case study model retrains two pretrained models, an LSTM and a CNN, until their predictions on the entire dataset reach an agreement rate exceeding 0.75. The predictions are then used to generate emotional sentences that are appended to the script data, and the combined text is processed by BERT for the final emotion prediction. The results are evaluated with three metrics: categorical cross-entropy (CCE), which measures the difference between the model's predictions and the actual labels; F1 score; and accuracy. In this evaluation, the case model outperforms the existing KLUE/roBERTa model, with improvements of 0.5 in CCE, 0.09 in accuracy, and 0.11 in F1 score. The HDECS is therefore expected to perform well not only on Korean multimodal datasets but also on sentiment classification that accounts for the speech characteristics of various languages and regions.
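
The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the sentence template, and the label strings are all hypothetical, and the real LSTM, CNN, and BERT models are replaced by plain lists of predictions. The sketch shows only three pieces of arithmetic and text handling: the agreement rate between two models' predictions, the appending of a generated emotional sentence to a script, and the mean categorical cross-entropy.

```python
import math

AGREEMENT_THRESHOLD = 0.75  # threshold stated in the abstract


def agreement_rate(preds_a, preds_b):
    """Fraction of samples on which two models predict the same label."""
    if len(preds_a) != len(preds_b) or not preds_a:
        raise ValueError("prediction lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(preds_a, preds_b))
    return matches / len(preds_a)


def needs_more_retraining(preds_a, preds_b):
    """True while the agreement rate does not yet exceed the threshold."""
    return agreement_rate(preds_a, preds_b) <= AGREEMENT_THRESHOLD


def append_emotion_sentence(script, lstm_label, cnn_label):
    """Append a generated emotional sentence to the script text.

    The template is an assumption made for illustration; the abstract
    does not specify the wording used in the paper.
    """
    return (f"{script} The LSTM model predicts {lstm_label}. "
            f"The CNN model predicts {cnn_label}.")


def categorical_cross_entropy(true_labels, probs, eps=1e-12):
    """Mean CCE, where each entry of probs maps label -> predicted probability."""
    total = 0.0
    for label, dist in zip(true_labels, probs):
        # Clamp to eps so a zero probability does not produce log(0).
        total += -math.log(max(dist.get(label, 0.0), eps))
    return total / len(true_labels)
```

In this sketch, `needs_more_retraining` returns `True` when the two prediction lists agree on exactly three of four samples, because the abstract requires the rate to exceed 0.75 rather than merely reach it. The string returned by `append_emotion_sentence` stands in for the text that would be passed to BERT for the final prediction.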

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/23/23/9333/pdf

References (43 articles)

1. Emotion recognition and affective computing on vocal social media; Dai; Inf. Manag., 2015

2. A survey of state-of-the-art approaches for emotion recognition in text; Alswaidan; Knowl. Inf. Syst., 2020

3. Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv.

4. Long short-term memory; Hochreiter; Neural Comput., 1997

5. Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012, December 3–6). ImageNet classification with deep convolutional neural networks. Proceedings of the 25th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA.