Speech Emotion Recognition Based on Two-Stream Deep Learning Model Using Korean Audio Information

Published:2023-02-08 Issue:4 Volume:13 Page:2167
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Jo A-Hyeon¹, Kwak Keun-Chang²

Affiliation:

1. Electronic Engineering IT-Bio Convergence System Major, Chosun University, Gwangju 61452, Republic of Korea

2. Electronic Engineering, Chosun University, Gwangju 61452, Republic of Korea

Abstract

Identifying a person’s emotions is an important element of communication, and the voice is a means of expressing emotions easily and naturally. Speech emotion recognition is a crucial component of human–computer interaction (HCI), in which accurately identifying emotions is key. This study therefore presents a two-stream emotion recognition model that combines bidirectional long short-term memory (Bi-LSTM) and convolutional neural networks (CNNs) using a Korean speech emotion database, and comparatively analyzes its performance. The data used in the experiment were obtained from the Korean speech emotion recognition database built by Chosun University. Two deep learning models, Bi-LSTM and YAMNet (a CNN-based transfer learning model), were connected in a two-stream architecture to form the emotion recognition model. Various speech feature extraction methods and deep learning models were compared in terms of performance. Used alone, Bi-LSTM and YAMNet achieved speech emotion recognition performance of 90.38% and 94.91%, respectively. The two-stream model achieved 96%, an improvement of 1.09 to 5.62 percentage points over the single models.
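
The abstract does not say how the two streams are combined, so the following is only a minimal sketch of one common option, late fusion by weighted averaging of each stream's class probabilities. It is not the authors' method. The emotion labels, the example scores, and the names `fuse_streams` and `predict` are all hypothetical, and the two lists of scores stand in for the outputs of a Bi-LSTM stream and a YAMNet-based stream.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(logits_a, logits_b, weight_a=0.5):
    """Late fusion (an assumption): weighted average of per-stream probabilities."""
    probs_a = softmax(logits_a)
    probs_b = softmax(logits_b)
    return [weight_a * pa + (1.0 - weight_a) * pb
            for pa, pb in zip(probs_a, probs_b)]


def predict(logits_a, logits_b, labels, weight_a=0.5):
    """Return the label with the highest fused probability."""
    fused = fuse_streams(logits_a, logits_b, weight_a)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best]


# Hypothetical labels and scores, for illustration only.
LABELS = ["angry", "happy", "neutral", "sad"]
bilstm_logits = [0.2, 1.1, 0.9, -0.5]   # stand-in for the Bi-LSTM stream output
yamnet_logits = [0.1, 0.8, 2.0, -1.0]   # stand-in for the YAMNet stream output

print(predict(bilstm_logits, yamnet_logits, LABELS))
```

In this toy input the two streams disagree on their top class, and the fused prediction follows the stream that is more confident. A learned fusion layer over concatenated features is another plausible design, but the abstract gives no detail either way.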

Funder

the National IT Industry Promotion Agency of Korea

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes,Computer Science Applications,Process Chemistry and Technology,General Engineering,Instrumentation,General Materials Science

Link

https://www.mdpi.com/2076-3417/13/4/2167/pdf

References: 20 articles.

1. Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers;Speech Commun.,2020

2. Speech Emotion Recognition algorithm based on deep learning algorithm fusion of temporal and spatial features;An;J. Phys. Conf. Ser.,2021

3. Kipyatkova, I. (2019, January 20–25). LSTM-Based Language Models for Very Large Vocabulary Continuous Russian Speech Recognition System. Proceedings of the SPECOM 2019: Speech and Computer, Istanbul, Turkey.

4. Basu, S., Chakraborty, J., and Aftabuddin, M. (2017, January 19–20). Emotion recognition from speech using convolutional neural network with recurrent neural network architecture. Proceedings of the 2017 2nd International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India.

5. Speech emotion recognition using 3d convolutions and attention-based sliding recurrent networks with auditory front-ends;Peng;IEEE Access,2020

Cited by 15 articles.

1. A systematic review of trimodal affective computing approaches: Text, audio, and visual integration in emotion recognition and sentiment analysis;Expert Systems with Applications;2024-12

2. Multi-Label Emotion Recognition of Korean Speech Data Using Deep Fusion Models;Applied Sciences;2024-08-28

3. A Systematic Literature Review of Modalities, Trends, and Limitations in Emotion Recognition, Affective Computing, and Sentiment Analysis;Applied Sciences;2024-08-15

4. Cough Detection Using Acceleration Signals and Deep Learning Techniques;Electronics;2024-06-20

5. Real Time Spatial Sound Scene Analysis-AlertNet;2024 International Conference on Advances in Computing, Communication and Applied Informatics (ACCAI);2024-05-09