Modeling Speech Emotion Recognition via Attention-Oriented Parallel CNN Encoders

Published:2022-12-06 Issue:23 Volume:11 Page:4047
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics

Author:

Makhmudov Fazliddin, Kutlimuratov Alpamis, Akhmedov Farkhod, Abdallah Mohamed S., Cho Young-Im

Abstract

Accurately learning human emotions from speech is a core function of modern speech emotion recognition (SER) models. Deriving and interpreting the relevant speech features from raw speech data is therefore a difficult modeling task that directly affects performance. In this study, we developed a novel SER model built on attention-oriented parallel convolutional neural network (CNN) encoders that acquire, in parallel, the features used for emotion classification. Specifically, MFCC, paralinguistic, and speech spectrogram features were derived and encoded by a separate CNN architecture designed for each feature type; the encoded features were then fed to attention mechanisms for further representation and finally classified. The model was evaluated empirically on the open EMO-DB and IEMOCAP datasets, and the results showed that the proposed model is more efficient than the baseline models. In particular, the weighted accuracy (WA) and unweighted accuracy (UA) of the proposed model were 71.8% and 70.9%, respectively, on the EMO-DB dataset, and 72.4% and 71.1%, respectively, on the IEMOCAP dataset.
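The abstract reports results as weighted accuracy (WA) and unweighted accuracy (UA). The record does not give their formulas, so the sketch below assumes the definitions commonly used in SER work: WA as overall accuracy across all utterances, and UA as the mean of per-class recalls. The function names and the label lists are hypothetical and serve only as illustration; they are not data or code from the paper.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by total utterances."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every emotion class counts equally."""
    total = defaultdict(int)  # utterances per true class
    hits = defaultdict(int)   # correctly classified utterances per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / total[c] for c in total) / len(total)


# Hypothetical, imbalanced labels (assumption for illustration only).
y_true = ["angry"] * 4 + ["sad"] * 2 + ["happy"] * 2
y_pred = ["angry", "angry", "angry", "angry", "sad", "angry", "happy", "angry"]

wa = weighted_accuracy(y_true, y_pred)    # 6 of 8 utterances correct
ua = unweighted_accuracy(y_true, y_pred)  # mean of recalls 1.0, 0.5, 0.5
print(f"WA={wa:.3f} UA={ua:.3f}")
```

Under these assumed definitions, a model that favors the majority class scores higher on WA than on UA, which is why SER papers usually report both.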

Funder

MSIT (Ministry of Science and ICT), Korea

Gachon University

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/11/23/4047/pdf

References: 53 articles.

1. Zhang, Y., Du, J., Wang, Z., Zhang, J., and Tu, Y. (2018, January 12–15). Attention Based Fully Convolutional Network for Speech Emotion Recognition. Proceedings of the 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Honolulu, HI, USA.

2. Zhang (2018). Speech Emotion Recognition Using Deep Convolutional Neural Network and Discriminant Temporal Pyramid Matching. IEEE Trans. Multimed.

3. Liu (2018). Speech emotion recognition based on feature selection and extreme learning machine decision tree. Neurocomputing.

4. Schuller, B., Rigoll, G., and Lang, M. (2003, January 6–9). Hidden Markov Model based speech emotion recognition. Proceedings of the International Conference on Multimedia & Expo, Baltimore, MD, USA.

5. Nwe, T.L., Foo, S.W., and Silva, L.C.D. (2003, January 6–10). Classification of stress in speech using linear and nonlinear features. Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003 Proceedings (ICASSP ’03), Hong Kong, China.

Cited by 16 articles.

1. Newman-Watts-Strogatz topology in deep echo state networks for speech emotion recognition;Engineering Applications of Artificial Intelligence;2024-07

2. Enhancing Multimodal Emotion Recognition through Attention Mechanisms in BERT and CNN Architectures;Applied Sciences;2024-05-15

3. Privacy Preserving Federated Learning Approach for Speech Emotion Recognition;2023 26th International Conference on Computer and Information Technology (ICCIT);2023-12-13

4. Enhancing Tone Recognition in Large-Scale Social Media Data with Deep Learning and Big Data Processing;2023 5th International Conference on Artificial Intelligence and Computer Applications (ICAICA);2023-11-28

5. Genetic Algorithm for High-Dimensional Emotion Recognition from Speech Signals;Electronics;2023-11-25