Human–Computer Interaction with a Real-Time Speech Emotion Recognition with Ensembling Techniques 1D Convolution Neural Network and Attention-Reference-Cited by-同舟云学术

Human–Computer Interaction with a Real-Time Speech Emotion Recognition with Ensembling Techniques 1D Convolution Neural Network and Attention

Published:2023-01-26 Issue:3 Volume:23 Page:1386
ISSN:1424-8220
Container-title:Sensors
language:en
Short-container-title:Sensors

Author:

Alsabhan Waleed¹

Affiliation:

1. College of Engineering, Al Faisal University, P.O. Box 50927, Riyadh 11533, Saudi Arabia

Abstract

Emotions have a crucial function in the mental existence of humans. They are vital for identifying a person’s behaviour and mental condition. Speech Emotion Recognition (SER) is extracting a speaker’s emotional state from their speech signal. SER is a growing discipline in human–computer interaction, and it has recently attracted more significant interest. This is because there are not so many universal emotions; therefore, any intelligent system with enough computational capacity can educate itself to recognise them. However, the issue is that human speech is immensely diverse, making it difficult to create a single, standardised recipe for detecting hidden emotions. This work attempted to solve this research difficulty by combining a multilingual emotional dataset with building a more generalised and effective model for recognising human emotions. A two-step process was used to develop the model. The first stage involved the extraction of features, and the second stage involved the classification of the features that were extracted. ZCR, RMSE, and the renowned MFC coefficients were retrieved as features. Two proposed models, 1D CNN combined with LSTM and attention and a proprietary 2D CNN architecture, were used for classification. The outcomes demonstrated that the suggested 1D CNN with LSTM and attention performed better than the 2D CNN. For the EMO-DB, SAVEE, ANAD, and BAVED datasets, the model’s accuracy was 96.72%, 97.13%, 96.72%, and 88.39%, respectively. The model beat several earlier efforts on the same datasets, demonstrating the generality and efficacy of recognising multiple emotions from various languages.

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering,Biochemistry,Instrumentation,Atomic and Molecular Physics, and Optics,Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/23/3/1386/pdf

Reference52 articles.

1. Darwin, C., and Prodger, P. (1998). The Expression of the Emotions in Man and Animals, Oxford University Press.

2. The importance of being emotional;Oatley;New Sci.,1989

3. Survey on speech emotion recognition: Features, classification schemes, and databases;Kamel;Pattern Recognit.,2011

4. Detection and analysis of emotion from speech signals;Davletcharova;Procedia Comput. Sci.,2015

5. Harár, P., Burget, R., and Dutta, M.K. (2017, January 2–3). Speech emotion recognition with deep learning. Proceedings of the 2017 4th International conference on signal processing and integrated networks (SPIN), Noida, India.

Cited by 3 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Optimizing Speech Emotion Recognition with Hilbert Curve and convolutional neural network;Cognitive Robotics;2024

2. A review of deep learning techniques for speech processing;Information Fusion;2023-11

3. Adaptive Dimensional Gaussian Mutation of PSO-Optimized Convolutional Neural Network Hyperparameters;Applied Sciences;2023-03-27