Speech Emotion Recognition Using Convolutional Neural Networks with Attention Mechanism

Published:2023-10-23 Issue:20 Volume:12 Page:4376
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics

Author:

Mountzouris Konstantinos¹, Perikos Isidoros¹,², Hatzilygeroudis Ioannis¹

Affiliation:

1. Department of Computer Engineering and Informatics, University of Patras, 26504 Patras, Greece

2. Computer Technology Institute and Press “Diophantus”, 26504 Patras, Greece

Abstract

Speech emotion recognition (SER) is an interesting and difficult problem. In this paper, we address it with deep learning networks. We designed and implemented six different networks: a deep belief network (DBN), a simple deep neural network (SDNN), an LSTM network (LSTM), an LSTM network with an attention mechanism (LSTM-ATN), a convolutional neural network (CNN), and a convolutional neural network with an attention mechanism (CNN-ATN). Beyond solving the SER problem, our aim was to test the impact of the attention mechanism on the results. Dropout and batch normalization are also used to improve the generalization ability of the models (that is, to prevent overfitting) and to speed up training. The Surrey Audio–Visual Expressed Emotion (SAVEE) database and the Ryerson Audio–Visual Database (RAVDESS) were used to train and evaluate the models. The results showed that the networks with an attention mechanism outperformed those without. Furthermore, the CNN-ATN was the best of the tested networks, achieving an accuracy of 74% on SAVEE and 77% on RAVDESS and exceeding existing state-of-the-art systems on the same datasets.
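The abstract does not describe how the attention mechanism is built, so the following is only a minimal sketch of one common variant (attention pooling over per-frame features), not the authors' architecture. It assumes a CNN has already produced one feature vector per time frame; the names `attention_pool` and `query` are hypothetical, and `query` stands in for a vector that would normally be learned during training.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: turns raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames: Sequence[Sequence[float]],
                   query: Sequence[float]) -> List[float]:
    """Collapse T per-frame feature vectors (size D) into one utterance vector.

    frames: assumed output of a convolutional front end, one vector per frame.
    query:  hypothetical scoring vector of size D (learned in a real model).
    """
    # Score each frame by its dot product with the query vector.
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    # Normalize the scores into attention weights.
    weights = softmax(scores)
    # Weighted sum over frames: higher-scoring frames contribute more.
    dim = len(query)
    return [sum(w * frame[d] for w, frame in zip(weights, frames))
            for d in range(dim)]


if __name__ == "__main__":
    # Three toy "frames" with two features each (illustrative values only).
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(attention_pool(frames, query=[2.0, 0.0]))
```

With an all-zero `query` every frame gets the same weight and the result reduces to plain mean pooling, which is the baseline an attention layer is meant to improve on.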

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/12/20/4376/pdf

Cited by 3 articles.

1. Integrating IoMT and AI for Proactive Healthcare: Predictive Models and Emotion Detection in Neurodegenerative Diseases;Algorithms;2024-08-23

2. Audio-visual expression-based emotion recognition model for neglected people in real-time: a late-fusion approach;Multimedia Tools and Applications;2024-06-17

3. Deep Learning, Ensemble and Supervised Machine Learning for Arabic Speech Emotion Recognition;Engineering, Technology & Applied Science Research;2024-04-02