Affiliation:
1. Graduate Program in Electrical Engineering, Federal University of Bahia, Salvador 40210-910, Brazil
2. Center of Exact and Technological Sciences, Federal University of Recôncavo da Bahia, Cruz das Almas 44380-000, Brazil
3. Department of Electrical and Computer Engineering, Federal University of Bahia, Salvador 40210-910, Brazil
Abstract
Speech emotion recognition (SER) is widely applicable today, benefiting areas such as entertainment, robotics, and healthcare. This emotional understanding enhances user-machine interaction, making systems more responsive and providing more natural experiences. In robotics, SER is useful in home assistance devices, eldercare, and special education, facilitating effective communication. Additionally, in healthcare settings, it can monitor patients’ emotional well-being. However, achieving high accuracy is challenging, complicated by the need to select the best combination of machine learning algorithms, hyperparameters, datasets, data augmentation, and feature extraction methods. Therefore, this study develops a deep learning approach for finding optimal SER configurations. It explores optimizer settings, learning rates, data augmentation techniques, feature extraction methods, and neural architectures for the RAVDESS, TESS, SAVEE, and R+T+S (RAVDESS+TESS+SAVEE) datasets. After identifying the best SER configurations, meta-learning is carried out, transferring the best configurations to two additional datasets, CREMA-D and R+T+S+C (RAVDESS+TESS+SAVEE+CREMA-D). The developed approach proved effective in finding the best configurations, achieving an accuracy of 97.01% for RAVDESS, 100% for TESS, 90.62% for SAVEE, and 97.37% for R+T+S. Furthermore, using meta-learning, the CREMA-D and R+T+S+C datasets achieved accuracies of 83.28% and 90.94%, respectively.
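The pipeline described above combines data augmentation with feature extraction before model training. The abstract does not enumerate the specific augmentation techniques used, so the following is only an illustrative sketch of one common SER augmentation step, noise injection at a target signal-to-noise ratio, written in plain NumPy:

```python
import numpy as np

def add_noise(signal: np.ndarray, snr_db: float = 20.0, rng=None) -> np.ndarray:
    """Inject white Gaussian noise at a target SNR (in dB).

    A widely used SER augmentation step; the paper's actual augmentation
    methods are not listed in the abstract, so this is only a sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(signal ** 2)
    # Scale noise power so that 10*log10(signal_power / noise_power) == snr_db
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# Example: augment a synthetic 1-second, 16 kHz sine "utterance"
sr = 16000
t = np.linspace(0.0, 1.0, sr, endpoint=False)
clean = 0.5 * np.sin(2 * np.pi * 220.0 * t)
noisy = add_noise(clean, snr_db=20.0)
```

Each augmented copy would then pass through the same feature extractor as the clean audio, enlarging the effective training set without new recordings.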
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering
Cited by: 4 articles.