Optimizing Speech Emotion Recognition with Deep Learning and Grey Wolf Optimization: A Multi-Dataset Approach

Published: 2024-02-20
Volume: 17
Issue: 3
Page: 90
ISSN: 1999-4893
Journal: Algorithms
Language: en

Author:

Tyagi Suryakant¹, Szénási Sándor²,³

Affiliation:

1. Doctoral School of Applied Informatics and Applied Mathematics, Óbuda University, 1034 Budapest, Hungary

2. John von Neumann Faculty of Informatics, Óbuda University, 1034 Budapest, Hungary

3. Faculty of Economics and Informatics, J. Selye University, 945 01 Komarno, Slovakia

Abstract

Machine learning and speech emotion recognition are rapidly evolving fields, significantly impacting human-centered computing. Machine learning enables computers to learn from data and make predictions, while speech emotion recognition allows computers to identify and understand human emotions from speech. These technologies contribute to the creation of innovative human–computer interaction (HCI) applications. Deep learning algorithms, capable of learning high-level features directly from raw data, have given rise to new emotion recognition approaches employing models trained on advanced speech representations like spectrograms and time–frequency representations. This study introduces CNN and LSTM models with GWO optimization, aiming to determine optimal parameters for achieving enhanced accuracy within a specified parameter set. The proposed CNN and LSTM models with GWO optimization underwent performance testing on four diverse datasets—RAVDESS, SAVEE, TESS, and EMODB. The results indicated superior performance of the models compared to linear and kernelized SVM, with or without GWO optimizers.
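The parameter search described in the abstract can be illustrated with a minimal sketch of a basic Grey Wolf Optimizer. This is not the authors' implementation: the function name `grey_wolf_optimize`, its arguments, the default pack size and iteration count, and the toy objective `toy_error` are all assumptions made for illustration. In a real experiment the objective would presumably train a CNN or LSTM with the candidate parameters and return its validation error.

```python
import random


def _top_three(leaders, wolves, objective):
    """Return the three best (score, position) pairs among leaders and wolves."""
    scored = leaders + [(objective(w), list(w)) for w in wolves]
    scored.sort(key=lambda pair: pair[0])
    return scored[:3]


def grey_wolf_optimize(objective, bounds, n_wolves=20, n_iters=100, seed=0):
    """Minimize `objective` inside the box `bounds` (hypothetical helper).

    bounds is a list of (low, high) pairs, one per parameter.
    Returns (best_position, best_score).
    """
    if n_wolves < 3:
        raise ValueError("need at least three wolves for alpha, beta, delta")
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    leaders = []  # best-so-far alpha, beta, delta as (score, position)

    for it in range(n_iters):
        leaders = _top_three(leaders, wolves, objective)
        # Control parameter a shrinks linearly from 2 toward 0, which
        # moves the pack from exploration to exploitation.
        a = 2.0 * (1.0 - it / n_iters)
        new_wolves = []
        for wolf in wolves:
            new_pos = []
            for d, (lo, hi) in enumerate(bounds):
                estimates = []
                for _, leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    dist = abs(C * leader[d] - wolf[d])
                    estimates.append(leader[d] - A * dist)
                # Average the three leader-guided estimates, then clip to bounds.
                value = sum(estimates) / len(estimates)
                new_pos.append(min(max(value, lo), hi))
            new_wolves.append(new_pos)
        wolves = new_wolves

    # Score the final positions as well before picking the winner.
    leaders = _top_three(leaders, wolves, objective)
    best_score, best_pos = leaders[0]
    return best_pos, best_score


def toy_error(params):
    """Stand-in for a validation error; minimum at (1.0, -2.0). Illustrative only."""
    return (params[0] - 1.0) ** 2 + (params[1] + 2.0) ** 2


best_pos, best_score = grey_wolf_optimize(toy_error, [(-5.0, 5.0), (-5.0, 5.0)])
```

Keeping the three leaders as best-so-far positions, rather than re-ranking only the current pack, is one common design choice; it guarantees the returned score never gets worse between iterations.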

Publisher

MDPI AG

Link

https://www.mdpi.com/1999-4893/17/3/90/pdf

References (50 articles)

1. Banse (1996). Acoustic profiles in vocal emotion expression. J. Personal. Soc. Psychol.

2. Mustafa (2018). Speech emotion recognition research: An analysis of research focus. Int. J. Speech Technol.

3. Schuller, B., Rigoll, G., and Lang, M. (2003, January 6–10). Hidden Markov model-based speech emotion recognition. Proceedings of the 2003 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hong Kong, China.

4. Hu, H., Xu, M.-X., and Wu, W. (2007, January 15–20). GMM supervector based SVM with spectral features for speech emotion recognition. Proceedings of the 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Honolulu, HI, USA.

5. Lee (2009). Emotion recognition using a hierarchical binary decision tree approach. Speech Commun.