Author:
MIHALACHE Serban, BURILEANU Dragos
Abstract
Speech emotion recognition (SER) is the task of determining the affective content present in speech, a research area that has attracted great interest in recent years, with important applications in forensic speech analysis and law enforcement operations, among other fields. In this paper, systems based on deep neural networks (DNNs) spanning five levels of complexity are proposed, developed, and tested, including systems leveraging transfer learning (TL) from state-of-the-art image recognition deep learning models, as well as several ensemble classification techniques that lead to significant performance increases. The systems were tested on three of the most widely used SER datasets: EMODB, CREMAD, and IEMOCAP, in the context of (i) classification, using the standard full sets of emotion classes as well as additional negative-emotion subsets relevant for forensic speech applications, and (ii) regression, using the continuously valued 2D arousal-valence affect space. The proposed systems achieved state-of-the-art results for the full class set of EMODB (up to 83% accuracy) and performance comparable to other published research for the full class sets of CREMAD and IEMOCAP (up to 55% and 62% accuracy, respectively). For the class subsets focusing only on negative affective content, the proposed solutions achieved top performance relative to previously published state-of-the-art results.
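The abstract does not specify how the transfer learning systems are configured. As a rough illustration only, the sketch below shows one common way such a pipeline can be set up: an ImageNet-pretrained image recognition backbone (ResNet-18 is an assumed choice, not the authors' model) fine-tuned on log-mel spectrograms rendered as 3-channel images. The input representation, number of emotion classes, frozen layers, and hyperparameters are all illustrative assumptions.

# Hypothetical sketch (not the paper's implementation): transfer learning from an
# ImageNet-pretrained CNN for SER, assuming spectrograms treated as 3-channel images.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 7  # e.g., the full EMODB emotion set; adjust per dataset (assumed)

# Load a pretrained image recognition backbone and replace its classifier head.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_CLASSES)

# Freeze early layers so only the final block and the new head adapt (one common choice).
for name, param in backbone.named_parameters():
    if not name.startswith(("layer4", "fc")):
        param.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in backbone.parameters() if p.requires_grad), lr=1e-4
)
criterion = nn.CrossEntropyLoss()

# Dummy batch: 8 spectrogram "images" (3 x 224 x 224) with random emotion labels.
x = torch.randn(8, 3, 224, 224)
y = torch.randint(0, NUM_CLASSES, (8,))

# One illustrative fine-tuning step.
backbone.train()
optimizer.zero_grad()
loss = criterion(backbone(x), y)
loss.backward()
optimizer.step()
print(f"one training step, loss = {loss.item():.3f}")

For the regression setting mentioned in the abstract, the same backbone could instead end in a 2-unit linear head trained with a mean-squared-error loss against arousal-valence targets; ensembling would typically average the outputs of several such models. These are generic variants, not details taken from the paper.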