Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning

Published: 2021-11-18 Issue: 22 Volume: 21 Page: 7665
ISSN: 1424-8220
Container-title: Sensors
Language: en
Short-container-title: Sensors

Author:

Luna-Jiménez Cristina, Griol David, Callejas Zoraida, Kleinlein Ricardo, Montero Juan M., Fernández-Martínez Fernando

Abstract

Emotion recognition is attracting the attention of the research community because of the many areas where it can be applied, such as healthcare and road safety systems. In this paper, we propose a multimodal emotion recognition system that relies on speech and facial information. For the speech-based modality, we evaluated several transfer-learning techniques, specifically embedding extraction and fine-tuning. The best accuracy was achieved when we fine-tuned the CNN-14 of the PANNs framework, confirming that training was more robust when it did not start from scratch and the tasks were similar. For the facial emotion recognizers, we propose a framework that consists of a Spatial Transformer Network pre-trained on saliency maps and facial images, followed by a bi-LSTM with an attention mechanism. The error analysis showed that frame-based systems could present problems when used directly to solve a video-based task despite the domain adaptation, which opens a new line of research into ways to correct this mismatch and take advantage of the embedded knowledge of these pre-trained models. Finally, by combining the two modalities with a late fusion strategy, we achieved 80.08% accuracy on the RAVDESS dataset in a subject-wise 5-CV evaluation, classifying eight emotions. The results revealed that these modalities carry relevant information for detecting users’ emotional state and that their combination improves system performance.

Funder

Ministerio de Economía, Industria y Competitividad, Gobierno de España

Ministerio de Educación, Cultura y Deporte

European Commission

Agencia Estatal de Investigación

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/21/22/7665/pdf

References: 80 articles.

1. The Role of Trust in Proactive Conversational Assistants

2. Embodied Conversational Agents; Cassell, 2000

3. From ‘automation’ to ‘autonomy’: the importance of trust repair in human–machine interaction

4. An Emotion Recognition–Awareness Vulnerability Hypothesis for Depression in Adolescence: A Systematic Review

5. Discriminative Power of EEG-Based Biomarkers in Major Depressive Disorder: A Systematic Review

Cited by 65 articles.

1. Optimized efficient attention-based network for facial expressions analysis in neurological health care; Computers in Biology and Medicine; 2024-09

2. Deep operational audio-visual emotion recognition; Neurocomputing; 2024-07

3. Using transformers for multimodal emotion recognition: Taxonomies and state of the art review; Engineering Applications of Artificial Intelligence; 2024-07

4. Enhancing Emotion Recognition through Multimodal Systems and Advanced Deep Learning Techniques; International Journal of Scientific Research in Computer Science, Engineering and Information Technology; 2024-06-27

5. Mental-Health Topic Classification employing D-vectors of Large Language Models; 2024 IEEE 37th International Symposium on Computer-Based Medical Systems (CBMS); 2024-06-26