A Multimodal Facial Emotion Recognition Framework through the Fusion of Speech with Visible and Infrared Images

Published: 2020-08-06 Issue: 3 Volume: 4 Page: 46
ISSN: 2414-4088
Container-title: Multimodal Technologies and Interaction
Language: en
Short-container-title: MTI

Authors:

Siddiqui Mohammad Faridul Haque, Javaid Ahmad Y.

Abstract

The exigency of emotion recognition is pushing the envelope for meticulous strategies of discerning actual emotions through the use of superior multimodal techniques. This work presents a multimodal automatic emotion recognition (AER) framework capable of differentiating between expressed emotions with high accuracy. The contribution involves implementing an ensemble-based approach for the AER through the fusion of visible images and infrared (IR) images with speech. The framework is implemented in two layers, where the first layer detects emotions using single modalities while the second layer combines the modalities and classifies emotions. Convolutional Neural Networks (CNN) have been used for feature extraction and classification. A hybrid fusion approach comprising early (feature-level) and late (decision-level) fusion was applied to combine the features and the decisions at different stages. The output of the CNN trained with voice samples of the RAVDESS database was combined with the image classifier’s output using decision-level fusion to obtain the final decision. An accuracy of 86.36% and similar recall (0.86), precision (0.88), and f-measure (0.87) scores were obtained. A comparison with contemporary work endorsed the competitiveness of the framework with the rationale for exclusivity in attaining this accuracy in wild backgrounds and light-invariant conditions.
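The difference between the two fusion stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the fusion rule, so the weighted average of class probabilities used here is an assumption, and the label set, function names, and `image_weight` parameter are all hypothetical.

```python
from typing import List, Sequence

# Hypothetical label set, chosen only for illustration.
EMOTIONS: List[str] = ["angry", "happy", "neutral", "sad"]


def early_fusion(visible_features: Sequence[float],
                 ir_features: Sequence[float]) -> List[float]:
    """Feature-level fusion: concatenate two feature vectors into one."""
    return list(visible_features) + list(ir_features)


def late_fusion(image_probs: Sequence[float],
                speech_probs: Sequence[float],
                image_weight: float = 0.5) -> List[float]:
    """Decision-level fusion: weighted average of two probability vectors.

    Assumption: both classifiers emit probabilities over the same classes
    in the same order. The averaging rule is illustrative only.
    """
    if len(image_probs) != len(speech_probs):
        raise ValueError("both classifiers must score the same classes")
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must lie in [0, 1]")
    speech_weight = 1.0 - image_weight
    return [image_weight * p + speech_weight * q
            for p, q in zip(image_probs, speech_probs)]


def predict(fused_probs: Sequence[float],
            labels: Sequence[str] = EMOTIONS) -> str:
    """Return the label with the highest fused probability."""
    best = max(range(len(fused_probs)), key=lambda i: fused_probs[i])
    return labels[best]
```

With equal weights, an image classifier that favors one class can outvote a speech classifier that favors another. Lowering `image_weight` shifts the final decision toward the speech model, which is the kind of trade-off a decision-level stage has to settle.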

Publisher

MDPI AG

Subject

Computer Networks and Communications, Computer Science Applications, Human-Computer Interaction, Neuroscience (miscellaneous)

Link

https://www.mdpi.com/2414-4088/4/3/46/pdf

References: 122 articles.

1. Facial Action Coding System;Ekman,1977

2. Facial Action Coding System: The Manual on CD ROM;Ekman,2002

3. FACS investigator’s guide;Ekman,2002

4. EmoNets: Multimodal deep learning approaches for emotion recognition in video

Cited by 33 articles.

1. Portable Facial Expression System Based on EMG Sensors and Machine Learning Models;Sensors;2024-05-23

2. Multimodal Emotion Recognition via Convolutional Neural Networks: Comparison of different strategies on two multimodal datasets;Engineering Applications of Artificial Intelligence;2024-04

3. Optimized Ensemble Machine Learning Approach for Emotion Detection from Thermal Images;International Journal of Pattern Recognition and Artificial Intelligence;2024-02

4. Multimodal Daily-Life Emotional Recognition Using Heart Rate and Speech Data From Wearables;IEEE Access;2024

5. Facial Expression Recognition Using Visible, IR, and MSX Images by Early and Late Fusion of Deep Learning Models;IEEE Access;2024