Multi-Modal Residual Perceptron Network for Audio–Video Emotion Recognition

Journal: Sensors (ISSN 1424-8220)
Published: 2021-08-12; Volume: 21; Issue: 16; Page: 5452
Language: English

Authors

Chang Xin, Skarbek Władysław

Abstract

Emotion recognition is an important research field for human–computer interaction. Audio–video emotion recognition is now commonly approached with deep neural network models. Published papers, as a rule, report only cases in which multi-modality outperforms audio-only or video-only modality; however, cases in which a single modality is superior can also be found. In our research, we hypothesize that, for fuzzy categories of emotional events, noisy within-modal and inter-modal information, represented indirectly in the parameters of the neural network model, impedes better performance in the existing late-fusion and end-to-end multi-modal training strategies. To exploit the advantages of both strategies while overcoming their deficiencies, we define a multi-modal residual perceptron network (MRPN), which performs end-to-end learning from multi-modal network branches and yields a multi-modal feature representation that generalizes better. With the proposed multi-modal residual perceptron network and a novel time augmentation for streaming digital movies, the state-of-the-art average recognition rate was improved to 91.4% on the Ryerson Audio–Visual Database of Emotional Speech and Song (RAVDESS) dataset and to 83.15% on the Crowd-Sourced Emotional Multi Modal Actors Dataset (CREMA-D). Moreover, the multi-modal residual perceptron network concept shows potential for multi-modal applications involving signal sources beyond optical and acoustic ones.
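The abstract does not describe the internals of the multi-modal residual perceptron network, so the following is only a minimal sketch of the general idea it motivates: keep per-modality (late-fusion) logits while letting an end-to-end perceptron over the joint audio–video embedding add a learned residual correction. Everything here is an assumption for illustration, not the authors' architecture: the class `ResidualFusionHead`, the helper functions, the layer sizes, and the formula fused = W_a·a + W_v·v + W_2·relu(W_1·[a; v]). The embeddings `a` and `v` stand in for outputs of hypothetical audio and video encoders, and no training loop is shown.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) weight matrix by a length-cols vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def add(*vectors: Vector) -> Vector:
    """Element-wise sum of equal-length vectors."""
    return [sum(items) for items in zip(*vectors)]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class ResidualFusionHead:
    """Hypothetical fusion head (illustration only, not the paper's MRPN).

    fused_logits = W_a a + W_v v + W_2 relu(W_1 [a; v])
    The first two terms are late-fusion logits from each branch; the last
    term is a residual perceptron over the joint audio-video embedding.
    """

    def __init__(self, dim_a: int, dim_v: int, hidden: int, num_classes: int, seed: int = 0):
        rng = random.Random(seed)
        self.w_a = random_matrix(num_classes, dim_a, rng)     # audio-branch classifier
        self.w_v = random_matrix(num_classes, dim_v, rng)     # video-branch classifier
        self.w_1 = random_matrix(hidden, dim_a + dim_v, rng)  # joint hidden layer
        self.w_2 = random_matrix(num_classes, hidden, rng)    # residual output layer

    def forward(self, a: Vector, v: Vector) -> Vector:
        audio_logits = matvec(self.w_a, a)
        video_logits = matvec(self.w_v, v)
        joint = a + v  # list concatenation, i.e. [a; v]
        residual = matvec(self.w_2, relu(matvec(self.w_1, joint)))
        return softmax(add(audio_logits, video_logits, residual))


# Usage with made-up embeddings and sizes (e.g. 8 emotion classes).
head = ResidualFusionHead(dim_a=4, dim_v=6, hidden=8, num_classes=8, seed=1)
probs = head.forward([0.5, -1.0, 0.25, 2.0], [1.0, 0.0, -0.5, 0.3, 0.7, -0.2])
print([round(p, 3) for p in probs])
```

In this sketch, setting `w_2` to zero reduces the head exactly to plain late fusion of the two branch logits, so the residual term can only add to what late fusion already provides; in a trained model it would be expected to carry inter-modal information that the per-branch terms miss.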

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/21/16/5452/pdf

References: 39 articles

1. Eigenfaces vs. Fisherfaces: recognition using class specific linear projection

2. Gradient-based learning applied to document recognition

3. Long Short-Term Memory

4. Attention Is All You Need; Vaswani; arXiv; 2017

5. ModDrop: Adaptive Multi-Modal Gesture Recognition

Cited by: 16 articles

1. A novel signal channel attention network for multi-modal emotion recognition; Frontiers in Neurorobotics; 2024-09-11

2. A Two-Stage Multi-Modal Multi-Label Emotion Recognition Decision System Based on GCN; International Journal of Decision Support System Technology; 2024-08-16

3. Bi-Modal Bi-Task Emotion Recognition Based on Transformer Architecture; Applied Artificial Intelligence; 2024-05-21

4. Deep Learning Approaches for Effective Human Computer Interaction: A Comprehensive Survey on Single and Multimodal Emotion Detection; 2024 IEEE 9th International Conference for Convergence in Technology (I2CT); 2024-04-05

5. Cross Entropy in Deep Learning of Classifiers Is Unnecessary—ISBE Error Is All You Need; Entropy; 2024-01-12