Emotion-Recognition Algorithm Based on Weight-Adaptive Thought of Audio and Video

Published:2023-06-05 Issue:11 Volume:12 Page:2548
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics

Author:

Cheng Yongjian¹, Zhou Dongmei¹, Wang Siqi¹, Wen Luhan¹

Affiliation:

1. School of Mechanical and Electrical Engineering, Chengdu University of Technology, Chengdu 610059, China

Abstract

Emotion recognition commonly relies on single-modal methods that use either voice or video signals, which are practical and widely applicable in some scenarios. However, as application scenarios expand and data volumes surge, single-modal emotion recognition can no longer meet requirements for accuracy and comprehensiveness once the data reaches a certain scale. This paper therefore proposes a multimodal approach to improve emotion-recognition accuracy and applies corresponding preprocessing to the selected dataset. A model is constructed for each modality: the audio-modality task uses a “time-distributed CNNs + LSTMs” scheme, and the video-modality task uses a “DeepID V3 + Xception architecture” scheme. Each scheme is verified experimentally and compared with existing emotion-recognition algorithms. Finally, the paper explores late fusion and proposes and implements a late-fusion method based on the idea of weight adaptation. The experimental results demonstrate the superiority of the proposed multimodal fusion algorithm: compared with the single-modal emotion-recognition algorithms, recognition accuracy increases by almost 4%, reaching 84.33%.
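The abstract does not say how the late-fusion weights are adapted, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each modality's weight is proportional to its validation accuracy; the function names `adaptive_weights` and `late_fusion` and the example numbers are hypothetical.

```python
def adaptive_weights(val_accuracies):
    """Turn per-modality validation accuracies into weights that sum to 1.

    Assumption: weight is proportional to validation accuracy. The paper's
    actual adaptation rule is not given in the abstract.
    """
    total = sum(val_accuracies)
    if total <= 0:
        raise ValueError("validation accuracies must sum to a positive value")
    return [acc / total for acc in val_accuracies]


def late_fusion(prob_vectors, weights):
    """Weighted average of per-class probabilities from each modality.

    prob_vectors: one probability vector per modality (same class order).
    Returns the fused vector and the index of the winning class.
    """
    n_classes = len(prob_vectors[0])
    fused = [
        sum(w * probs[c] for w, probs in zip(weights, prob_vectors))
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=fused.__getitem__)
    return fused, label


# Hypothetical outputs of an audio model and a video model for three classes.
audio_probs = [0.6, 0.3, 0.1]
video_probs = [0.2, 0.7, 0.1]

weights = adaptive_weights([0.8, 0.2])  # audio model assumed more reliable
fused, label = late_fusion([audio_probs, video_probs], weights)
```

With equal weights the video model's preferred class wins in this example; with the weights shifted toward the audio model, the fused decision follows the audio prediction instead.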

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/12/11/2548/pdf

Cited by 2 articles.

1. A classroom facial expression recognition method based on attention mechanism;Journal of Intelligent & Fuzzy Systems;2023-12-02

2. Drivers’ Comprehensive Emotion Recognition Based on HAM;Sensors;2023-10-07