Affiliation:
1. Smart Policing Academy, China People’s Police University, Langfang 065000, China
Abstract
Multimodal emotion classification (MEC) has been extensively studied in human–computer interaction, healthcare, and other domains. Previous MEC research has utilized identical multimodal annotations (IMAs) to train unimodal models, hindering the learning of effective unimodal representations due to differences between unimodal expressions and multimodal perceptions. Additionally, most MEC fusion techniques fail to consider the unimodal–multimodal inconsistencies. This study addresses two important issues in MEC: learning satisfactory unimodal representations of emotion and accounting for unimodal–multimodal inconsistencies during the fusion process. To tackle these challenges, the authors propose the Two-Stage Conformer-based MEC model (Uni2Mul) with two key innovations: (1) in stage one, unimodal models are trained using independent unimodal annotations (IUAs) to optimize unimodal emotion representations; (2) in stage two, a Conformer-based architecture is employed to fuse the unimodal representations learned in stage one and predict IMAs, accounting for unimodal–multimodal differences. The proposed model is evaluated on the CH-SIMS dataset. The experimental results demonstrate that Uni2Mul outperforms baseline models. This study makes two key contributions: (1) the use of IUAs improves unimodal learning; (2) the two-stage approach addresses unimodal–multimodal inconsistencies during Conformer-based fusion. Uni2Mul advances MEC by enhancing unimodal representation learning and Conformer-based fusion.
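The two-stage pipeline described in the abstract can be illustrated with a minimal structural sketch. This is not the authors' implementation: the class names, dimensions, and the single linear layer standing in for the Conformer-based fusion block are all hypothetical, chosen only to show how stage-one unimodal representations feed a stage-two multimodal predictor.

```python
import numpy as np

rng = np.random.default_rng(0)

class UnimodalEncoder:
    """Stage one (sketch): a toy per-modality encoder, a hypothetical
    stand-in for the models trained on independent unimodal
    annotations (IUAs)."""
    def __init__(self, in_dim, hid_dim):
        self.W = rng.standard_normal((in_dim, hid_dim)) * 0.1

    def forward(self, x):
        # Produce a unimodal emotion representation.
        return np.tanh(x @ self.W)

class FusionHead:
    """Stage two (sketch): fuses the stage-one representations and
    predicts the identical multimodal annotation (IMA). A single
    linear layer stands in for the Conformer-based fusion block."""
    def __init__(self, hid_dim, n_modalities, n_classes):
        self.W = rng.standard_normal((hid_dim * n_modalities, n_classes)) * 0.1

    def forward(self, reps):
        fused = np.concatenate(reps, axis=-1)  # concatenate modalities
        return (fused @ self.W).argmax(axis=-1)  # one label per sample

# Toy batch: 4 samples across three modalities (text, audio, vision).
encoders = {m: UnimodalEncoder(in_dim=8, hid_dim=16)
            for m in ("text", "audio", "vision")}
batch = {m: rng.standard_normal((4, 8)) for m in encoders}

reps = [encoders[m].forward(batch[m]) for m in encoders]  # stage one
preds = FusionHead(hid_dim=16, n_modalities=3, n_classes=3).forward(reps)
print(preds.shape)
```

In the actual model the stage-one encoders are trained first against IUAs and then the fusion stage is trained against IMAs, which is what lets the fusion step absorb unimodal–multimodal inconsistencies rather than forcing them into the unimodal encoders.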
Funder
Doctoral Research Innovation Program of China People’s Police University
2022 Humanities and Social Science Research Youth Foundation Project of Ministry of Education
Hebei Province Science and Technology Support Program
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Cited by: 1 article.