Abstract
Accurate emotion recognition remains a challenging problem in human–machine interaction. This paper explores whether feature abstraction and fusion can both be carried out by a homogeneous network component, and proposes a dual-modal emotion recognition framework composed of a parallel convolution (Pconv) module and an attention-based bidirectional long short-term memory (BLSTM) module. The Pconv module extracts multidimensional social features in parallel and provides more effective representation capacity. The attention-based BLSTM module strengthens key information extraction and maintains the relevance between pieces of information. Experiments on the CH-SIMS dataset indicate that recognition accuracy reaches 74.70% on audio data and 77.13% on text, while the accuracy of the dual-modal fusion model reaches 90.02%. The experiments show that it is feasible to process heterogeneous information within a homogeneous network component, and that the attention-based BLSTM module achieves the best coordination with the feature fusion realized by the Pconv module. This gives considerable flexibility for modality expansion and architecture design.
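The two ideas named in the abstract, parallel convolution branches and attention over a sequence, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, kernel values, query vector, and toy input are all hypothetical, the BLSTM stage is omitted, and the attention step is a plain dot-product softmax chosen only for illustration.

```python
import math


def conv1d_same(seq, kernel):
    """Slide one kernel over the sequence with zero padding (output length = input length)."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(seq) + [0.0] * (k - 1 - pad)
    return [sum(padded[i + j] * kernel[j] for j in range(k)) for i in range(len(seq))]


def parallel_conv(seq, kernels):
    """Apply every kernel to the same input and concatenate branch outputs per time step."""
    branches = [conv1d_same(seq, kernel) for kernel in kernels]
    return [list(step) for step in zip(*branches)]


def attention_pool(states, query):
    """Return the softmax-weighted sum of per-step vectors and the attention weights."""
    scores = [sum(s * q for s, q in zip(state, query)) for state in states]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[0])
    pooled = [sum(w * state[d] for w, state in zip(weights, states)) for d in range(dim)]
    return pooled, weights


# Hypothetical toy input: one feature channel over five time steps.
signal = [0.0, 1.0, 2.0, 1.0, 0.0]
# Three hypothetical branches with kernel widths 1, 2 and 3.
kernels = [[1.0], [0.5, 0.5], [1 / 3, 1 / 3, 1 / 3]]

features = parallel_conv(signal, kernels)
pooled, weights = attention_pool(features, query=[1.0, 1.0, 1.0])
```

Each time step ends up with one value per branch, so kernels of different widths contribute views of the same input at different scales. The attention weights sum to one and indicate which time steps dominate the pooled vector.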
Funder
National Key Research and Development Program of China
Publisher
Springer Science and Business Media LLC
Subject
Computational Mathematics, Engineering (miscellaneous), Information Systems, Artificial Intelligence
Cited by
25 articles.
1. A Deep GRU-BiLSTM Network for Multi-modal Emotion Recognition from Text;2024 IEEE 7th International Conference on Advanced Technologies, Signal and Image Processing (ATSIP);2024-07-11
2. Enhancing Text Recognition Performance Through Multi-Dimensional Data Analysis;2024 5th International Conference on Image Processing and Capsule Networks (ICIPCN);2024-07-03
3. 3D facial animation driven by speech-video dual-modal signals;Complex & Intelligent Systems;2024-05-23
4. Study on intelligent assembly process planning and execution system based on digital twin;Robotic Intelligence and Automation;2024-04-30
5. Continuous Valence-Arousal Space Prediction and Recognition Based on Feature Fusion;2024 IEEE International Conference on Industrial Technology (ICIT);2024-03-25