Multimodal Sentiment Analysis in Realistic Environments Based on Cross-Modal Hierarchical Fusion Network

Published: 2023-08-18 Issue: 16 Volume: 12 Page: 3504
ISSN: 2079-9292
Container-title: Electronics
Language: en
Short-container-title: Electronics

Authors:

Huang Ju (1, 2), Lu Pengtao (1, 2), Sun Shuifa (3), Wang Fangyi (1, 2)

Affiliations:

1. Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering, China Three Gorges University, Yichang 443002, China

2. College of Computer and Information Technology, China Three Gorges University, Yichang 443002, China

3. College of Information Science and Technology, Hangzhou Normal University, Hangzhou 310000, China

Abstract

In the real world, multimodal sentiment analysis (MSA) captures and analyzes sentiment by fusing multimodal information, thereby improving the understanding of real-world environments. The key challenges are handling noise in the acquired data and achieving effective multimodal fusion. To handle data noise, existing methods combine multimodal features to mitigate the sentiment word recognition errors caused by the performance limitations of automatic speech recognition (ASR) models. However, how to use and combine the different modalities more efficiently to counter this noise remains an open problem. For multimodal fusion, most existing methods adapt poorly to the feature differences between modalities, which makes it difficult to capture the complex nonlinear interactions that may exist between them. To address these issues, this paper proposes a new framework named multimodal-word-refinement and cross-modal-hierarchy (MWRCMH) fusion. Specifically, a multimodal word correction module reduces the sentiment word recognition errors introduced by ASR. For multimodal fusion, we design a cross-modal hierarchical fusion module that uses cross-modal attention to fuse the features of each pair of modalities, yielding bimodal features. The bimodal and unimodal features are then fused through a nonlinear layer to obtain the final multimodal sentiment features. Experimental results on the MOSI-SpeechBrain, MOSI-IBM, and MOSI-iFlytek datasets show that the proposed approach outperforms multiple baseline methods, achieving Has0-F1 scores of 76.43%, 80.15%, and 81.93%, respectively.
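The abstract describes the hierarchical fusion step only at a high level. The following is a minimal pure-Python sketch of that general idea (cross-modal attention over each pair of modalities, then a nonlinear layer over the bimodal plus unimodal features); it is not the authors' implementation and does not model the word correction module. Assumptions: all three modalities are already projected to a shared dimension d; attention uses no learned query/key/value projections; pooling is a simple mean; the fusion-layer weights are random placeholders seeded for reproducibility; and the output is squashed with 3·tanh to match the [-3, 3] sentiment label range of CMU-MOSI. All function names (`cross_modal_attention`, `bimodal_feature`, `fused_features`, `predict_sentiment`) are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # a sequence of feature vectors, one row per time step


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain nested-list matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(seq: Matrix) -> List[float]:
    # Average over time steps to get one vector per sequence.
    return [sum(col) / len(seq) for col in zip(*seq)]


def cross_modal_attention(query_seq: Matrix, context_seq: Matrix) -> Matrix:
    """Scaled dot-product attention: each step of the query modality attends
    over all steps of the context modality (used as both keys and values).
    Assumption: no learned Q/K/V projections, so both inputs share dimension d."""
    d = len(query_seq[0])
    scores = matmul(query_seq, transpose(context_seq))  # (n_q x n_c)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, context_seq)  # (n_q x d)


def bimodal_feature(x: Matrix, y: Matrix) -> List[float]:
    # Hypothetical pairwise fusion: attend in both directions, pool, concatenate (2d values).
    return mean_pool(cross_modal_attention(x, y)) + mean_pool(cross_modal_attention(y, x))


def fused_features(text: Matrix, audio: Matrix, vision: Matrix) -> List[float]:
    # Bimodal level: one fused vector per modality pair (6d values in total).
    bimodal = (bimodal_feature(text, audio)
               + bimodal_feature(text, vision)
               + bimodal_feature(audio, vision))
    # Unimodal level: pooled single-modality features (3d values in total).
    unimodal = mean_pool(text) + mean_pool(audio) + mean_pool(vision)
    return bimodal + unimodal


def predict_sentiment(text: Matrix, audio: Matrix, vision: Matrix,
                      seed: int = 0, hidden_dim: int = 8) -> float:
    features = fused_features(text, audio, vision)
    rng = random.Random(seed)  # placeholder weights; a trained model would learn these
    w_hidden = [[rng.gauss(0.0, 0.1) for _ in features] for _ in range(hidden_dim)]
    # Nonlinear fusion layer over the concatenated bimodal + unimodal features.
    hidden = [math.tanh(sum(w * f for w, f in zip(row, features))) for row in w_hidden]
    w_out = [rng.gauss(0.0, 0.1) for _ in range(hidden_dim)]
    # Squash to [-3, 3], the CMU-MOSI sentiment label range.
    return 3.0 * math.tanh(sum(w * h for w, h in zip(w_out, hidden)))


if __name__ == "__main__":
    rng = random.Random(42)
    d = 4

    def toy_seq(n: int) -> Matrix:
        return [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]

    text, audio, vision = toy_seq(5), toy_seq(7), toy_seq(6)
    print(len(fused_features(text, audio, vision)))  # 9 * d = 36
    print(predict_sentiment(text, audio, vision))
```

Attending in both directions lets each modality in a pair serve as both query and context, and the sequences may have different lengths because attention only requires a shared feature dimension. A trained model would replace the seeded random weights with learned parameters and would typically add multi-head projections.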

Funder

National Natural Science Foundation of China

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/12/16/3504/pdf

Cited by 4 articles.

1. Textual Context guided Vision Transformer with Rotated Multi-Head Attention for Sentiment Analysis. Companion Proceedings of the ACM Web Conference 2024, 2024-05-13.

2. Design and Efficacy of a Data Lake Architecture for Multimodal Emotion Feature Extraction in Social Media. IET Software, 2024-03-08.

3. Multimodal sentiment analysis leveraging the strength of deep neural networks enhanced by the XGBoost classifier. Computer Methods in Biomechanics and Biomedical Engineering, 2024-02-10.

4. Performance Analysis of Sentiment Fusion Network for Social Media Services. 2023 International Conference on Communication, Security and Artificial Intelligence (ICCSAI), 2023-11-23.