Cross-Modal Sentiment Analysis of Text and Video Based on Bi-GRU Cyclic Network and Correlation Enhancement
Published: 2023-06-25
Volume: 13
Issue: 13
Page: 7489
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Author:
He Ping 1, Qi Huaying 1, Wang Shiyi 1, Cang Jiayue 1
Affiliation:
1. School of Information Technology, Hebei University of Economics and Business, Shijiazhuang 050061, China
Abstract
Cross-modal sentiment analysis is an emerging research area in natural language processing. The core tasks of cross-modal fusion are cross-modal relationship extraction and joint feature learning. Existing cross-modal sentiment analysis methods focus on static text, video, audio, and other modality data, ignoring the fact that, in practical applications, data from different modalities are often unaligned. Unaligned data sequences exhibit long-term temporal dependencies, which makes the interactions between modalities difficult to capture. This paper proposes UA-BFET, a sentiment analysis model based on feature enhancement for unaligned data scenarios, which performs sentiment analysis on unaligned text and video modality data from social media. First, the model adds a cyclic memory enhancement network that operates across time steps: the cross-modal fusion features obtained at one time step, which carry the interaction between modalities, are fed into the unimodal feature extraction of the next time step in a Bi-directional Gated Recurrent Unit (Bi-GRU), so that the progressively enhanced unimodal features and the cross-modal fusion features continuously complement each other. Second, the unimodal text and video features, taken jointly with the enhanced cross-modal fusion features, are subjected to canonical correlation analysis (CCA) and then passed to a fully connected layer and a Softmax function for sentiment classification. In experiments on the unaligned public datasets MOSI and MOSEI, the UA-BFET model matches or exceeds the performance of models that fuse text, video, and audio, showing clear advantages for cross-modal sentiment analysis in unaligned data scenarios.
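The abstract outlines an architecture rather than giving code, so the following is a minimal, hypothetical PyTorch sketch of the pipeline's overall shape: recurrent encoders whose next-step inputs are augmented with the previous step's cross-modal fusion feature, a simplified correlation term standing in for CCA, and a fully connected layer with Softmax. The class name UABFETSketch, all dimensions, the unidirectional GRU cells (the paper uses a Bi-GRU), and the assumption of a common sequence length are illustrative choices, not the authors' implementation.

    # Hypothetical sketch of the pipeline described in the abstract; all
    # names, sizes, and the exact feedback wiring are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class UABFETSketch(nn.Module):
        def __init__(self, text_dim=300, video_dim=35, hidden=64, num_classes=3):
            super().__init__()
            # One recurrent cell per modality; each step's input is the raw
            # modality feature concatenated with the fed-back fusion feature.
            self.text_cell = nn.GRUCell(text_dim + hidden, hidden)
            self.video_cell = nn.GRUCell(video_dim + hidden, hidden)
            self.fuse = nn.Linear(2 * hidden, hidden)        # cross-modal fusion
            self.classify = nn.Linear(2 * hidden, num_classes)

        def forward(self, text, video):
            # text: (B, T, text_dim), video: (B, T, video_dim). For simplicity
            # the unaligned streams are assumed resampled to a common length T.
            B, T, _ = text.shape
            h_t = text.new_zeros(B, self.text_cell.hidden_size)
            h_v = video.new_zeros(B, self.video_cell.hidden_size)
            fused = text.new_zeros(B, self.fuse.out_features)
            for step in range(T):
                # Cyclic enhancement: the previous step's fusion feature joins
                # each modality's input before the recurrent update.
                h_t = self.text_cell(torch.cat([text[:, step], fused], dim=-1), h_t)
                h_v = self.video_cell(torch.cat([video[:, step], fused], dim=-1), h_v)
                fused = torch.tanh(self.fuse(torch.cat([h_t, h_v], dim=-1)))
            # Crude stand-in for the CCA objective: a correlation score between
            # the centred text and video representations (true CCA also whitens).
            zt, zv = h_t - h_t.mean(0), h_v - h_v.mean(0)
            corr = (zt * zv).sum() / (zt.norm() * zv.norm() + 1e-8)
            logits = self.classify(torch.cat([h_t, h_v], dim=-1))
            return F.log_softmax(logits, dim=-1), corr

Training such a sketch would minimise a classification loss while encouraging cross-modal correlation, e.g. loss = F.nll_loss(logp, labels) - lam * corr for some weight lam; the paper's actual objective and CCA formulation may differ.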
Funder
Scientific Research and Development Program Project of the Hebei University of Economics and Business
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science