A Parallel Multi-Modal Factorized Bilinear Pooling Fusion Method Based on the Semi-Tensor Product for Emotion Recognition

Published:2022-12-16 Issue:12 Volume:24 Page:1836
ISSN:1099-4300
Container-title:Entropy
language:en
Short-container-title:Entropy

Author:

Liu Fen, Chen Jianfeng, Li Kemeng, Tan Weijie, Cai Chang, Ayub Muhammad Saad

Abstract

Multi-modal fusion can exploit complementary information from different modalities and improve the accuracy of prediction or classification tasks. In this paper, we propose a parallel multi-modal factorized bilinear pooling method based on the semi-tensor product (STP) for information fusion in emotion recognition. First, we apply the STP to factorize a high-dimensional weight matrix into two low-rank factor matrices without dimension-matching constraints. Next, we project the multi-modal features onto the low-dimensional matrices and multiply them using the STP to capture the rich interactions between the features. Finally, we apply an STP-pooling method to reduce the dimensionality and obtain the final features. The method fuses information across modalities of different scales and dimensions and avoids the data redundancy caused by dimension matching. Experiments on the emotion-recognition task using the IEMOCAP and CMU-MOSI datasets showed a significant reduction in storage space and recognition time. The results also show that the proposed method improves performance while reducing both the training time and the number of parameters.
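The abstract relies on the semi-tensor product to multiply matrices whose inner dimensions do not match. As a rough illustration only, the sketch below implements the standard left semi-tensor product, A ⋉ B = (A ⊗ I_{t/n})(B ⊗ I_{t/p}) with t = lcm(n, p), in NumPy. The function name `semi_tensor_product` and the toy inputs are assumptions made for this example; it is not the authors' fusion or pooling pipeline, whose details are not given in this record.

```python
import math

import numpy as np


def semi_tensor_product(a, b):
    """Left semi-tensor product of a (m x n) and b (p x q).

    Hypothetical helper for illustration: both operands are expanded
    with identity matrices via the Kronecker product so that their
    inner dimensions agree at t = lcm(n, p), then multiplied as usual.
    """
    a = np.atleast_2d(np.asarray(a, dtype=float))
    b = np.atleast_2d(np.asarray(b, dtype=float))
    n = a.shape[1]
    p = b.shape[0]
    t = n * p // math.gcd(n, p)          # least common multiple of n and p
    a_ext = np.kron(a, np.eye(t // n))   # shape (m * t / n, t)
    b_ext = np.kron(b, np.eye(t // p))   # shape (t, q * t / p)
    return a_ext @ b_ext


# Toy inputs (assumed): a 1 x 4 row vector and a 2 x 1 column vector.
x = np.array([[1.0, 2.0, 3.0, 4.0]])
y = np.array([[1.0], [2.0]])
z = semi_tensor_product(x, y)            # shape (1, 2)
```

When the inner dimensions already match (n = p), both identity factors are 1 x 1 and the function reduces to the ordinary matrix product, which is why the STP is usually described as a generalization of it.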

Funder

Natural Science Foundation of Shaanxi Province

Yan’an University Scientific Research Project

Publisher

MDPI AG

Subject

General Physics and Astronomy

Link

https://www.mdpi.com/1099-4300/24/12/1836/pdf

References (40 articles)

1. Multimodal machine learning: A survey and taxonomy;Ahuja;IEEE Trans. Pattern Anal. Mach. Intell.,2018

2. VideoStory Embeddings Recognize Events when Examples are Scarce;Habibian;IEEE Trans. Pattern Anal. Mach. Intell.,2016

3. Shuang, W., Bondugula, S., Luisier, F., Zhuang, X., and Natarajan, P. (2014, January 23–28). Zero-Shot Event Detection Using Multi-modal Fusion of Weakly Supervised Concepts. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.

4. Park, S., Han, S.S., Chatterjee, M., Sagae, K., and Morency, L.P. (2014, January 12–16). Computational Analysis of Persuasiveness in Social Multimedia: A Novel Dataset and Multimodal Prediction Approach. Proceedings of the 16th International Conference on Multimodal Interaction, New York, NY, USA.

5. Zadeh, A., Chen, M., Poria, S., Cambria, E., and Morency, L.P. (2017). Tensor Fusion Network for Multimodal Sentiment Analysis. arXiv.

Cited by 3 articles.

1. Enhancing Human Activity Recognition through Integrated Multimodal Analysis: A Focus on RGB Imaging, Skeletal Tracking, and Pose Estimation;Sensors;2024-07-17

2. STP-MFM: Semi-tensor product-based multi-modal factorized multilinear pooling for information fusion in sentiment analysis;Digital Signal Processing;2024-02

3. A Survey of Deep Learning-Based Multimodal Emotion Recognition: Speech, Text, and Face;Entropy;2023-10-12