M3GAT: A Multi-modal, Multi-task Interactive Graph Attention Network for Conversational Sentiment Analysis and Emotion Recognition

Authors:

Zhang Yazhou (1), Jia Ao (2), Wang Bo (3), Zhang Peng (3), Zhao Dongming (4), Li Pu (1), Hou Yuexian (3), Jin Xiaojia (4), Song Dawei (2), Qin Jing (5)

Affiliations:

1. Software Engineering College, Zhengzhou University of Light Industry, China

2. School of Computer Science and Technology, Beijing Institute of Technology, China

3. College of Intelligence and Computing, Tianjin University, China

4. Artificial Intelligence Laboratory, China Mobile Communication Group Tianjin Co., Ltd., China

5. Centre for Smart Health, School of Nursing, Hong Kong Polytechnic University, China

Abstract

Sentiment and emotion correspond to long-term and short-lived human feelings, respectively, and are closely linked, making sentiment analysis and emotion recognition two interdependent tasks in natural language processing (NLP). Each task often leverages knowledge shared by the other and performs better when the two are solved in a joint learning paradigm. Conversational context dependency, multi-modal interaction, and multi-task correlation are three key factors that contribute to this joint paradigm, yet no recent approach has considered all three in a unified framework. To fill this gap, we propose a multi-modal, multi-task interactive graph attention network, termed M3GAT, that addresses the three problems simultaneously. At the heart of the model is an interactive conversation graph layer with three core sub-modules: (1) a local-global context connection that models both local and global conversational context, (2) a cross-modal connection that learns multi-modal complementarity, and (3) a cross-task connection that captures the correlation between the two tasks. Comprehensive experiments on three benchmark datasets, MELD, MEISD, and MSED, show that M3GAT outperforms state-of-the-art baselines by margins of 1.88%, 5.37%, and 0.19% for sentiment analysis, and 1.99%, 3.65%, and 0.13% for emotion recognition, respectively. We also show the superiority of multi-task learning over the single-task framework.
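The three connection types of the interactive conversation graph layer can be pictured as a single graph whose edges carry a type label, with attention computed over each node's typed neighborhood. The paper's actual layer uses learned, multi-head graph attention; the pure-Python sketch below only illustrates the idea, with a hypothetical scalar weight per edge type (`type_weight`) standing in for the learned attention parameters of each sub-module.

```python
import math

def gat_layer(nodes, edges, type_weight):
    """One simplified attention pass over a typed conversation graph.

    nodes       -- list of feature vectors, one per (utterance, modality, task) node
    edges       -- list of (src, dst, etype) triples; etype is one of
                   "context", "cross_modal", "cross_task"
    type_weight -- illustrative scalar weight per edge type (hypothetical;
                   stands in for learned attention parameters)
    """
    out = []
    for i, h_i in enumerate(nodes):
        incoming = [(s, t) for (s, d, t) in edges if d == i]
        if not incoming:
            out.append(h_i[:])  # isolated node: pass features through
            continue
        # Unnormalized score: dot(h_i, h_s), scaled by the edge-type weight.
        scores = [type_weight[t] * sum(a * b for a, b in zip(h_i, nodes[s]))
                  for s, t in incoming]
        # Softmax over the neighborhood (shifted by the max for stability).
        m = max(scores)
        exps = [math.exp(x - m) for x in scores]
        z = sum(exps)
        alphas = [e / z for e in exps]
        # Aggregate neighbour features with the attention weights.
        agg = [sum(a * nodes[s][k] for a, (s, _) in zip(alphas, incoming))
               for k in range(len(h_i))]
        out.append(agg)
    return out
```

For example, a node receiving one "context" edge and one "cross_modal" edge of equal score ends up with the average of its two neighbours' features; raising `type_weight["cross_modal"]` would shift the aggregation toward the other modality's representation.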

Funder

The Hong Kong Polytechnic University

National Science Foundation of China

State Key Laboratory for Novel Software Technology, Nanjing University

Industrial Science and Technology Research Project of Henan Province

Foundation of Key Laboratory of Dependable Service Computing in Cyber-Physical-Society (Ministry of Education), Chongqing University

Natural Science Foundation of Henan

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications; General Business, Management and Accounting; Information Systems

References (59 articles)

1. Md Shad Akhtar, Dushyant Singh Chauhan, Deepanway Ghosal, Soujanya Poria, Asif Ekbal, and Pushpak Bhattacharyya. 2019. Multi-task learning for multi-modal emotion recognition and sentiment analysis. arXiv preprint arXiv:1905.05812 (2019).

2. Sentiment and Emotion help Sarcasm? A Multi-task Learning Framework for Multi-Modal Sarcasm, Sentiment and Emotion Analysis

3. Ze-Jing Chuang and Chung-Hsien Wu. 2004. Multi-modal emotion recognition from speech and text. In International Journal of Computational Linguistics & Chinese Language Processing, Volume 9, Number 2, August 2004: Special Issue on New Trends of Speech and Language Processing. 45–62.

4. Elizabeth M. Daly and Mads Haahr. 2008. Social network analysis for information flow in disconnected delay-tolerant MANETs. IEEE Transactions on Mobile Computing 8, 5 (2008), 606–621.

5. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).

Cited by 2 articles.

1. Self-Adaptive Representation Learning Model for Multi-Modal Sentiment and Sarcasm Joint Analysis. ACM Transactions on Multimedia Computing, Communications, and Applications (2024-01-11).

2. Moving From Narrative to Interactive Multi-Modal Sentiment Analysis: A Survey. ACM Transactions on Asian and Low-Resource Language Information Processing (2023-07-22).
