Affiliation:
1. Software Engineering College, Zhengzhou University of Light Industry, Zhengzhou, P.R. China
2. Department of Computer Science, Aalto University, Espoo, Finland
3. Faculty of Social Sciences and Liberal Arts, University College Sedaya International, Kuala Lumpur, Malaysia
4. Department of Software Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
Abstract
The recent booming of artificial intelligence (AI) applications, e.g., affective robots, human-machine interfaces, autonomous vehicles, and so on, has produced a great number of multi-modal records of human communication. Such data often carry latent subjective users’ attitudes and opinions, which provides a practical and feasible path to realize the connection between human emotion and intelligence services. Sentiment and emotion analysis of multi-modal records is of great value to improve the intelligence level of affective services. However, how to find an optimal manner to learn people’s sentiments and emotional representations has been a difficult problem, since both of them involve subtle mind activity. To solve this problem, a lot of approaches have been published, but most of them are insufficient to mine sentiment and emotion, since they have treated sentiment analysis and emotion recognition as two separate tasks. The interaction between them has been neglected, which limits the efficiency of sentiment and emotion representation learning. In this work, emotion is seen as the external expression of sentiment, while sentiment is the essential nature of emotion. We thus argue that they are strongly related to each other where one’s judgment helps the decision of the other. The key challenges are multi-modal fused representation and the interaction between sentiment and emotion. To solve such issues, we design an external knowledge enhanced multi-task representation learning network, termed KAMT. The major elements contain two attention mechanisms, which are inter-modal and inter-task attentions and an external knowledge augmentation layer. The external knowledge augmentation layer is used to extract the vector of the participant’s gender, age, occupation, and of overall color or shape. The main use of inter-modal attention is to capture effective multi-modal fused features. Inter-task attention is designed to model the correlation between sentiment analysis and emotion classification. We perform experiments on three widely used datasets, and the experimental performance proves the effectiveness of the KAMT model.
Funder
King Saud University, Riyadh, Saudi Arabia
National Science Foundation of China
Fund of State Key Lab. for Novel Software Technology in Nanjing University
Industrial Science and Technology Research Project of Henan Province
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications,Hardware and Architecture
Cited by
4 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献