Counterfactual Scenario-relevant Knowledge-enriched Multi-modal Emotion Reasoning

Published: 2023-06-07 Issue: 5s Volume: 19 Page: 1-25
ISSN:1551-6857
Container-title:ACM Transactions on Multimedia Computing, Communications, and Applications
language:en
Short-container-title:ACM Trans. Multimedia Comput. Commun. Appl.

Author:

Liu Hao¹, Yang Xiaoshan², Xu Changsheng²

Affiliation:

1. State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences (CASIA), China and School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS), China

2. State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences (CASIA), China, and School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS), China, and Peng Cheng Laboratory (PCL), China

Abstract

Multi-modal video emotion reasoning (MERV) has recently attracted increasing attention due to its potential application in human-computer interaction. This task requires not only recognizing utterance-level emotions for conspicuous speakers, but also perceiving the emotions of non-speakers in videos. Existing methods focus on modeling multi-modal multi-level contexts to capture emotion-relevant clues from the complex scenarios in videos. However, the context information is far from enough to infer the emotion labels of non-speakers due to the large gap between the scenario situation and emotion labels. Inspired by the observation that humans can find solutions to complex problems by leveraging experience and knowledge, we propose SK-MER, a Scenario-relevant Knowledge-enhanced Multi-modal Emotion Reasoning framework for the MERV task, which can leverage external knowledge to enhance video scenario understanding and emotion reasoning. Specifically, we use scenario concepts extracted from videos to build knowledge subgraphs from external knowledge bases. The knowledge subgraphs are then utilized to obtain scenario-relevant knowledge representations through dynamic knowledge graph attention. Next, we incorporate the knowledge representations into context modeling to enhance emotion reasoning with external scenario-relevant knowledge. In addition, we propose a counterfactual knowledge representation learning approach to obtain more effective scenario-relevant knowledge representations. Extensive experimental results on the MEmoR dataset show that the proposed SK-MER framework achieves new state-of-the-art results.

Funder

National Natural Science Foundation of China

Beijing Natural Science Foundation

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture

Link

https://dl.acm.org/doi/pdf/10.1145/3583690

Reference: 79 articles.

1. DBpedia: A Nucleus for a Web of Open Data

2. HighlightMe: Detecting Highlights from Human-Centric Videos

3. SenticNet 5: Discovering Conceptual Primitives for Sentiment Analysis by Means of Context Embeddings

4. VGGFace2: A Dataset for Recognising Faces across Pose and Age

5. Emotion in Context

Cited by 1 article.

1. Knowledge-integrated Multi-modal Movie Turning Point Identification; ACM Transactions on Multimedia Computing, Communications, and Applications; 2024-01-22