Reasoning with Heterogeneous Graph Alignment for Video Question Answering

Published:2020-04-03 Issue:07 Volume:34 Page:11109-11116
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Jiang Pin, Han Yahong

Abstract

The dominant video question answering methods are based on fine-grained representations or model-specific attention mechanisms. They usually process the video and the question separately, then feed the representations of the different modalities into subsequent late-fusion networks. Although these methods use information from one modality to boost the other, they neglect to integrate inter- and intra-modality correlations in a uniform module. We propose a deep heterogeneous graph alignment network over the video shots and question words. Furthermore, we explore the network architecture in four steps: representation, fusion, alignment, and reasoning. Within our network, inter- and intra-modality information can be aligned and can interact simultaneously over the heterogeneous graph, and is then used for cross-modal reasoning. We evaluate our method on three benchmark datasets and conduct extensive ablation studies to verify the effectiveness of the network architecture. Experiments show the network to be superior in quality.
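
As a rough illustration of the idea of a single graph over video shots and question words, the sketch below stacks both modalities into one node set, so that one adjacency matrix covers intra-modality edges (shot to shot, word to word) and inter-modality edges (shot to word) at once. It is not the authors' implementation: the function name `heterogeneous_graph_step`, the scaled dot-product edge scores, the single ReLU propagation step, and the random features are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def heterogeneous_graph_step(video_shots, question_words, weight):
    """One propagation step over a joint shot/word graph (hypothetical sketch).

    video_shots:    (n_v, d) array of shot features
    question_words: (n_q, d) array of word features
    weight:         (d, d) projection matrix (would be learned in practice)
    """
    # Treat shots and words as nodes of a single graph.
    nodes = np.concatenate([video_shots, question_words], axis=0)

    # Assumed edge scores: scaled dot-product similarity between all node pairs.
    scores = nodes @ nodes.T / np.sqrt(nodes.shape[1])

    # Row-normalise so each node's incoming edge weights sum to 1.
    adjacency = softmax(scores, axis=1)

    # Aggregate neighbours, project, and apply ReLU.
    updated = np.maximum(adjacency @ nodes @ weight, 0.0)
    return adjacency, updated


# Toy inputs: 4 shots and 6 words with 8-dimensional features (random placeholders).
rng = np.random.default_rng(0)
video_shots = rng.normal(size=(4, 8))
question_words = rng.normal(size=(6, 8))
weight = rng.normal(size=(8, 8)) * 0.1

adjacency, updated = heterogeneous_graph_step(video_shots, question_words, weight)
```

In this toy setup, the block `adjacency[:4, 4:]` holds the shot-to-word weights and `adjacency[:4, :4]` the shot-to-shot weights, which is one way to read "inter- and intra-modality information" being handled in the same module.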

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 66 articles.

1. Appearance-Motion Dual-Stream Heterogeneous Network for VideoQA;MultiMedia Modeling;2024

2. VCD: Visual Causality Discovery for Cross-Modal Question Reasoning;Pattern Recognition and Computer Vision;2023-12-25

3. Hierarchical Synergy-Enhanced Multimodal Relational Network for Video Question Answering;ACM Transactions on Multimedia Computing, Communications, and Applications;2023-12-11

4. Spatio-Temporal Two-stage Fusion for video question answering;Computer Vision and Image Understanding;2023-12

5. Multi-Granularity Interaction and Integration Network for Video Question Answering;IEEE Transactions on Circuits and Systems for Video Technology;2023-12