Abstract
Visual question answering (VQA) is regarded as a multi-modal, fine-grained feature fusion task that requires constructing multi-level, omnidirectional relations between nodes. One main solution is the composite attention model, which is composed of co-attention (CA) and self-attention (SA). However, existing composite models only stack single attention blocks and lack both path-wise historical memory and overall adjustment. We propose a path attention memory network (PAM) to construct a more robust composite attention model. After each single-hop attention block (SA or CA), the cumulative importance of the nodes is used to calibrate the signal strength of the nodes’ features. Four memorized single-hop attention matrices are used to obtain the path-wise co-attention matrix of path-wise attention (PA); the PA block is therefore capable of synthesizing and strengthening the learning effect over the whole path. Moreover, in CA we use guard gates from the target modality to check the values of the source modality, and in SA we use conditioning gates from the other modality to guide the query and key of the current modality. The proposed PAM helps construct a robust multi-hop neighborhood relationship between the visual and language modalities and achieves excellent performance on both the VQA2.0 and VQA-CP V2 datasets.
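The idea of deriving a path-wise matrix from several memorized single-hop attention matrices can be illustrated with a minimal sketch. The abstract does not state how the four matrices are combined, so everything below is an assumption made for illustration: the hops are chained by matrix product, the hop order (SA, CA, SA, CA) is arbitrary, the toy features are made up, and the helper names `attention_matrix` and `path_attention` are hypothetical rather than taken from the paper.

```python
import math


def softmax_rows(scores):
    """Row-wise softmax, so every row becomes an attention distribution."""
    out = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention_matrix(queries, keys):
    """Single-hop scaled dot-product attention weights (queries attend to keys)."""
    d = len(keys[0])
    scores = [
        [sum(q * k for q, k in zip(qv, kv)) / math.sqrt(d) for kv in keys]
        for qv in queries
    ]
    return softmax_rows(scores)


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def path_attention(hops):
    """Chain memorized single-hop matrices by matrix product (assumed rule)."""
    path = hops[0]
    for hop in hops[1:]:
        path = matmul(path, hop)
    return path


# Toy features: 2 visual nodes and 3 language nodes of dimension 2 (made up).
visual = [[1.0, 0.0], [0.0, 1.0]]
language = [[1.0, 1.0], [0.5, -0.5], [0.0, 2.0]]

sa_v = attention_matrix(visual, visual)      # SA over visual nodes, 2x2
ca_vl = attention_matrix(visual, language)   # CA visual -> language, 2x3
sa_l = attention_matrix(language, language)  # SA over language nodes, 3x3
ca_lv = attention_matrix(language, visual)   # CA language -> visual, 3x2

# Four stored hops combined into one multi-hop matrix, 2x2.
path = path_attention([sa_v, ca_vl, sa_l, ca_lv])
```

Under this assumed rule, each row of `path` is still a probability distribution, because a product of row-stochastic matrices is row-stochastic. Each entry then reads as the weight with which one visual node reaches another through the language nodes along the stored hops.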
Funder
National Natural Science Foundation of China
Natural Science Foundation of Hunan Province
Subject
General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)
Cited by
1 article.