Differential Networks for Visual Question Answering

Published:2019-07-17 Issue: Volume:33 Page:8997-9004
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Wu Chenfei,Liu Jinlai,Wang Xiaojie,Li Ruifan

Abstract

Visual Question Answering (VQA) has emerged in recent years because of its potential applications. To address the VQA task, a model must fuse feature elements from both images and questions efficiently. Existing models fuse an image feature element v_i and a question feature element q_i directly, for example through the element-wise product v_i q_i. Those solutions largely ignore two key points: 1) whether v_i and q_i lie in the same space, and 2) how to reduce the observation noise in v_i and q_i. We argue that differences between feature elements of the same modality, such as (v_i − v_j) and (q_i − q_j), are more likely to lie in the same space, and that the difference operation helps reduce observation noise. To achieve this, we first propose Differential Networks (DN), a novel plug-and-play module that takes differences between pair-wise feature elements. Using DN, we then propose DN based Fusion (DF), a novel model for the VQA task. We achieve state-of-the-art results on four publicly available datasets. Ablation studies also show the effectiveness of the difference operations in the DF model.
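
The noise argument in the abstract can be illustrated with a small numerical sketch. This is not the authors' DN or DF architecture, which the abstract does not specify; it only shows, under simplifying assumptions, that pairwise differences within one modality are unchanged by a constant offset added to every element. The function names `pairwise_differences` and `difference_fusion`, the toy vectors, and the use of a constant offset as a stand-in for observation noise are all hypothetical choices made for illustration.

```python
import numpy as np


def pairwise_differences(x):
    """Return x[i] - x[j] for every pair i < j as a 1-D array."""
    x = np.asarray(x, dtype=float)
    # Index pairs (i, j) with i < j, taken from the strict upper triangle.
    i, j = np.triu_indices(x.shape[0], k=1)
    return x[i] - x[j]


def difference_fusion(v, q):
    """Element-wise product of the pairwise differences of v and q.

    Illustrative only: a stand-in for fusing (v_i - v_j) with (q_i - q_j),
    not the fusion model described in the paper.
    """
    return pairwise_differences(v) * pairwise_differences(q)


# Toy feature elements (hypothetical values).
v = np.array([1.0, 3.0, 6.0])
q = np.array([2.0, 2.5, 4.0])

# Assumption: model "observation noise" as a constant offset per modality.
offset = 10.0

fused_clean = difference_fusion(v, q)
fused_shifted = difference_fusion(v + offset, q - offset)

# The direct product changes under the offsets; the difference-based one does not.
direct_clean = v * q
direct_shifted = (v + offset) * (q - offset)

print(np.allclose(fused_clean, fused_shifted))
print(np.allclose(direct_clean, direct_shifted))
```

A constant offset is the simplest case in which differencing removes a disturbance exactly; random per-element noise would only be partly reduced, so the sketch should be read as intuition rather than as evidence for the paper's results.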

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 20 articles.

1. ArabicQuest: Enhancing Arabic Visual Question Answering with LLM Fine-Tuning;2024 Intelligent Methods, Systems, and Applications (IMSA);2024-07-13

2. A Systematic Evaluation of GPT-4V’s Multimodal Capability for Chest X-ray Image Analysis;Meta-Radiology;2024-07

3. Context-aware Multi-level Question Embedding Fusion for visual question answering;Information Fusion;2024-02

4. Emerging AI Trends in Intelligent and Interactive Multimedia Systems;Artificial Intelligence and Multimedia Data Engineering;2023-12-14

5. A Comprehensive Study of GPT-4V’s Multimodal Capabilities in Medical Imaging;2023-11-04