Overcoming Language Priors in VQA via Decomposed Linguistic Representations

Published: 2020-04-03 Issue: 07 Volume: 34 Pages: 11181-11188
ISSN: 2374-3468
Container-title: Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title: AAAI

Author:

Jing Chenchen, Wu Yuwei, Zhang Xiaoxun, Jia Yunde, Wu Qi

Abstract

Most existing Visual Question Answering (VQA) models overly rely on language priors between questions and answers. In this paper, we present a novel method of language attention-based VQA that learns decomposed linguistic representations of questions and utilizes the representations to infer answers for overcoming language priors. We introduce a modular language attention mechanism to parse a question into three phrase representations: type representation, object representation, and concept representation. We use the type representation to identify the question type and the possible answer set (yes/no or specific concepts such as colors or numbers), and the object representation to focus on the relevant region of an image. The concept representation is verified with the attended region to infer the final answer. The proposed method decouples the language-based concept discovery and vision-based concept verification in the process of answer inference to prevent language priors from dominating the answering process. Experiments on the VQA-CP dataset demonstrate the effectiveness of our method.
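The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the per-module query vectors (`module_queries`), the type prototypes (`type_proto`), the per-type answer masks (`type_masks`), and the element-wise fusion used for verification are all hypothetical choices made only to show the data flow. The question is soft-parsed into type, object, and concept representations; the type representation selects a candidate answer set, the object representation attends over image regions, and the concept representation is checked against the attended region.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def decompose_question(word_emb, module_queries):
    """Soft-parse a question into three phrase representations.

    word_emb:       (T, d) word embeddings of the question.
    module_queries: (3, d) hypothetical query vectors, one each for the
                    type, object, and concept modules (an assumption).
    Returns the (3, d) phrase representations and (3, T) attention weights.
    """
    weights = softmax(module_queries @ word_emb.T, axis=-1)
    return weights @ word_emb, weights

def attend_regions(object_rep, region_feats):
    """Attend over image regions (R, d) using the object representation."""
    alpha = softmax(region_feats @ object_rep)
    return alpha @ region_feats, alpha

def infer_answer(word_emb, region_feats, answer_emb,
                 module_queries, type_proto, type_masks):
    """Illustrative answer inference with decoupled steps.

    type_proto: (K, d) hypothetical question-type prototypes.
    type_masks: (K, A) boolean masks giving the candidate answers per type.
    """
    (type_rep, object_rep, concept_rep), _ = decompose_question(
        word_emb, module_queries)

    # Language-based step: question type -> candidate answer set.
    q_type = int(np.argmax(type_proto @ type_rep))
    mask = type_masks[q_type]

    # Vision-based step: locate the relevant region, then verify concepts.
    attended, _ = attend_regions(object_rep, region_feats)
    # Element-wise fusion is an assumption made for this sketch.
    scores = answer_emb @ (attended * concept_rep)

    # Answers outside the candidate set are excluded.
    scores = np.where(mask, scores, -np.inf)
    return int(np.argmax(scores)), q_type, scores
```

Because the candidate mask depends only on the question and the verification scores depend on the attended region, the predicted answer in this sketch always lies inside the candidate set chosen by the type representation, while the choice among the candidates is driven by the image features.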

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 34 articles.

1. Overcoming language priors in visual question answering with cumulative learning strategy;Neurocomputing;2024-12

2. Robust Visual Question Answering: Datasets, Methods, and Future Challenges;IEEE Transactions on Pattern Analysis and Machine Intelligence;2024-08

3. Enhancing multi-modal fusion in visual dialog via sample debiasing and feature interaction;Information Fusion;2024-07

4. Reducing Language Bias for Robust VQA Model with Multi-Branch Learning;2024 International Joint Conference on Neural Networks (IJCNN);2024-06-30

5. Modeling Multimodal Uncertainties via Probability Distribution Encoders Included Vision-Language Models;IEEE Access;2024