Abstract
This paper proposes a framework that enables iterative observation of a scene in order to answer a given question about that scene. Conventional visual question answering (VQA) methods are designed to answer questions from single-view images. However, in real-world applications such as human–robot interaction (HRI), where camera angles and occlusions must be considered, answering questions from a single view can be difficult. Because HRI applications make it possible to observe a scene from multiple viewpoints, it is reasonable to consider the VQA task in a multi-view setting. Furthermore, since observing a scene from arbitrary viewpoints is usually impractical, we design a framework that actively observes the scene until the information needed to answer the given question has been obtained. The proposed framework achieves question-answering performance comparable to a state-of-the-art method while significantly reducing the number of required observation viewpoints. In addition, we find that the framework appears to learn to select informative viewpoints, which lowers the number of required camera movements. We also build a multi-view VQA dataset based on real images; the proposed framework achieves high accuracy (94.01%) on this unseen real-image dataset.
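The paper's core idea is an observe-until-confident loop: gather a view, try to answer, and move the camera only if more information is needed. The sketch below is a minimal, hypothetical illustration of such a loop; all names (`encode_view`, `answer`, `next_viewpoint`, the confidence-threshold stopping rule, and the viewpoint budget) are assumptions for illustration and are not taken from the paper.

```python
# Hypothetical sketch of an active multi-view VQA loop in the spirit of the
# abstract: keep observing from new viewpoints until the answer confidence is
# high enough or a viewpoint budget is exhausted. Component names and the
# stopping rule are illustrative assumptions, not the paper's actual method.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ActiveVQAAgent:
    # Callables standing in for learned components (view encoder, answerer, viewpoint policy).
    encode_view: Callable[[object], List[float]]
    answer: Callable[[List[List[float]], str], Tuple[str, float]]  # -> (answer, confidence)
    next_viewpoint: Callable[[List[List[float]], str], int]        # -> next camera pose index
    confidence_threshold: float = 0.9
    max_views: int = 8
    observed: List[List[float]] = field(default_factory=list)

    def run(self, question: str, capture: Callable[[int], object],
            start_pose: int = 0) -> Tuple[str, int]:
        """Iteratively observe the scene until confident; return (answer, views used)."""
        pose = start_pose
        ans = ""
        for step in range(1, self.max_views + 1):
            image = capture(pose)                        # observe the scene from `pose`
            self.observed.append(self.encode_view(image))
            ans, conf = self.answer(self.observed, question)
            if conf >= self.confidence_threshold:        # enough information gathered: stop early
                return ans, step
            pose = self.next_viewpoint(self.observed, question)  # move the camera
        return ans, self.max_views                       # budget exhausted: return best guess
```

Under this framing, the reported reduction in required viewpoints corresponds to the loop terminating early once the answer confidence crosses the threshold, rather than always exhausting the viewpoint budget.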
Subject
Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry
Cited by
8 articles.