BETTER GENERIC OBJECTS COUNTING WHEN ASKING QUESTIONS TO IMAGES: A MULTITASK APPROACH FOR REMOTE SENSING VISUAL QUESTION ANSWERING

Published: 2020-08-03
Volume: V-2-2020
Pages: 1021-1027
ISSN: 2194-9050
Container-title: ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Language: en
Short-container-title: ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci.

Authors:

Lobry S., Marcos D., Kellenberger B., Tuia D.

Abstract

Visual Question Answering for Remote Sensing (RSVQA) aims to extract information from remote sensing images through queries formulated in natural language. Since the answer is also provided in natural language, the system is accessible to non-experts and therefore dramatically increases the value of remote sensing images as a source of information, for example for journalism or interactive land planning. Ideally, an RSVQA system should be able to answer questions that vary in both topic (presence, localization, counting) and image content. However, such flexibility creates problems related to the variability of the possible answers. A striking example is counting: the number of objects in a remote sensing image can vary by several orders of magnitude, depending on both the scene and the type of object. This is a challenge for traditional Visual Question Answering (VQA) methods, which treat answering as classification over a fixed set of candidate answers: they either become intractable or lose accuracy, because the number of possible answers has to be limited. To address this, we introduce a new model that jointly solves a classification problem (the most common approach in VQA) and a regression problem (to answer numerical questions more precisely). An evaluation of this method on the RSVQA dataset shows that this finer numerical output comes at the cost of a small loss of performance on non-numerical questions.
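The abstract describes the joint classification/regression model only at a high level. The following is a minimal sketch, not the authors' implementation, of how two output heads could be trained together and used at answer time. It assumes a shared network that emits class logits over a fixed answer vocabulary plus one scalar regression output, and that the question type (counting or not) is known at inference time. The log(1 + count) regression target, the `reg_weight` parameter, and all function names are hypothetical choices made for illustration.

```python
import math
from typing import List, Optional, Sequence

# Hypothetical sketch, not the authors' code: a shared network is assumed to
# produce, for each (image, question) pair, class logits over a fixed answer
# vocabulary (classification head) and one scalar (regression head).


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Cross-entropy of class `target` under softmax(logits), computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def multitask_loss(
    logits: Sequence[float],
    count_pred: float,
    class_target: Optional[int],
    count_target: Optional[float],
    reg_weight: float = 1.0,
) -> float:
    """Joint loss: cross-entropy for non-numerical answers, squared error for counts.

    Assumption: a sample contributes a term for each head that has a target;
    the regression head predicts log(1 + count), a hypothetical choice to cope
    with counts spanning several orders of magnitude.
    """
    loss = 0.0
    if class_target is not None:
        loss += softmax_cross_entropy(logits, class_target)
    if count_target is not None:
        diff = count_pred - math.log1p(count_target)
        loss += reg_weight * diff * diff
    return loss


def answer(
    logits: Sequence[float],
    count_pred: float,
    is_counting: bool,
    vocabulary: List[str],
) -> str:
    """Route a prediction to the head matching the (assumed known) question type."""
    if is_counting:
        # Undo the log(1 + count) transform and round to a non-negative integer.
        return str(max(0, round(math.expm1(count_pred))))
    # Non-numerical question: pick the highest-scoring answer class.
    best = max(range(len(logits)), key=lambda i: logits[i])
    return vocabulary[best]


if __name__ == "__main__":
    vocab = ["yes", "no"]
    print(answer([2.0, 0.5], 0.0, False, vocab))  # yes
    print(answer([0.0, 0.0], math.log1p(120), True, vocab))  # 120
    print(multitask_loss([0.0, 0.0], math.log1p(9.0), None, 9.0))  # 0.0
```

Because the count comes from a continuous output rather than a class index, any integer can be produced without enlarging the answer vocabulary. Regressing on log(1 + count) instead of the raw count is one way to keep errors on scenes with thousands of objects from dominating the loss; the abstract does not state whether the paper uses such a transform.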

Publisher

Copernicus GmbH

Link

https://www.isprs-ann-photogramm-remote-sens-spatial-inf-sci.net/V-2-2020/1021/2020/isprs-annals-V-2-2020-1021-2020.pdf

Cited by 6 articles.

1. Merging Patches and Tokens: A VQA System for Remote Sensing; IGARSS 2024 - 2024 IEEE International Geoscience and Remote Sensing Symposium; 2024-07-07

2. Language Integration in Remote Sensing: Tasks, datasets, and future directions; IEEE Geoscience and Remote Sensing Magazine; 2023-12

3. Design and development of counting-based visual question answering model using heuristic-based feature selection with deep learning; Artificial Intelligence Review; 2023-01-17

4. Remote sensing visual question answering with a self-attention multi-modal encoder; Proceedings of the 5th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery; 2022-11

5. Embedding Spatial Relations in Visual Question Answering for Remote Sensing; 2022 26th International Conference on Pattern Recognition (ICPR); 2022-08-21