Affiliation:
1. Department of Information Engineering and Computer Science, University of Trento, 38123 Trento, Italy
2. Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh 4545, Saudi Arabia
Abstract
Image captioning is a technique that enables the automatic extraction of natural language descriptions of the contents of an image. On the one hand, information in the form of natural language can enhance accessibility by reducing the expertise required to process, analyze, and exploit remote sensing (RS) images; on the other, it provides a direct and general form of communication. However, image captioning is usually restricted to a single sentence, which barely describes the rich semantic information that typically characterizes RS images. In this paper, we aim to move one step forward by proposing a captioning system that, mimicking human behavior, adopts dialogue as a tool to explore and dig for information, leading to more detailed and comprehensive descriptions of RS scenes. The system relies on a question–answer scheme fed by a query image and summarizes the dialogue content with ChatGPT. Experiments carried out on two benchmark RS datasets confirm the potential of such an approach in the context of semantic information mining. Strengths and weaknesses are highlighted and discussed, as well as some possible future developments.
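The question–answer-then-summarize flow described in the abstract can be sketched as a simple control loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `dialogue_caption`, the questioner/answerer/summarizer callables, and the fixed `num_rounds` are all hypothetical. In a real system they would wrap a question generator, a visual question answering model, and an LLM such as ChatGPT; here toy stubs stand in for them. We also assume the questioner sees only the dialogue history, while the answerer sees the query image.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not the paper's API):
Dialogue = List[Tuple[str, str]]               # list of (question, answer) pairs
Questioner = Callable[[Dialogue], str]         # proposes the next question from the history
Answerer = Callable[[object, str], str]        # answers a question about the query image
Summarizer = Callable[[Dialogue], str]         # condenses the dialogue into one description


def dialogue_caption(image: object,
                     ask: Questioner,
                     answer: Answerer,
                     summarize: Summarizer,
                     num_rounds: int = 5) -> str:
    """Run a fixed number of question-answer rounds on `image`, then summarize."""
    dialogue: Dialogue = []
    for _ in range(num_rounds):
        question = ask(dialogue)           # next question, conditioned on the history so far
        reply = answer(image, question)    # answer grounded in the query image
        dialogue.append((question, reply))
    return summarize(dialogue)             # turn the whole Q&A exchange into a description


# Toy stubs standing in for the real models (hypothetical, for illustration only).
canned = {
    "What is the main land cover?": "residential buildings",
    "Are there roads?": "yes, several roads",
}
questions = list(canned)


def toy_ask(dialogue: Dialogue) -> str:
    return questions[len(dialogue)]


def toy_answer(image: object, question: str) -> str:
    return canned[question]


def toy_summarize(dialogue: Dialogue) -> str:
    return " ".join(f"{q} {a}." for q, a in dialogue)


if __name__ == "__main__":
    print(dialogue_caption("scene.tif", toy_ask, toy_answer, toy_summarize, num_rounds=2))
    # prints: What is the main land cover? residential buildings. Are there roads? yes, several roads.
```

Keeping the three roles as separate callables makes it easy to swap one model for another, for example replacing the summarizer with a different LLM, without touching the dialogue loop.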
Subject
General Earth and Planetary Sciences