LAM: Remote Sensing Image Captioning with Label-Attention Mechanism

Published:2019-10-10 Issue:20 Volume:11 Page:2349
ISSN:2072-4292
Container-title:Remote Sensing
language:en
Short-container-title:Remote Sensing

Author:

Zhang Zhengyuan, Diao Wenhui, Zhang Wenkai, Yan Menglong, Gao Xin, Sun Xian

Abstract

Significant progress has been made in remote sensing image captioning by encoder-decoder frameworks. The conventional attention mechanism is prevalent in this task but still has some drawbacks. The conventional attention mechanism only uses visual information about the remote sensing images without considering using the label information to guide the calculation of attention masks. To this end, a novel attention mechanism, namely Label-Attention Mechanism (LAM), is proposed in this paper. LAM additionally utilizes the label information of high-resolution remote sensing images to generate natural sentences to describe the given images. It is worth noting that, instead of high-level image features, the predicted categories’ word embedding vectors are adopted to guide the calculation of attention masks. Representing the content of images in the form of word embedding vectors can filter out redundant image features. In addition, it can also preserve pure and useful information for generating complete sentences. The experimental results from UCM-Captions, Sydney-Captions and RSICD demonstrate that LAM can improve the model’s performance for describing high-resolution remote sensing images and obtain better S_m scores compared with other methods. The S_m score is a hybrid scoring method derived from the AI Challenge 2017 scoring method. In addition, the validity of LAM is verified by the experiment of using true labels.
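
The abstract gives no equations, so the following is only a minimal sketch of the general idea: an additive (Bahdanau-style) attention mask over image regions whose scores also depend on the word embedding of a predicted category. The function name `label_guided_attention`, the weight names, and all dimensions are hypothetical choices for illustration and are not taken from the paper; the actual LAM formulation may differ.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_guided_attention(regions, label_embedding, w_visual, w_label, w_score):
    """Additive attention whose scores depend on a label word embedding.

    Assumed form (not from the paper):
        score_i = w_score . tanh(w_visual @ region_i + w_label @ label_embedding)
    Returns the attention mask over regions and the weighted context vector.
    """
    label_term = matvec(w_label, label_embedding)  # shared by all regions
    scores = []
    for region in regions:
        visual_term = matvec(w_visual, region)
        hidden = [math.tanh(v + l) for v, l in zip(visual_term, label_term)]
        scores.append(sum(w * h for w, h in zip(w_score, hidden)))
    mask = softmax(scores)
    dim = len(regions[0])
    context = [
        sum(a * region[d] for a, region in zip(mask, regions))
        for d in range(dim)
    ]
    return mask, context


def random_matrix(rows, cols, rng):
    """Small random weight matrix, standing in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
# Hypothetical sizes: 6 image regions with 4-dim features, 3-dim word embedding.
regions = [[rng.uniform(0.0, 1.0) for _ in range(4)] for _ in range(6)]
label_embedding = [rng.uniform(-1.0, 1.0) for _ in range(3)]
w_visual = random_matrix(5, 4, rng)
w_label = random_matrix(5, 3, rng)
w_score = [rng.uniform(-0.5, 0.5) for _ in range(5)]

mask, context = label_guided_attention(
    regions, label_embedding, w_visual, w_label, w_score
)
```

In this sketch the label embedding shifts every region's score through the same projected term, so changing the predicted category changes which regions receive weight. A decoder hidden state, which encoder-decoder captioners normally also feed into the attention scores, is left out to keep the example short.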

Funder

National Natural Science Foundation of China

Publisher

MDPI AG

Subject

General Earth and Planetary Sciences

Link

https://www.mdpi.com/2072-4292/11/20/2349/pdf

References: 36 articles.

1. A Fast Target Detection Algorithm for High Resolution SAR Imagery;Zhang;J. Remote Sens.,2005

2. Cloud and cloud shadow detection using multilevel feature fused segmentation network

3. An End-to-End Neural Network for Road Extraction From Remote Sensing Imagery by Multiple Feature Pyramid Network

4. Exploring Models and Data for Remote Sensing Image Caption Generation

Cited by 23 articles.

1. Exploring region features in remote sensing image captioning;International Journal of Applied Earth Observation and Geoinformation;2024-03

2. Machine-to-Machine Visual Dialoguing with ChatGPT for Enriched Textual Image Description;Remote Sensing;2024-01-23

3. Cross-Modal Retrieval and Semantic Refinement for Remote Sensing Image Captioning;Remote Sensing;2024-01-03

4. Learning consensus-aware semantic knowledge for remote sensing image captioning;Pattern Recognition;2024-01

5. Language Integration in Remote Sensing: Tasks, datasets, and future directions;IEEE Geoscience and Remote Sensing Magazine;2023-12