TextRS: Deep Bidirectional Triplet Network for Matching Text to Remote Sensing Images

Published:2020-01-27 Issue:3 Volume:12 Page:405
ISSN:2072-4292
Container-title:Remote Sensing
language:en
Short-container-title:Remote Sensing

Author:

Abdullah Taghreed, Bazi Yakoub, Al Rahhal Mohamad M., Mekhalfi Mohamed L., Rangarajan Lalitha, Zuair Mansour

Abstract

Exploring the relevance between images and their natural language descriptions is regarded as the next frontier in the general computer vision literature. Several recent works have accordingly attempted to map visual attributes onto their corresponding textual descriptions, with some success. However, this line of research has not been widespread in the remote sensing community. On this point, our contribution is threefold. First, we construct a new dataset for text-image matching tasks, termed TextRS, by collecting images from four well-known scene datasets, namely AID, Merced, PatternNet, and NWPU. Each image is annotated with five different sentences, provided by five different people to ensure diversity. Second, we put forth a novel Deep Bidirectional Triplet Network (DBTN) for text-to-image matching. Unlike traditional remote sensing image-to-image retrieval, our paradigm carries out retrieval by matching text to image representations. To achieve that, we propose to learn a bidirectional triplet network composed of a Long Short-Term Memory (LSTM) network and pre-trained Convolutional Neural Networks (CNNs), namely EfficientNet-B2, ResNet-50, Inception-v3, and VGG16. Third, we top the proposed architecture with an average fusion strategy that fuses the features of the five sentences describing an image, which enables learning a more robust embedding. The performance of the method, expressed in terms of Recall@K (the presence of the relevant image among the top K images retrieved for the query text), is promising: it yields 17.20%, 51.39%, and 73.02% for K = 1, 5, and 10, respectively.
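
The three ingredients named in the abstract (a bidirectional triplet objective, average fusion of the five sentence features, and Recall@K) can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine distance, the margin value of 0.5, and all function names are assumptions made for illustration, and plain Python lists stand in for the CNN and LSTM embeddings.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity (assumed distance; the paper may use another)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def average_fusion(sentence_embeddings):
    """Fuse the sentence embeddings of one image by element-wise averaging."""
    n = len(sentence_embeddings)
    return [sum(column) / n for column in zip(*sentence_embeddings)]


def triplet_loss(anchor, positive, negative, margin):
    """Hinge loss: the positive should be closer to the anchor than the negative by `margin`."""
    gap = cosine_distance(anchor, positive) - cosine_distance(anchor, negative)
    return max(0.0, gap + margin)


def bidirectional_triplet_loss(img, txt, neg_img, neg_txt, margin=0.5):
    """Sum of the two directions; margin=0.5 is a hypothetical value."""
    # Text as anchor: matching image versus a non-matching image.
    text_to_image = triplet_loss(txt, img, neg_img, margin)
    # Image as anchor: matching text versus a non-matching text.
    image_to_text = triplet_loss(img, txt, neg_txt, margin)
    return text_to_image + image_to_text


def recall_at_k(text_embeddings, image_embeddings, k):
    """Fraction of text queries whose matching image (same index) is in the top k."""
    hits = 0
    for i, query in enumerate(text_embeddings):
        ranked = sorted(
            range(len(image_embeddings)),
            key=lambda j: cosine_distance(query, image_embeddings[j]),
        )
        if i in ranked[:k]:
            hits += 1
    return hits / len(text_embeddings)
```

In this sketch, `average_fusion` would be applied to the five sentence embeddings of an image before the loss is computed, and `recall_at_k` assumes that the text query at index i corresponds to the image at index i.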

Publisher

MDPI AG

Subject

General Earth and Planetary Sciences

Link

https://www.mdpi.com/2072-4292/12/3/405/pdf

References: 81 articles.

1. Learning a Multi-Branch Neural Network from Multiple Sources for Knowledge Adaptation in Remote Sensing Imagery

2. Remote Sensing Image Retrieval With Global Morphological Texture Descriptors

3. A new deep convolutional neural network for fast hyperspectral image classification

4. Interactive learning and probabilistic retrieval in remote sensing image archives

5. A Survey of Active Learning Algorithms for Supervised Remote Sensing Image Classification

Cited by 63 articles.

1. Knowledge-Aware Visual Question Generation for Remote Sensing Images;IGARSS 2024 - 2024 IEEE International Geoscience and Remote Sensing Symposium;2024-07-07

2. Understanding remote sensing imagery like reading a text document: What can remote sensing image captioning offer?;International Journal of Applied Earth Observation and Geoinformation;2024-07

3. An Enhanced Feature Extraction Framework for Cross-Modal Image–Text Retrieval;Remote Sensing;2024-06-17

4. Vision-Language Models in Remote Sensing: Current progress and future trends;IEEE Geoscience and Remote Sensing Magazine;2024-06

5. A Multi-modal Interaction Approach to Enhance Natural Language Descriptions of Remote Sensing Images;2024 International Conference on Machine Intelligence and Digital Applications;2024-05-30