Abstract
With the rapid development of remote sensing (RS) observation technology in recent years, cross-modal retrieval of RS images based on high-level semantic association has drawn increasing attention. However, few existing studies on cross-modal retrieval of RS images address the mutual interference between image semantic features caused by “multi-scene semantics”. We therefore propose a novel cross-attention (CA) model, CABIR, built on region-level semantic features of RS images for cross-modal text-image retrieval. The technique uses the CA mechanism to realize cross-modal information interaction: textual semantics guide the network in allocating weights to image regions and filtering redundant features, which reduces the effect of irrelevant scene semantics on retrieval. We further propose BERT plus Bi-GRU, a new approach to generating sentence-level textual features, and design an effective temperature control function to keep the CA network training stable. Experiments suggest that CABIR not only outperforms other state-of-the-art cross-modal image retrieval methods but also shows high generalization ability and stability, with average recall rates of up to 18.12%, 48.30%, and 55.53% on the RSICD, UCM, and Sydney datasets, respectively. The proposed model offers a possible solution to the mutual interference among semantic features of RS images with “multi-scene semantics” caused by complex terrain objects.
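The core idea, text-guided attention over image regions with a temperature that controls how sharply the model focuses, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, feature shapes, and the fixed scalar temperature (standing in for CABIR's temperature control function) are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text_feat, region_feats, temperature=1.0):
    """Text-guided attention over image regions (hypothetical shapes).

    text_feat:    (d,)   sentence-level embedding (e.g., a BERT + Bi-GRU output)
    region_feats: (n, d) region-level image features
    temperature:  scales attention sharpness; a lower value concentrates
                  weight on the regions most relevant to the text.
    """
    d = text_feat.shape[0]
    scores = region_feats @ text_feat / np.sqrt(d)   # text-region similarity
    weights = softmax(scores / temperature)          # low T -> sharper focus
    pooled = weights @ region_feats                  # weighted region summary
    return pooled, weights
```

Lowering the temperature suppresses the contribution of regions whose semantics do not match the query text, which is the mechanism the abstract credits with filtering irrelevant scene semantics.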
Funder
National Natural Science Foundation of China
Subject
Fluid Flow and Transfer Processes,Computer Science Applications,Process Chemistry and Technology,General Engineering,Instrumentation,General Materials Science
Cited by
5 articles.