A Fine-Grained Semantic Alignment Method Specific to Aggregate Multi-Scale Information for Cross-Modal Remote Sensing Image Retrieval
Authors:
Zheng Fuzhong¹, Wang Xu¹, Wang Luyao¹, Zhang Xiong¹, Zhu Hongze¹, Wang Long¹, Zhang Haisu¹
Affiliation:
1. College of Information and Communication, National University of Defense Technology, Wuhan 430074, China
Abstract
As the volume of remote sensing imagery grows rapidly, research has increasingly focused on efficient and flexible cross-modal retrieval for remote sensing images, including the distinctive challenge posed by their multi-scale attributes. However, existing studies concentrate mainly on characterizing multi-scale features and neglect the complex relationships among multi-scale targets and the semantic alignment of these targets with text. To address this issue, this study introduces a fine-grained semantic alignment method that adequately aggregates multi-scale information (referred to as FAAMI). The approach comprises three stages. First, a computing-friendly cross-layer feature connection method constructs a multi-scale feature representation of an image. Second, an efficient feature consistency enhancement module rectifies the incongruous semantic discrimination observed in cross-layer features. Finally, a shallow cross-attention network captures the fine-grained semantic relationship between multi-scale image regions and the corresponding words in the text. Extensive experiments were conducted on two datasets, RSICD and RSITMD. The results show that FAAMI outperforms recently proposed advanced models in the same domain, with significant improvements in R@K and other evaluation metrics. Specifically, FAAMI achieves mR values of 23.18% and 35.99% on the two datasets, respectively.
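The abstract reports R@K and mR without defining them. As a minimal sketch, assuming the convention common in cross-modal retrieval work (R@K is the percentage of queries with at least one ground-truth match among the top K results, and mR is the mean of the R@K values over both retrieval directions), the metrics could be computed as follows. The function names, the toy similarity matrix, and the choice of K values are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Sequence, Set


def recall_at_k(sim: Sequence[Sequence[float]],
                relevant: Sequence[Set[int]],
                k: int) -> float:
    """Percentage of queries with at least one relevant item in the top k.

    sim[q][j] is the similarity between query q and gallery item j;
    relevant[q] is the set of gallery indices that match query q.
    """
    hits = 0
    for q, scores in enumerate(sim):
        # Rank gallery items by descending similarity to the query.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if relevant[q] & set(ranked[:k]):
            hits += 1
    return 100.0 * hits / len(sim)


def mean_recall(sim_i2t: Sequence[Sequence[float]],
                rel_i2t: Sequence[Set[int]],
                sim_t2i: Sequence[Sequence[float]],
                rel_t2i: Sequence[Set[int]],
                ks: Sequence[int] = (1, 5, 10)) -> float:
    """Mean of R@K over both directions (assumed definition of mR)."""
    values: List[float] = [recall_at_k(sim_i2t, rel_i2t, k) for k in ks]
    values += [recall_at_k(sim_t2i, rel_t2i, k) for k in ks]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical toy data: 3 images (rows) scored against 3 captions (columns).
    sim_i2t = [[0.9, 0.1, 0.2],
               [0.3, 0.2, 0.8],
               [0.1, 0.4, 0.7]]
    # Text-to-image scores are the transpose of the same matrix.
    sim_t2i = [list(col) for col in zip(*sim_i2t)]
    matches = [{0}, {1}, {2}]  # image i is paired with caption i
    print(recall_at_k(sim_i2t, matches, k=1))
    print(mean_recall(sim_i2t, matches, sim_t2i, matches, ks=(1, 2)))
```

The toy gallery has only three items, so the example call uses K values of 1 and 2. Evaluations of this kind more commonly use K = 1, 5, 10, which is the default in the sketch.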
Funder
National Natural Science Foundation of China
Subject
Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry
Cited by
1 article.