IML-Net: A Framework for Cross-View Geo-Localization with Multi-Domain Remote Sensing Data
Published: 2024-03-31
Journal: Remote Sensing, Volume 16, Issue 7, Page 1249
ISSN: 2072-4292
Language: en
Authors:
Yan Yiming 1,2,3, Wang Mengyuan 1,2, Su Nan 1,2, Hou Wei 3, Zhao Chunhui 1,2, Wang Wenxuan 1,2
Affiliations:
1. College of Information and Communication Engineering, Harbin Engineering University, Harbin 150009, China
2. Key Laboratory of Advanced Marine Communication and Information Technology, Ministry of Industry and Information, Harbin 150009, China
3. Harbin Aerospace Star Data System Science and Technology Co., Ltd., Harbin 150028, China
Abstract
Cross-view geo-localization is a valuable yet challenging task. In practical applications, the images targeted by cross-view geo-localization technology span multi-domain remote sensing imagery, including images from different platforms (e.g., drone cameras and satellites), different perspectives (e.g., nadir and oblique), and different temporal conditions (e.g., various seasons and weather conditions). Based on the characteristics of these images, we designed an effective framework, the Image Reconstruction and Multi-Unit Mutual Learning Net (IML-Net), for cross-view geo-localization tasks. By incorporating a deconvolutional network into the architecture to reconstruct images, we better bridge the differences in remote sensing image features across domains. This allows target images from different platforms and perspectives to be mapped into a shared latent space, yielding more discriminative feature descriptors and making feature extraction more robust when locating targets across a wide range of perspectives. To further improve performance, we introduce attention regions learned by different units as augmented data during training. For current cross-view geo-localization datasets, the use of large-scale real data is limited by high collection costs and privacy concerns, so simulated data are prevalent; however, real data allow the network to learn more generalizable features. To make the model more robust and stable, we collected two groups of multi-domain data from the Zurich and Harbin regions, incorporating real data into the cross-view geo-localization task to construct the ZHcity750 Dataset. Our framework is evaluated on the cross-domain ZHcity750 Dataset and shows competitive results compared with state-of-the-art methods.
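The two training signals sketched in the abstract, a pixel-level reconstruction objective and a mutual-learning objective that lets units imitate each other's predictions, can be illustrated roughly as follows. This is a minimal NumPy sketch under our own assumptions, not the authors' implementation; the function names and the use of symmetric KL-based mutual learning (in the style of deep mutual learning) are hypothetical.

```python
import numpy as np

def softmax(logits):
    # numerically stable softmax over the last axis
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p, q, eps=1e-12):
    # KL(p || q), averaged over the batch; eps guards log(0)
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

def mutual_learning_loss(logits_a, logits_b):
    # each unit is nudged toward the other unit's class posterior:
    # unit A minimizes KL(p_b || p_a), unit B minimizes KL(p_a || p_b)
    p_a, p_b = softmax(logits_a), softmax(logits_b)
    return kl_div(p_b, p_a), kl_div(p_a, p_b)

def reconstruction_loss(decoded, target):
    # pixel-wise MSE between the image rebuilt by the
    # deconvolutional branch and the original view
    return float(np.mean((decoded - target) ** 2))
```

In a full pipeline these terms would be weighted and summed with the usual metric-learning loss on the shared latent descriptors; the weighting scheme here is not specified by the abstract.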