
Attention-Based Monocular Depth Estimation Considering Global and Local Information in Remote Sensing Images

Published: 2024-02-04
Volume: 16, Issue: 3, Article number: 585
ISSN: 2072-4292
Journal: Remote Sensing
Language: English

Authors:

Lv Junwei¹²³, Zhang Yueting¹²³, Guo Jiayi¹²³, Zhao Xin¹²³, Gao Ming¹²³, Lei Bin¹²

Affiliations:

1. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China

2. Key Laboratory of Technology in Geo-Spatial Information Processing and Application Systems, Chinese Academy of Sciences, Beijing 100190, China

3. School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 101408, China

Abstract

Monocular depth estimation from a single remote sensing image has become a focal point of both remote sensing and computer vision research and is crucial for tasks such as 3D reconstruction and target instance segmentation. Because it does not require multiple views as references, monocular depth estimation is considerably more time-efficient than multi-view approaches. However, owing to the complexity, occlusion, and uneven depth distribution of remote sensing images, few monocular depth estimation methods have been developed for them. This paper proposes a remote sensing monocular depth estimation approach that integrates attention mechanisms and considers both global and local feature information. Taking a single remote sensing image as input, the method outputs an end-to-end depth estimate for the corresponding area. In the encoder, the method employs a densely connected convolutional network (DenseNet) feature extraction module with efficient channel attention (ECA), which improves the capture of local information and fine details in remote sensing images. In the decoder, the paper proposes a dense atrous spatial pyramid pooling (DenseASPP) module with channel and spatial attention modules, which mitigates information loss and strengthens the relationship between the positions of targets and the background in the image. In addition, weighted global guidance plane modules are introduced to fuse features from different scales and receptive fields before the final monocular depth prediction. Extensive experiments on the publicly available WHU-OMVS dataset show that the method yields better depth results than the compared methods on both qualitative and quantitative measures.
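The abstract names efficient channel attention (ECA) in the encoder but gives no details. The sketch below follows the ECA module as described in ECA-Net (Wang et al., CVPR 2020), which may differ from the authors' exact configuration: global average pooling per channel, a 1D convolution across neighbouring channels with an adaptive kernel size (γ = 2 and b = 1 are assumed), and a sigmoid gate that rescales each channel. It uses plain Python on nested lists for readability; the function names and the hand-set kernel weights are hypothetical, and a real model would learn the weights and run on a tensor library.

```python
import math
from typing import List

Tensor3D = List[List[List[float]]]  # feature map indexed [channel][row][col]


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive kernel size: |(log2(C) + b) / gamma|, bumped to the next odd number."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def eca(features: Tensor3D, weights: List[float]) -> Tensor3D:
    """Efficient channel attention on one [C][H][W] feature map.

    weights: odd-length 1D kernel shared by all channels (assumed given here;
    learned during training in practice).
    """
    c, k = len(features), len(weights)
    if k % 2 == 0:
        raise ValueError("ECA kernel length must be odd")
    pad = k // 2
    # 1) Global average pooling: one scalar descriptor per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in features]
    # 2) 1D convolution across neighbouring channels with zero padding,
    #    followed by a sigmoid gate (no dimensionality reduction).
    attn = []
    for i in range(c):
        s = 0.0
        for j in range(k):
            idx = i + j - pad
            if 0 <= idx < c:
                s += weights[j] * pooled[idx]
        attn.append(sigmoid(s))
    # 3) Rescale every value in channel i by its attention weight.
    return [[[v * attn[i] for v in row] for row in features[i]] for i in range(c)]


# Usage on a toy 3-channel, 2x2 feature map.
feature_map = [
    [[1.0, 2.0], [3.0, 4.0]],      # channel 0, mean 2.5
    [[0.0, 0.0], [0.0, 0.0]],      # channel 1, mean 0.0
    [[-1.0, -1.0], [-1.0, -1.0]],  # channel 2, mean -1.0
]
k = eca_kernel_size(256)  # kernel size for a 256-channel feature map
identity_kernel = [0.0, 1.0, 0.0]  # hypothetical weights: each channel sees only itself
out = eca(feature_map, identity_kernel)
```

Because the 1D convolution mixes each channel only with its k − 1 nearest neighbours, ECA adds just k weights per attention block, instead of the two fully connected layers used by squeeze-and-excitation blocks, which keeps the encoder lightweight.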

Funders

National Natural Science Foundation of China

Key Research and Development Program of the Aerospace Information Research Institute, Chinese Academy of Sciences

Publisher

MDPI AG

Link

https://www.mdpi.com/2072-4292/16/3/585/pdf

References (42 articles)

1. Geiger, A., Ziegler, J., and Stiller, C. (2011, June 5–9). StereoScan: Dense 3D reconstruction in real-time. Proceedings of the 2011 IEEE Intelligent Vehicles Symposium (IV), Baden-Baden, Germany.

2. Remondino, F. (2011). Heritage recording and 3D modeling with photogrammetry and 3D scanning. Remote Sens.

3. Lv et al. (2023). Novel adaptive region spectral–spatial features for land cover classification with high spatial resolution remotely sensed imagery. IEEE Trans. Geosci. Remote Sens.

4. Immitzer, M., Vuolo, F., and Atzberger, C. (2016). First experience with Sentinel-2 data for crop and tree species classifications in central Europe. Remote Sens., 8.

5. Hu et al. (2015). Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery. Remote Sens.

Cited by 1 article

1. Remote Sensing Image Scene Classification Based on an Enhanced Attention Module. 2024 International Conference on Distributed Computing and Optimization Techniques (ICDCOT), 2024-03-15.