A Deep Joint Network for Monocular Depth Estimation Based on Pseudo-Depth Supervision

Published:2023-11-14 Issue:22 Volume:11 Page:4645
ISSN:2227-7390
Container-title:Mathematics
Language:en
Short-container-title:Mathematics

Author:

Tan Jiahai¹,², Gao Ming¹, Duan Tao², Gao Xiaomei³

Affiliation:

1. School of Optoelectronic Engineering, Xi’an Technological University, Xi’an 710021, China

2. State Key Laboratory of Transient Optics and Photonics, Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an 710119, China

3. Xi’an Mapping and Printing of China National Administration of Coal Geology, Xi’an 710199, China

Abstract

Depth estimation from a single image is a significant task. Although deep learning methods hold great promise in this area, they still face a number of challenges, including limited modeling of nonlocal dependencies, the lack of effective models for jointly optimizing loss functions, and difficulty in accurately estimating object edges. To further increase prediction accuracy, this research proposes a new network structure and training method for single-image depth estimation. A pseudo-depth network is first deployed to generate a single-image depth prior; by constructing connecting paths between multi-scale local features using the proposed up-mapping and jumping modules, the network can integrate representations and recover fine details. A deep network is also designed to capture and convey global context, using the Transformer Conv module and Unet Depth net to extract and refine global features. The two networks jointly provide meaningful coarse and fine features for predicting high-quality depth images from single RGB images. In addition, multiple joint losses are used to enhance model training. A series of experiments confirms the efficacy of the method. According to experimental results on the NYU Depth V2 and KITTI depth estimation benchmarks, the proposed method outperforms the advanced method DPT by 10% and 3.3%, respectively, in terms of root mean square error (RMSE(log)) and by 1.7% and 1.6%, respectively, in terms of squared relative difference (SRD).
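
For readers unfamiliar with the two error metrics named in the abstract, the following is a minimal sketch of how RMSE(log) and squared relative difference are commonly computed in the monocular depth estimation literature. It assumes the standard definitions; the abstract does not state the exact formulas or masking rules the authors used. The function names, the flat-list input format, and the rule of skipping pixels with non-positive depth are all illustrative assumptions.

```python
import math


def rmse_log(pred, gt):
    """RMSE between log-depths: sqrt(mean((log(pred) - log(gt))^2)).

    Assumption: pixels with non-positive ground truth or prediction are
    skipped, since the logarithm is undefined for them.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid pixels to evaluate")
    total = sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs)
    return math.sqrt(total / len(pairs))


def squared_relative_difference(pred, gt):
    """Squared relative difference: mean((pred - gt)^2 / gt).

    Assumption: pixels with non-positive ground truth are treated as
    missing measurements and skipped.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid pixels to evaluate")
    return sum((p - g) ** 2 / g for p, g in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical depth values in meters, flattened from a depth map.
    predicted = [2.0, 4.0]
    ground_truth = [1.0, 2.0]
    print(rmse_log(predicted, ground_truth))
    print(squared_relative_difference(predicted, ground_truth))
```

Both metrics are errors, so lower values are better. A reported percentage improvement over a baseline such as DPT would then correspond to a reduction in these values.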

Funder

Open Research Fund of State Key Laboratory of Transient Optics and Photonics, Chinese Academy of Sciences

Key R&D project of Shaanxi Province

Key Scientific Research Program of Shaanxi Provincial Department of Education

Xi’an Science and Technology Research Plan

Publisher

MDPI AG

Subject

General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)

Link

https://www.mdpi.com/2227-7390/11/22/4645/pdf
