Monocular Depth Estimation Using a Laplacian Image Pyramid with Local Planar Guidance Layers
Authors:
Choi Youn-Ho, Kee Seok-Cheol
Abstract
Estimating accurate depth from 2D images is important, and depth estimation has been studied for a long time. Recent deep learning research on estimating depth from monocular camera images has explored a variety of techniques for improving accuracy. However, depth estimation from 2D images still struggles to predict the boundaries between objects. In this paper, we aim to predict finer depth maps by emphasizing precise object boundaries. We propose an encoder–decoder depth estimation network that uses a Laplacian pyramid and local planar guidance. While upsampling the features learned by the encoder, the Laplacian pyramid and local planar guidance layers guide the decoder toward sharper object boundaries, yielding a clearer depth map. We train and test our models on the KITTI and NYU Depth V2 datasets. The proposed network is built using only convolutions and uses ConvNeXt as its backbone. The trained model achieves an absolute relative error (Abs_rel) of 0.054 and a root mean square error (RMSE) of 2.252 on the KITTI dataset, and an Abs_rel of 0.102 and an RMSE of 0.355 on the NYU Depth V2 dataset. Among state-of-the-art monocular depth estimation methods, our network ranks fifth on the KITTI Eigen split and eighth on NYU Depth V2.
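For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how Abs_rel and RMSE are commonly computed for depth maps. It is not the authors' evaluation code: the function name `depth_errors`, the flat-list input, and the rule that pixels with ground truth less than or equal to zero are skipped as invalid are all assumptions made for illustration.

```python
import math


def depth_errors(pred, gt):
    """Return (abs_rel, rmse) for flat sequences of predicted and true depths.

    Assumption for this sketch: a ground-truth value <= 0 marks a pixel
    with no valid depth measurement, so that pixel is ignored.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    n = len(pairs)
    # Absolute relative error: mean of |pred - gt| / gt over valid pixels.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Root mean square error: sqrt of the mean squared depth difference.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    return abs_rel, rmse


# Hypothetical values: the third pixel has no ground truth and is skipped.
abs_rel, rmse = depth_errors([2.0, 4.0, 10.0], [2.0, 5.0, 0.0])
```

Abs_rel normalizes each error by the true depth, so it is dimensionless, whereas RMSE is expressed in the depth unit of the dataset. This is why the two datasets show similar Abs_rel values but very different RMSE values.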
Funder
National Research Foundation of Korea (NRF) grant funded by the Korean government MSIT (Ministry of Science and ICT), Korea, under the Grand Information Technology Research Center support program
Subject
Electrical and Electronic Engineering, Biochemistry, Instrumentation, Atomic and Molecular Physics, and Optics, Analytical Chemistry