Improving Monocular Depth Estimation with Learned Perceptual Image Patch Similarity-Based Image Reconstruction and Left–Right Difference Image Constraints
Published: 2023-09-04
Container-title: Electronics
Volume: 12
Issue: 17
Page: 3730
ISSN: 2079-9292
Language: en
Authors:
1. Hyeseung Park, Department of Software Engineering, Hyupsung University, Hwaseong-si 18830, Republic of Korea
2. Seungchul Park, School of Computer Science and Engineering, Korea University of Technology and Education, Cheonan-si 31253, Republic of Korea
Abstract
This paper introduces a novel approach to self-supervised monocular depth estimation. The model is trained on stereo-image (left–right pair) data and incorporates carefully designed, perceptual image quality assessment-based loss functions for image reconstruction and left–right image difference. The fidelity of the reconstructed images, obtained by warping the input images with the predicted disparity maps, strongly influences the accuracy of depth estimation in self-supervised monocular depth networks. The proposed LPIPS (Learned Perceptual Image Patch Similarity)-based evaluation of image reconstruction emulates human perceptual mechanisms to quantify the quality of reconstructed images and serves as the image reconstruction loss, driving the reconstructed images toward ever greater similarity with the target images during training. Stereo image pairs often exhibit slight discrepancies in brightness, contrast, color, and camera angle caused by factors such as lighting conditions and camera calibration inaccuracies, which limit the achievable image reconstruction quality. To address this, a left–right difference image loss is introduced that aligns the difference between the actual left–right image pair with the difference between the reconstructed left–right image pair. Because distant pixels have near-zero disparity, their values in the difference image of a stereo pair approach zero; this loss therefore progressively steers the distant pixel values of the reconstructed difference images toward zero as well. The loss has proven effective at mitigating distortions in distant regions while improving overall performance. The primary objective of this study is to introduce and validate the effectiveness of the LPIPS-based image reconstruction and left–right difference image losses in the context of monocular depth estimation.
To this end, the proposed loss functions are integrated into a straightforward single-task stereo-image learning framework with simple hyperparameters. Notably, this approach achieves superior results compared to other state-of-the-art methods, even those adopting more intricate hybrid-data and multi-task learning strategies.
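The left–right difference image loss described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the use of NumPy arrays for images, and the choice of an L1 penalty between the two difference images are all assumptions made for clarity.

```python
import numpy as np

def difference_image_loss(left, right, recon_left, recon_right):
    """Hypothetical sketch of a left-right difference image loss.

    Distant scene points have near-zero disparity, so their pixel
    values in the difference image of a real stereo pair are close
    to zero. Penalizing the gap between the real and reconstructed
    difference images therefore steers distant pixels of the
    reconstruction toward zero as well.
    """
    real_diff = left - right            # difference image of the source pair
    recon_diff = recon_left - recon_right  # difference image of the warped pair
    # L1 distance between the two difference images (an assumed choice).
    return float(np.mean(np.abs(real_diff - recon_diff)))
```

In a full training pipeline this term would be weighted and summed with the LPIPS-based reconstruction loss; the relative weights are the "simple hyperparameters" mentioned above.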
Funder
Education and Research Promotion Program of KOREATECH
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering
Cited by: 2 articles.