Unsupervised Monocular Depth and Camera Pose Estimation with Multiple Masks and Geometric Consistency Constraints
Author:
Zhang Xudong1, Zhao Baigan2, Yao Jiannan2, Wu Guoqing1
Affiliation:
1. School of Information Science and Technology, Nantong University, Nantong 226019, China
2. School of Mechanical Engineering, Nantong University, Nantong 226019, China
Abstract
This paper presents a novel unsupervised learning framework for estimating scene depth and camera pose from video sequences, a task fundamental to many high-level applications such as 3D reconstruction, visual navigation, and augmented reality. Although existing unsupervised methods have achieved promising results, their performance degrades in challenging scenes, such as those containing dynamic objects and occluded regions. To mitigate these negative effects, this work adopts multiple masking techniques and geometric consistency constraints. First, multiple masks are used to identify numerous outliers in the scene, which are then excluded from the loss computation. In addition, the identified outliers serve as a supervisory signal for training a mask estimation network. The estimated mask is then used to preprocess the input to the pose estimation network, mitigating the potential adverse effects of challenging scenes on pose estimation. Furthermore, we propose geometric consistency constraints that reduce sensitivity to illumination changes and act as additional supervisory signals for training the network. Experimental results on the KITTI dataset demonstrate that the proposed strategies effectively improve the model’s performance, outperforming other unsupervised methods.
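The abstract does not give the loss formulas, so the following is only a minimal sketch of the general idea of excluding masked outliers from the loss and adding a geometric consistency term. It assumes an L1 photometric error and a normalized depth-difference term of the form |a - b| / (a + b); both choices, as well as the function names `masked_photometric_loss` and `depth_consistency_loss`, are illustrative assumptions and not the authors' implementation. Images and depth maps are represented as plain nested lists, and a mask value of 1 marks an inlier pixel.

```python
def masked_photometric_loss(target, warped, mask):
    """Mean absolute intensity error over inlier pixels only (mask == 1).

    Assumption: an L1 photometric error; the paper may use a different one.
    """
    total, count = 0.0, 0
    for t_row, w_row, m_row in zip(target, warped, mask):
        for t, w, m in zip(t_row, w_row, m_row):
            if m:  # outliers (mask == 0) do not contribute to the loss
                total += abs(t - w)
                count += 1
    return total / count if count else 0.0


def depth_consistency_loss(depth_a, depth_b, mask):
    """Mean normalized depth difference |a - b| / (a + b) over inlier pixels.

    Assumption: a hypothetical geometric consistency term that depends only
    on depth, not on pixel intensity, so it is unaffected by illumination.
    """
    total, count = 0.0, 0
    for a_row, b_row, m_row in zip(depth_a, depth_b, mask):
        for a, b, m in zip(a_row, b_row, m_row):
            if m:
                total += abs(a - b) / (a + b)
                count += 1
    return total / count if count else 0.0


# Toy 2x2 example: the top-right pixel is treated as an outlier,
# for instance because it belongs to a moving object.
target = [[0.2, 0.8], [0.5, 0.5]]
warped = [[0.2, 0.1], [0.4, 0.5]]
mask = [[1, 0], [1, 1]]
all_inliers = [[1, 1], [1, 1]]

depth_a = [[2.0, 4.0], [1.0, 3.0]]
depth_b = [[2.0, 1.0], [3.0, 3.0]]

photo_masked = masked_photometric_loss(target, warped, mask)
photo_unmasked = masked_photometric_loss(target, warped, all_inliers)
geo_masked = depth_consistency_loss(depth_a, depth_b, mask)
```

In this toy case the large error at the masked pixel is excluded, so the masked photometric loss is lower than the unmasked one. In a real training pipeline these terms would be computed on tensors and combined with weights chosen by the authors, which the abstract does not state.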
Funder
National Natural Science Foundation of China; Qing Lan Project of Jiangsu Province; Priority Academic Program Development of Jiangsu Higher Education Institutions; Industry-University-Research Cooperation Project in Jiangsu Province
Subject
Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry