Learning Effective Geometry Representation from Videos for Self-Supervised Monocular Depth Estimation
Published: 2024-06-11
Volume: 13
Issue: 6
Page: 193
ISSN: 2220-9964
Container-title: ISPRS International Journal of Geo-Information
Short-container-title: IJGI
Language: en
Author:
Zhao Hailiang 1, Kong Yongyi 1, Zhang Chonghao 1, Zhang Haoji 1, Zhao Jiansen 1
Affiliation:
1. Merchant Marine College, Shanghai Maritime University, Shanghai 200135, China
Abstract
Recent studies on self-supervised monocular depth estimation have achieved promising results, mainly through the joint optimization of depth and pose estimation via a high-level photometric loss. However, how to learn latent, task-specific geometry representations from videos remains largely unexplored. To tackle this issue, we propose two novel schemes for learning more effective representations from monocular videos: (i) an Inter-task Attention Model (IAM) that learns the geometric correlation between the depth and pose networks, making structure and motion information mutually beneficial; and (ii) a Spatial-Temporal Memory Module (STMM) that exploits long-range geometric context across consecutive frames, both spatially and temporally. Systematic ablation studies demonstrate the effectiveness of each component. Evaluations on KITTI show that our method outperforms current state-of-the-art techniques.
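The photometric loss the abstract refers to works by reprojecting one video frame into an adjacent view using the predicted depth and relative camera pose, then penalizing intensity differences. A minimal sketch of this standard view-synthesis loss is given below; the function name, nearest-neighbour sampling, and plain L1 penalty are simplifying assumptions for illustration, not the paper's actual implementation (which would typically use bilinear sampling and an SSIM+L1 combination).

```python
import numpy as np

def inverse_warp_loss(target, source, depth, pose, K):
    """Illustrative photometric reprojection loss for self-supervised
    monocular depth: back-project target pixels with predicted depth,
    transform them by the predicted relative pose, project into the
    source view, and compare intensities (L1). Names are hypothetical."""
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]
    # Homogeneous pixel coordinates, flattened row-major to match image order
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(np.float64)
    # Back-project to 3D camera coordinates using the predicted depth map
    cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)
    # Apply the predicted relative camera pose (4x4 SE(3) matrix)
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])
    cam2 = (pose @ cam_h)[:3]
    # Project into the source view
    proj = K @ cam2
    u2 = np.round(proj[0] / proj[2]).astype(int)
    v2 = np.round(proj[1] / proj[2]).astype(int)
    # Keep only pixels that land inside the source image, in front of the camera
    valid = (u2 >= 0) & (u2 < W) & (v2 >= 0) & (v2 < H) & (proj[2] > 0)
    # Nearest-neighbour sampling (bilinear in practice) and L1 penalty
    warped = source[v2[valid], u2[valid]]
    return np.abs(warped - target.reshape(-1)[valid]).mean()
```

With an identity pose and identical frames, the warp is the identity and the loss is zero; the depth and pose networks are trained jointly by minimizing this quantity over real adjacent frames.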
Funder
National Key Research and Development Program of China; National Natural Science Foundation of China