Learning Effective Geometry Representation from Videos for Self-Supervised Monocular Depth Estimation
Published: 2024-06-11
Volume: 13
Issue: 6
Page: 193
ISSN: 2220-9964
Container-title: ISPRS International Journal of Geo-Information
Short-container-title: IJGI
Language: en
Author:
Zhao Hailiang 1, Kong Yongyi 1, Zhang Chonghao 1, Zhang Haoji 1, Zhao Jiansen 1
Affiliation:
1. Merchant Marine College, Shanghai Maritime University, Shanghai 200135, China
Abstract
Recent studies on self-supervised monocular depth estimation have achieved promising results, mainly through the joint optimization of depth and pose estimation via a high-level photometric loss. However, how to learn latent, task-specific geometry representations from videos remains largely unexplored. To tackle this issue, we propose two novel schemes for learning more effective representations from monocular videos: (i) an Inter-task Attention Model (IAM) that learns the geometric correlation between the depth and pose networks, making structure and motion information mutually beneficial; and (ii) a Spatial-Temporal Memory Module (STMM) that exploits long-range geometric context across consecutive frames, both spatially and temporally. Systematic ablation studies demonstrate the effectiveness of each component. Evaluations on KITTI show that our method outperforms current state-of-the-art techniques.
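The photometric loss the abstract refers to works by reprojecting one video frame into an adjacent view using the predicted depth and relative camera pose, then penalizing intensity differences. A minimal sketch of this standard view-synthesis loss is given below; the function name, nearest-neighbour sampling, and plain L1 penalty are simplifying assumptions for illustration, not the paper's actual implementation (which would typically use bilinear sampling and an SSIM+L1 combination).

```python
import numpy as np

def inverse_warp_loss(target, source, depth, pose, K):
    """Illustrative photometric reprojection loss for self-supervised
    monocular depth: back-project target pixels with predicted depth,
    transform them by the predicted relative pose, project into the
    source view, and compare intensities (L1). Names are hypothetical."""
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]
    # Homogeneous pixel coordinates, flattened row-major to match image order
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(np.float64)
    # Back-project to 3D camera coordinates using the predicted depth map
    cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)
    # Apply the predicted relative camera pose (4x4 SE(3) matrix)
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])
    cam2 = (pose @ cam_h)[:3]
    # Project into the source view
    proj = K @ cam2
    u2 = np.round(proj[0] / proj[2]).astype(int)
    v2 = np.round(proj[1] / proj[2]).astype(int)
    # Keep only pixels that land inside the source image, in front of the camera
    valid = (u2 >= 0) & (u2 < W) & (v2 >= 0) & (v2 < H) & (proj[2] > 0)
    # Nearest-neighbour sampling (bilinear in practice) and L1 penalty
    warped = source[v2[valid], u2[valid]]
    return np.abs(warped - target.reshape(-1)[valid]).mean()
```

With an identity pose and identical frames, the warp is the identity and the loss is zero; the depth and pose networks are trained jointly by minimizing this quantity over real adjacent frames.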
Funder
National Key Research and Development Program of China; National Natural Science Foundation of China