Learning Domain Invariant Features for Unsupervised Indoor Depth Estimation Adaptation

Authors:

Jiehua Zhang (1), Liang Li (2), Chenggang Yan (3), Zhan Wang (4), Changliang Xu (5), Jiyong Zhang (3), Chuqiao Chen (6)

Affiliations:

1. Xi’an Jiaotong University, China

2. Institute of Computing Technology, Chinese Academy of Sciences, China

3. Hangzhou Dianzi University, China

4. Moreal Pte. Ltd., Singapore

5. State Key Laboratory of Media Convergence Production Technology and Systems, China, and Xinhua Zhiyun Technology Co., Ltd., China

6. Lishui Institute of Hangzhou Dianzi University, China

Abstract

Predicting depth maps from monocular images has achieved impressive performance in recent years. However, most depth estimation methods are trained with paired image-depth data or multi-view images (e.g., stereo pairs and monocular sequences), which suffer from expensive annotation costs and poor transferability. Although unsupervised domain adaptation methods have been introduced to mitigate the reliance on annotated data, few works focus on unsupervised cross-scenario indoor monocular depth estimation. In this paper, we study the generalization of depth estimation models across different indoor scenarios in an adversarial domain adaptation paradigm. Concretely, a domain discriminator is trained to distinguish the representations of the source and target domains, while the feature extractor aims to confuse the domain discriminator by capturing domain-invariant features. Further, we reconstruct depth maps from the latent representations under the supervision of labeled source data. As a result, the features learned by the feature extractor are both domain-invariant and of low source risk, and the depth estimator can handle the domain shift between the source and target domains. We conduct cross-scenario and cross-dataset experiments on the ScanNet and NYU-Depth-v2 datasets to verify the effectiveness of our method and achieve impressive performance.
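The abstract describes the training objective only in prose. The following is a minimal PyTorch-style sketch of one adversarial adaptation step under that description, assuming hypothetical module names (FeatureExtractor, DepthDecoder, DomainDiscriminator) and a simple L1 depth loss; it illustrates the paradigm and is not the authors' implementation.

```python
# Minimal sketch of the adversarial adaptation step described in the abstract.
# Assumptions: PyTorch; FeatureExtractor / DepthDecoder / DomainDiscriminator
# are hypothetical placeholder modules, not the authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureExtractor(nn.Module):
    """Shared encoder producing the latent representation of an RGB image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
    def forward(self, x):
        return self.net(x)

class DepthDecoder(nn.Module):
    """Reconstructs a depth map from the latent representation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1))
    def forward(self, f):
        return self.net(f)

class DomainDiscriminator(nn.Module):
    """Predicts whether a feature map comes from the source (1) or target (0) domain."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1))
    def forward(self, f):
        return self.net(f)

encoder, decoder, disc = FeatureExtractor(), DepthDecoder(), DomainDiscriminator()
opt_task = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4)
opt_disc = torch.optim.Adam(disc.parameters(), lr=1e-4)

def train_step(src_img, src_depth, tgt_img, lambda_adv=0.1):
    # 1) Discriminator update: learn to tell source features from target features.
    with torch.no_grad():
        f_src, f_tgt = encoder(src_img), encoder(tgt_img)
    logit_src, logit_tgt = disc(f_src), disc(f_tgt)
    d_loss = (F.binary_cross_entropy_with_logits(logit_src, torch.ones_like(logit_src))
              + F.binary_cross_entropy_with_logits(logit_tgt, torch.zeros_like(logit_tgt)))
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()

    # 2) Encoder + decoder update: supervised depth reconstruction on labeled source
    #    data, plus an adversarial term that pushes target features to be
    #    indistinguishable from source features (domain invariance, low source risk).
    f_src, f_tgt = encoder(src_img), encoder(tgt_img)
    depth_loss = F.l1_loss(decoder(f_src), src_depth)
    logit_tgt = disc(f_tgt)
    adv_loss = F.binary_cross_entropy_with_logits(logit_tgt, torch.ones_like(logit_tgt))
    opt_task.zero_grad(); (depth_loss + lambda_adv * adv_loss).backward(); opt_task.step()
    return depth_loss.item(), d_loss.item()

# Toy usage with random tensors (batch of two 64x64 images and depth maps).
src_img, src_depth = torch.randn(2, 3, 64, 64), torch.rand(2, 1, 64, 64)
tgt_img = torch.randn(2, 3, 64, 64)
print(train_step(src_img, src_depth, tgt_img))
```

Alternating the discriminator and feature-extractor updates is one standard way to realize this min-max objective; a gradient-reversal layer is a common single-pass alternative. The networks and loss terms used in the paper are more elaborate than this toy sketch.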

Publisher

Association for Computing Machinery (ACM)
