Occlusion Boundary Prediction and Transformer Based Depth-Map Refinement From Single Image

Published: 2024-01-10
ISSN: 1551-6857
Container-title: ACM Transactions on Multimedia Computing, Communications, and Applications
Language: en
Short-container-title: ACM Trans. Multimedia Comput. Commun. Appl.

Author:

Hambarde Praful¹, Wadhwa Gourav², Vipparthi Santosh Kumar¹, Murala Subrahmanyam³, Dhall Abhinav⁴

Affiliation:

1. Computer Vision and Pattern Recognition Lab, Indian Institute of Technology Ropar, India

2. ByteDance, Singapore

3. School of Computer Science and Statistics, Trinity College Dublin, Ireland

4. Flinders University, Adelaide, Australia

Abstract

Boundary maps and occlusion orientation maps (ORI-maps) have numerous applications in high-level vision problems, so accurate estimation of these maps is a crucial task. Existing deep networks employ a single-stream network to model the relation between boundary-map and ORI-map estimation. However, these networks fail to explore the significant information specific to each map separately. To resolve this problem, we propose a novel two-stream generative adversarial network (GAN) for boundary map and ORI-map estimation, named OBP-GAN. The proposed OBP-GAN consists of two streams, BP-GAN and OR-GAN: BP-GAN estimates the boundary map, and OR-GAN predicts the ORI-map. The boundary map and ORI-map can also be useful cues for depth-map refinement from single images. Therefore, we also propose a transformer-based depth-map refinement network (TRANSDMR-GAN) that refines the depth estimated from monocular images using the boundary map and ORI-map. We conducted extensive analyses on indoor and outdoor datasets to validate the proposed OBP-GAN and TRANSDMR-GAN. The experimental analysis and ablation study demonstrate the ability of OBP-GAN to generate state-of-the-art occlusion boundary maps. Furthermore, we show that TRANSDMR-GAN can generate an edge-enhanced depth map without degrading the accuracy of the initial depth map.

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3640015

References (69 articles)

1. Sahirzeeshan Ali and Anant Madabhushi. 2012. An integrated region-, boundary-, shape-based active contour for multiple object overlap resolution in histological imagery. IEEE transactions on medical imaging 31, 7 (2012), 1448–1460.

2. Emanuel Ben-Baruch, Tal Ridnik, Nadav Zamir, Asaf Noy, Itamar Friedman, Matan Protter, and Lihi Zelnik-Manor. 2021. Asymmetric Loss For Multi-Label Classification. arXiv preprint arXiv:2009.14119 (2021).

3. Sandesh Bhagat, Manesh Kokare, Vineet Haswani, Praful Hambarde, and Ravi Kamble. 2021. WheatNet-lite: a novel light weight network for wheat head detection. In Proceedings of the IEEE/CVF international conference on computer vision. 1332–1341.

4. Yifeng Chen, Guangchen Lin, Songyuan Li, Omar Bourahla, Yiming Wu, Fangfang Wang, Junyi Feng, Mingliang Xu, and Xi Li. 2020. BANet: Bidirectional Aggregation Network with Occlusion Handling for Panoptic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3793–3802.

5. Ying-Cong Chen, Xiaogang Xu, and Jiaya Jia. 2020. Domain adaptive image-to-image translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5274–5283.