Fully Cross-Attention Transformer for Guided Depth Super-Resolution
Author:
Ariav Ido 1, Cohen Israel 1
Affiliation:
1. Andrew and Erna Viterbi Faculty of Electrical and Computer Engineering, Technion—Israel Institute of Technology, Haifa 3200003, Israel
Abstract
Modern depth sensors often have low spatial resolution, which hinders their use in real-world applications. In many scenarios, however, the depth map is accompanied by a corresponding high-resolution color image. In light of this, learning-based methods have been extensively used for guided super-resolution of depth maps. A guided super-resolution scheme uses the corresponding high-resolution color image to infer high-resolution depth maps from low-resolution ones. Unfortunately, these methods still suffer from texture copying due to improper guidance from the color image. Specifically, most existing methods obtain guidance from the color image by naively concatenating color and depth features. In this paper, we propose a fully transformer-based network for depth map super-resolution. A cascaded transformer module extracts deep features from a low-resolution depth map. It incorporates a novel cross-attention mechanism that seamlessly and continuously injects guidance from the color image into the depth upsampling process. With a window partitioning scheme, the complexity is linear in image resolution, so the method can be applied to high-resolution images. Extensive experiments show that the proposed method of guided depth super-resolution outperforms other state-of-the-art methods.
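To make the window-partitioned cross-attention idea concrete, here is a minimal pure-Python sketch. It is not the authors' network. It assumes a single attention head, no learned query/key/value projections, and depth and color features already at the same spatial resolution; all function names are hypothetical. Queries come from the depth features, while keys and values come from the color features of the same window.

```python
import math

def window_partition(feat, win):
    """Split an H x W grid of feature vectors into non-overlapping win x win windows.

    Returns a list of windows; each window is a list of win*win feature vectors.
    """
    height, width = len(feat), len(feat[0])
    assert height % win == 0 and width % win == 0, "grid must be divisible by win"
    windows = []
    for i in range(0, height, win):
        for j in range(0, width, win):
            windows.append(
                [feat[i + di][j + dj] for di in range(win) for dj in range(win)]
            )
    return windows

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys/values."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append(
            [sum(w * v[c] for w, v in zip(weights, values))
             for c in range(len(values[0]))]
        )
    return out

def windowed_cross_attention(depth_feat, color_feat, win):
    """Assumption: depth tokens (queries) attend only to color tokens
    (keys and values) that fall inside the same window."""
    depth_windows = window_partition(depth_feat, win)
    color_windows = window_partition(color_feat, win)
    return [cross_attention(dw, cw, cw)
            for dw, cw in zip(depth_windows, color_windows)]
```

For a fixed window size, each window costs a constant amount of work and the number of windows grows in proportion to H x W, which is where the linear complexity in image resolution comes from. Global attention over all H x W tokens would instead grow quadratically.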
Funder
PMRI—Peter Munk Research Institute-Technion
Subject
Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry