Author:
Xia Zhongyi, Wu Tianzhao, Wang Zhuoyan, Zhou Man, Wu Boqi, Chan C. Y., Kong Ling Bing
Abstract
Stereoscopic display technology plays a significant role in industries such as film, television and autonomous driving. Accurate depth estimation is crucial for achieving high-quality, realistic stereoscopic display effects. To address the inherent challenges of applying Transformers to depth estimation, the Stereoscopic Pyramid Transformer-Depth (SPT-Depth) is introduced. The method uses stepwise downsampling to acquire both shallow and deep semantic information, which are subsequently fused. Training is divided into fine and coarse convergence stages that employ distinct training strategies and hyperparameters, resulting in a substantial reduction in both training and validation losses. In the training strategy, a shift- and scale-invariant mean square error function is employed to compensate for the lack of translational invariance in Transformers. Additionally, an edge-smoothing function is applied to reduce noise in the depth map, enhancing the model's robustness. SPT-Depth achieves a global receptive field while effectively reducing time complexity. Compared with the baseline method on the New York University Depth V2 (NYU Depth V2) dataset, it reduces Absolute Relative Error (Abs Rel) by 10% and Root Mean Square Error (RMSE) by 36%. Compared with state-of-the-art methods, it reduces RMSE by 17%.
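The shift- and scale-invariant mean square error and the two reported metrics can be illustrated with a minimal NumPy sketch. The abstract does not give the exact loss formulation, so the least-squares alignment below is an assumption based on one common way to define such a loss, and the function names (`align_scale_shift`, `ssi_mse`, `abs_rel`, `rmse`) are hypothetical rather than taken from SPT-Depth.

```python
import numpy as np


def align_scale_shift(pred, target):
    """Return the scale s and shift t minimizing ||s * pred + t - target||^2.

    Assumption: alignment is a global least-squares fit over all pixels.
    """
    p = np.asarray(pred, dtype=np.float64).ravel()
    g = np.asarray(target, dtype=np.float64).ravel()
    # Design matrix with one column for the scale and one for the shift.
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    return float(s), float(t)


def ssi_mse(pred, target):
    """Mean square error after the best scale/shift alignment of pred."""
    s, t = align_scale_shift(pred, target)
    aligned = s * np.asarray(pred, dtype=np.float64) + t
    return float(np.mean((aligned - target) ** 2))


def abs_rel(pred, target):
    """Absolute Relative Error: mean(|pred - target| / target), target > 0."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean(np.abs(pred - target) / target))


def rmse(pred, target):
    """Root Mean Square Error: sqrt(mean((pred - target)^2))."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.sqrt(np.mean((pred - target) ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    depth = rng.uniform(0.5, 10.0, size=(4, 4))  # synthetic ground truth
    pred = (depth - 0.5) / 2.0                   # off by a scale and a shift
    print("plain RMSE:", rmse(pred, depth))
    print("shift/scale-invariant MSE:", ssi_mse(pred, depth))
```

In this sketch a prediction that differs from the ground truth only by a global scale and offset incurs essentially zero invariant loss, while its plain RMSE stays large; the abstract's Abs Rel and RMSE figures are reported on unaligned or dataset-specific protocols that are not specified here.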
Publisher
Springer Science and Business Media LLC