HSCNet++: Hierarchical Scene Coordinate Classification and Regression for Visual Localization with Transformer

Published: 2024-02-06 Issue: 7 Volume: 132 Page: 2530-2550
ISSN: 0920-5691
Container-title: International Journal of Computer Vision
Language: en
Short-container-title: Int J Comput Vis

Author:

Wang Shuzhe, Laskar Zakaria, Melekhov Iaroslav, Li Xiaotian, Zhao Yi, Tolias Giorgos, Kannala Juho

Abstract

Visual localization is critical to many applications in computer vision and robotics. To address single-image RGB localization, state-of-the-art feature-based methods match local descriptors between a query image and a pre-built 3D model. Recently, deep neural networks have been exploited to regress the mapping between raw pixels and 3D coordinates in the scene, and thus the matching is implicitly performed by the forward pass through the network. However, in a large and ambiguous environment, learning such a regression task directly can be difficult for a single network. In this work, we present a new hierarchical scene coordinate network to predict pixel scene coordinates in a coarse-to-fine manner from a single RGB image. The proposed method, which is an extension of HSCNet, allows us to train compact models which scale robustly to large environments. It sets a new state-of-the-art for single-image localization on the 7-Scenes, 12-Scenes, Cambridge Landmarks datasets, and the combined indoor scenes.

Funder

Academy of Finland

Junior Star GACR

Programme Johannes Amos Comenius

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s11263-023-01982-9.pdf

Cited by 2 articles.

1. An end-to-end learning framework for visual camera relocalization using RGB and RGB-D images;Measurement Science and Technology;2024-06-03

2. SACNet: A Scattered Attention-Based Network With Feature Compensator for Visual Localization;IEEE Robotics and Automation Letters;2024-04