Urban Visual Localization of Block-Wise Monocular Images with Google Street Views
Published: 2024-02-25
Journal: Remote Sensing
Volume: 16
Issue: 5
Page: 801
ISSN: 2072-4292
Language: en
Authors:
Zhixin Li 1, Shuang Li 1, John Anderson 2, Jie Shan 1
Affiliations:
1. School of Civil Engineering, Purdue University, West Lafayette, IN 47907, USA
2. Geospatial Research Lab, Corbin Field Station, Woodford, VA 22580, USA
Abstract
Urban visual localization is the process of determining the pose (position and attitude) of an imaging sensor (or platform) with the help of existing geo-referenced data. The task is critical yet challenging for applications such as autonomous navigation, virtual and augmented reality, and robotics, because the dynamic and complex nature of urban environments may obstruct Global Navigation Satellite System (GNSS) signals. This paper proposes a block-wise matching strategy for urban visual localization that uses geo-referenced Google Street View (GSV) panoramas as the database. To determine the pose of monocular query images collected from a moving vehicle, neighboring GSVs must be found to establish correspondences through image-wise and block-wise matching. First, each query image is semantically segmented and a template containing all permanent objects is generated. The template is then used in a template matching approach to identify the corresponding patch in each GSV image of the database. By converting the query template and the corresponding GSV patch into feature vectors, their image-wise similarity is computed pairwise. To ensure reliable matching, the query images are temporally grouped into query blocks, while the GSV images are spatially organized into GSV blocks. Using the previously computed image-wise similarities, a block-wise similarity is calculated for each query block with respect to every GSV block. A query block and its top-ranked GSV blocks are then input into a photogrammetric triangulation or structure-from-motion process to determine the pose of every image in the query block. Three datasets, two public and one newly collected on the Purdue campus, are used to demonstrate the performance of the proposed method. It is shown that the method achieves meter-level positioning accuracy and is robust to changes in acquisition conditions such as image resolution, scene complexity, and time of day.
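Two steps of the described pipeline lend themselves to a compact illustration: locating the GSV patch that corresponds to a query template, and turning image-wise similarities into a block-wise ranking. The Python sketch below is a minimal rendering under stated assumptions; the abstract specifies neither the feature extractor, the template matching method, nor the aggregation rule, so the normalized cross-correlation matcher, the mean-of-best-match aggregation, and all function names here are illustrative choices, not the authors' implementation.

import cv2
import numpy as np

def find_gsv_patch(gsv_image, template):
    """Locate the patch of a GSV image that best matches a query template.
    Normalized cross-correlation is an assumed stand-in for the paper's
    template matching approach."""
    res = cv2.matchTemplate(gsv_image, template, cv2.TM_CCOEFF_NORMED)
    _, _, _, top_left = cv2.minMaxLoc(res)  # pixel location of the best score
    h, w = template.shape[:2]
    return gsv_image[top_left[1]:top_left[1] + h, top_left[0]:top_left[0] + w]

def image_similarity(query_feats, gsv_feats):
    """Pairwise cosine similarity between query-template and GSV-patch
    feature vectors: (Nq, D) and (Ng, D) arrays give an (Nq, Ng) matrix."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gsv_feats / np.linalg.norm(gsv_feats, axis=1, keepdims=True)
    return q @ g.T

def block_similarity(sim, query_block, gsv_block):
    """Aggregate image-wise similarities into one block-wise score.
    Mean of each query image's best match within the GSV block is an
    assumed aggregation; the abstract does not state the rule."""
    sub = sim[np.ix_(query_block, gsv_block)]  # rows/cols of block members
    return float(sub.max(axis=1).mean())

def rank_gsv_blocks(sim, query_block, gsv_blocks, top_k=3):
    """Rank GSV blocks against one query block; the top-k winners would be
    passed, together with the query block, to triangulation or SfM."""
    scores = np.array([block_similarity(sim, query_block, b) for b in gsv_blocks])
    return np.argsort(scores)[::-1][:top_k]

One design note on the assumed aggregation: a mean-of-best-match score rewards GSV blocks in which every query image finds at least one plausible counterpart, which fits the abstract's motivation of making retrieval reliable before the pose is refined by triangulation.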