A Comparative Review on Enhancing Visual Simultaneous Localization and Mapping with Deep Semantic Segmentation
Authors:
Liu Xiwen 1,2; He Yong 2; Li Jue 3 (ORCID); Yan Rui 2; Li Xiaoyu 2; Huang Hui 4
Affiliations:
1. Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources, Shenzhen 518034, China
2. School of Smart City, Chongqing Jiaotong University, Chongqing 400074, China
3. College of Traffic & Transportation, Chongqing Jiaotong University, Chongqing 400074, China
4. Chongqing Digital City Technology Co., Ltd., Chongqing 400074, China
Abstract
Visual simultaneous localization and mapping (VSLAM) enhances the navigation of autonomous agents in unfamiliar environments by incrementally constructing maps and estimating camera poses. However, conventional VSLAM pipelines often degrade in dynamic environments containing moving objects. Recent advances in deep learning have brought notable progress in semantic segmentation, the task of assigning a semantic label to every image pixel. Integrating semantic segmentation into VSLAM makes it possible to distinguish static from dynamic elements in complex scenes. This paper provides a comprehensive comparative review of how semantic segmentation can improve the major components of VSLAM, including visual odometry, loop closure detection, and environmental mapping. Key principles and methods of both traditional VSLAM and deep semantic segmentation are introduced, followed by an overview and comparative analysis of how semantic information is integrated into the various modules of the VSLAM pipeline. The review also examines the characteristics and potential use cases of fusing VSLAM with semantics. Existing semantic VSLAM models are found to still face challenges related to computational complexity. Promising future research directions are identified, including efficient model design, multimodal fusion, online adaptation, dynamic scene reconstruction, and end-to-end joint optimization. This review sheds light on the emerging paradigm of semantic VSLAM and on how deep-learning-enabled semantic reasoning can unlock new capabilities for autonomous intelligent systems to operate reliably in the real world.
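To make the core idea concrete, the following is a minimal sketch of the semantic-masking step the abstract describes: per-pixel class labels from a segmentation network are used to suppress feature extraction on dynamic objects before visual odometry. The class IDs, the mask-based filtering, and the use of ORB features are illustrative assumptions, not a description of any specific method reviewed in the paper; producing the segmentation map itself is assumed to be handled upstream.

```python
import cv2
import numpy as np

# Hypothetical label IDs for dynamic classes; real IDs depend on the
# segmentation model and dataset (e.g. person, rider, car, truck).
DYNAMIC_CLASS_IDS = {11, 12, 13, 14}


def static_feature_mask(seg_labels: np.ndarray) -> np.ndarray:
    """Build a binary mask (255 = keep) covering only static-class pixels.

    seg_labels: HxW array of per-pixel class IDs from any semantic
    segmentation network; obtaining it is assumed here.
    """
    mask = np.full(seg_labels.shape, 255, dtype=np.uint8)
    for cid in DYNAMIC_CLASS_IDS:
        mask[seg_labels == cid] = 0
    return mask


def extract_static_features(frame_bgr: np.ndarray, seg_labels: np.ndarray):
    """Detect ORB keypoints only in static regions, so moving objects
    do not contaminate the visual-odometry pose estimate."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=1000)
    keypoints, descriptors = orb.detectAndCompute(
        gray, static_feature_mask(seg_labels)
    )
    return keypoints, descriptors


if __name__ == "__main__":
    # Synthetic stand-ins: a noise image and a segmentation map whose
    # lower-right quadrant is labeled with dynamic class 13 (e.g. "car").
    frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
    labels = np.zeros((480, 640), dtype=np.int32)
    labels[240:, 320:] = 13
    kps, _ = extract_static_features(frame, labels)
    # Every surviving keypoint lies outside the dynamic region.
    assert all(not (kp.pt[0] >= 320 and kp.pt[1] >= 240) for kp in kps)
    print(f"{len(kps)} static keypoints retained")
```

The same masked features then feed the usual pipeline stages (tracking, loop closure, mapping); the design choice is simply to filter observations at the front end rather than to detect and reject dynamic outliers during pose optimization.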
Funder
Open Fund of Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources