LP-SLAM: language-perceptive RGB-D SLAM framework exploiting large language model

Published: 2024-04-30 Issue: 4 Volume: 10 Pages: 5391-5409
ISSN: 2199-4536
Container-title: Complex & Intelligent Systems
Language: en
Short-container-title: Complex Intell. Syst.

Author:

Zhang Weiyi, Guo Yushi, Niu Liting, Li Peijun, Wan Zeyu, Shao Fei, Nian Cheng, Farrukh Fasih Ud Din, Zhang Debing, Zhang Chun, Li Qiang, Zhang Jianwei

Abstract

With the development of deep learning, simultaneous localization and mapping (SLAM) systems can perceive the environment at a higher level, such as the semantic level. However, previous works did not achieve a natural-language level of perception. Therefore, we propose LP-SLAM (Language-Perceptive RGB-D SLAM), which leverages large language models (LLMs). Texts in the scene are detected by scene text recognition (STR) and mapped as landmarks with a task-driven selection. A text error correction chain (TECC) is designed with a similarity classification method, a two-stage memory strategy, and a text clustering method. The proposed architecture is designed to handle the mis-detection and mis-recognition cases of STR and to provide accurate text information to the framework. The framework takes images as input and generates a 3D map with a sparse point cloud and task-related texts. Finally, a natural user interface (NUI) is built on the constructed map and the LLM; it gives position instructions in response to users' natural-language queries. The experimental results validate the TECC design and the overall framework. We publish the virtual dataset with ground truth, as well as the source code, for further research: https://github.com/GroupOfLPSLAM/LP_SLAM
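The abstract names text clustering as one component of the TECC but does not say how similarity is measured or how a cluster's text is chosen. The following is only a minimal sketch of the general idea, not the authors' method: it assumes a character-level similarity from Python's standard `difflib`, a greedy clustering rule, and a majority vote per cluster. The function names, the threshold of 0.7, and the sample strings are all hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] (assumed measure, via difflib)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_texts(detections, threshold=0.7):
    """Greedy clustering: a text joins the first cluster whose first member
    is at least `threshold` similar; otherwise it starts a new cluster."""
    clusters = []
    for text in detections:
        for cluster in clusters:
            if similarity(text, cluster[0]) >= threshold:
                cluster.append(text)
                break
        else:
            clusters.append([text])
    return clusters


def representative(cluster):
    """Take the most frequent reading in a cluster as its corrected text."""
    return Counter(cluster).most_common(1)[0][0]


# Hypothetical STR outputs for two signs seen over several frames,
# including two mis-recognized readings ("Roon 301", "Exlt").
detections = ["Room 301", "Room 301", "Roon 301", "Exit", "Exlt", "Exit", "Room 301"]
clusters = cluster_texts(detections)
labels = [representative(c) for c in clusters]
print(labels)
```

In this sketch, repeated observations of the same sign outvote an occasional mis-recognized reading. A real system would probably also use the 3D position of each detection, which this example ignores.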

Funder

National Natural Science Foundation of China

HORIZON EUROPE Marie Sklodowska-Curie Actions

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s40747-024-01408-0.pdf
