Authors:
Kim Yeseung, Kim Dohyun, Choi Jieun, Park Jisang, Oh Nayoung, Park Daehyung
Abstract
In recent years, the integration of large language models (LLMs) has revolutionized the field of robotics, enabling robots to communicate, understand, and reason with human-like proficiency. This paper explores the multifaceted impact of LLMs on robotics, addressing key challenges and opportunities for leveraging these models across various domains. By categorizing and analyzing LLM applications within core robotics elements—communication, perception, planning, and control—we aim to provide actionable insights for researchers seeking to integrate LLMs into their robotic systems. Our investigation focuses on LLMs developed post-GPT-3.5, primarily in text-based modalities while also considering multimodal approaches for perception and control. We offer comprehensive guidelines and examples for prompt engineering, facilitating beginners’ access to LLM-based robotics solutions. Through tutorial-level examples and structured prompt construction, we illustrate how LLM-guided enhancements can be seamlessly integrated into robotics applications. This survey serves as a roadmap for researchers navigating the evolving landscape of LLM-driven robotics, offering a comprehensive overview and practical guidance for harnessing the power of language models in robotics development.
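The "structured prompt construction" the abstract mentions can be illustrated with a minimal sketch. The function below is purely hypothetical (not from the paper): it assembles a planner prompt from a role statement, an available-skill list, a scene description, and a user instruction, which is a common pattern in LLM-based robot task planning. The skill names and scene objects are invented for the example, and the prompt would normally be sent to an LLM API of your choice.

```python
# Illustrative sketch (assumed structure, not the paper's exact format):
# build a structured prompt for an LLM-based robot task planner.

def build_planner_prompt(skills, scene_objects, instruction):
    """Assemble a prompt with four sections: role, available skills,
    current scene state, and the user instruction."""
    skill_list = "\n".join(f"- {s}" for s in skills)
    object_list = ", ".join(scene_objects)
    return (
        "You are a robot task planner. Respond with one skill call per line.\n\n"
        f"Available skills:\n{skill_list}\n\n"
        f"Objects in the scene: {object_list}\n\n"
        f"Instruction: {instruction}\n"
        "Plan:"
    )

# Hypothetical skills and scene for demonstration.
prompt = build_planner_prompt(
    skills=["pick(object)", "place(object, location)", "open(container)"],
    scene_objects=["red_cup", "table", "drawer"],
    instruction="Put the red cup in the drawer.",
)
print(prompt)
```

Keeping the skill list and scene state in fixed, clearly labeled sections makes the prompt easy to regenerate as the environment changes, which is one reason tutorial treatments of LLM robotics favor templated construction over free-form prompts.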
Funder
Korea Advanced Institute of Science and Technology
Publisher
Springer Science and Business Media LLC
References: 170 articles.
Cited by: 2 articles.