1. Abolghasemi, P., Mazaheri, A., Shah, M., et al. (2019). Pay attention! Robustifying a deep visuomotor policy through task-focused visual attention. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4254–4262).
2. Ahn, M., Brohan, A., Brown, N., et al. (2022). Do as I can, not as I say: Grounding language in robotic affordances. arXiv:2204.01691.
3. Alayrac, J. B., Donahue, J., Luc, P., et al. (2022). Flamingo: A visual language model for few-shot learning. Advances in Neural Information Processing Systems, 35, 23716–23736.
4. Anderson, P., Shrivastava, A., Parikh, D., et al. (2019). Chasing ghosts: Instruction following as Bayesian state tracking. In Advances in neural information processing systems (Vol. 32).
5. Antol, S., Agrawal, A., Lu, J., et al. (2015). VQA: Visual question answering. In Proceedings of the IEEE international conference on computer vision (pp. 2425–2433).