Abstract
Deep reinforcement learning (DRL) algorithms have been widely studied for sequential decision-making problems, and substantial progress has been achieved, especially in autonomous robotic skill learning. However, DRL methods remain difficult to deploy in practical safety-critical robot systems, because a gap always exists between the training and deployment environments, and this issue becomes increasingly critical as the environment changes. To transfer robotic skills efficiently in a dynamic environment, we present a meta-reinforcement learning algorithm based on a variational information bottleneck. More specifically, during the meta-training stage, the variational information bottleneck is first applied to infer a complete set of basic tasks for the whole task space; the maximum entropy regularized reinforcement learning framework is then used to learn the basic skills corresponding to those basic tasks. Once training is completed, every task in the task space can be obtained as a nonlinear combination of the basic tasks, and the skill needed to accomplish it can likewise be obtained as a combination of the basic skills. Empirical results on several highly nonlinear, high-dimensional robotic locomotion tasks show that the proposed variational information bottleneck regularized deep reinforcement learning algorithm can improve sample efficiency by 200–5000 times on new tasks. Furthermore, the proposed algorithm achieves a substantial improvement in asymptotic performance. The results indicate that the proposed meta-reinforcement learning framework is a significant step toward deploying DRL-based algorithms in practical robot systems.
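For readers unfamiliar with the term, a variational information bottleneck is commonly implemented as a KL-divergence penalty that pulls a Gaussian latent posterior toward a standard normal prior. The sketch below illustrates only that generic regularizer; it is not the authors' implementation. The function names, the diagonal-Gaussian posterior, the standard normal prior, and the weight beta=0.1 are all assumptions made for illustration.

```python
import numpy as np


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims.

    Assumes a diagonal Gaussian posterior and a standard normal prior,
    a common (but not the only) choice for an information bottleneck.
    """
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)


def sample_latent(mu, log_var, rng):
    """Reparameterized sample z = mu + sigma * eps with eps ~ N(0, I)."""
    mu = np.asarray(mu, dtype=float)
    std = np.exp(0.5 * np.asarray(log_var, dtype=float))
    return mu + std * rng.standard_normal(mu.shape)


def vib_regularized_loss(task_loss, mu, log_var, beta=0.1):
    """Task loss plus a beta-weighted bottleneck penalty (beta is hypothetical)."""
    return task_loss + beta * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu, log_var = np.array([1.0, 0.0]), np.zeros(2)
    z = sample_latent(mu, log_var, rng)  # latent task variable fed to a policy
    print("latent shape:", z.shape)
    print("penalized loss:", vib_regularized_loss(1.0, mu, log_var))
```

In a meta-reinforcement learning setting, the sampled latent would typically condition the policy, and the penalty limits how much task-specific information the latent carries; how the paper combines this with maximum entropy reinforcement learning is not specified in the abstract.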
Funder
Natural Science Foundation of Sichuan Province
Fundamental Research Funds for the Central Universities
National Key Laboratory of Special Vehicle Design and Manufacturing Integration Technology
Subject
Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry
Cited by
1 article.
1. A Survey on Information Bottleneck;IEEE Transactions on Pattern Analysis and Machine Intelligence;2024-08