Prioritized Hindsight with Dual Buffer for Meta-Reinforcement Learning

Published: 2022-12-15 Issue: 24 Volume: 11 Page: 4192
ISSN: 2079-9292
Container-title: Electronics
Language: en
Short-container-title: Electronics

Author:

Beyene Sofanit Wubeshet, Han Ji-Hyeong

Abstract

Sharing prior knowledge across multiple robotic manipulation tasks is a challenging research topic. Although state-of-the-art deep reinforcement learning (DRL) algorithms have shown immense success in single robotic tasks, extending them directly to multi-task manipulation problems remains difficult, mostly because of the problems associated with efficient exploration in high-dimensional state and continuous action spaces. Furthermore, in multi-task scenarios, the sparse-reward problem and the sample inefficiency of DRL algorithms are exacerbated. Therefore, we propose a method that increases the sample efficiency of the soft actor-critic (SAC) algorithm and extends it to a multi-task setting. The agent learns a prior policy from two structurally similar tasks and adapts the policy to a target task. We propose prioritized hindsight with dual experience replay to improve data storage and sampling, which in turn helps the agent perform structured exploration and thereby improves sample efficiency. The proposed method splits the experience replay buffer into two buffers, one for real trajectories and one for hindsight trajectories, to reduce the bias that hindsight trajectories introduce into the buffer. Moreover, we use high-reward transitions from previous tasks to help the network adapt to the new task. We demonstrate the proposed method on several manipulation tasks using a 7-DoF robotic arm in RLBench. The experimental results show that the proposed method outperforms vanilla SAC in both single-task and multi-task settings.
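The dual-buffer idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give implementation details, so everything below is an assumption made for illustration: the `Transition` fields, the `sparse_reward` function, relabeling with the final achieved goal of an episode, the reward-based priority weights, and the fixed `hindsight_ratio`. It is not the authors' implementation.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class Transition:
    state: tuple
    action: tuple
    reward: float
    next_state: tuple
    goal: tuple
    achieved_goal: tuple  # goal actually reached in next_state


def sparse_reward(achieved_goal, goal):
    # Hypothetical sparse reward: 1.0 on success, 0.0 otherwise.
    return 1.0 if achieved_goal == goal else 0.0


class DualReplayBuffer:
    """Keeps real and hindsight transitions in separate buffers (a sketch)."""

    def __init__(self, capacity=10_000, seed=0):
        self.real = deque(maxlen=capacity)
        self.hindsight = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add_episode(self, episode):
        # Real transitions are stored unchanged.
        self.real.extend(episode)
        # Assumed relabeling rule: treat the goal reached at the end of
        # the episode as if it had been the target all along.
        new_goal = episode[-1].achieved_goal
        for t in episode:
            self.hindsight.append(Transition(
                t.state, t.action, sparse_reward(t.achieved_goal, new_goal),
                t.next_state, new_goal, t.achieved_goal))

    def _draw(self, buffer, k):
        if k == 0 or not buffer:
            return []
        items = list(buffer)
        # Placeholder priority (assumes non-negative rewards): reward plus
        # a small floor so every transition can still be drawn.
        weights = [t.reward + 0.1 for t in items]
        return self.rng.choices(items, weights=weights, k=k)

    def sample(self, batch_size, hindsight_ratio=0.5):
        # Fixed split between the two buffers; real samples come first.
        n_hind = int(batch_size * hindsight_ratio)
        n_real = batch_size - n_hind
        return self._draw(self.real, n_real) + self._draw(self.hindsight, n_hind)
```

Keeping the two buffers separate makes the share of relabeled data in each batch an explicit parameter (`hindsight_ratio` here), which is one plausible way to limit the bias that hindsight trajectories introduce. A practical version would more likely derive priorities from TD error and store arrays rather than tuples.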

Funder

MSIT (Ministry of Science and ICT), Korea, under the ITRC

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/11/24/4192/pdf

Reference: 38 articles.

1. Dzedzickis, A., Subačiūtė-Žemaitienė, J., Šutinys, E., Samukaitė-Bubnienė, U., and Bučinskas, V. (2022). Advanced applications of industrial robotics: New trends and possibilities. Appl. Sci., 12.

2. Hua, J., Zeng, L., Li, G., and Ju, Z. (2021). Learning for a robot: Deep reinforcement learning, imitation learning, transfer learning. Sensors, 21.

3. Gu, S., Holly, E., Lillicrap, T., and Levine, S. (2017, May 29–June 3). Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. Proceedings of the IEEE International Conference on Robotics and Automation, Singapore.

4. Parisotto, E., Ba, J., and Salakhutdinov, R. (2015). Actor-mimic deep multitask and transfer reinforcement learning. arXiv.

5. Rusu, A.A., Colmenarejo, S.G., Gülçehre, Ç., Desjardins, G., Kirkpatrick, J., Pascanu, R., Mnih, V., Kavukcuoglu, K., and Hadsell, R. (2015). Policy distillation. arXiv.

Cited by 2 articles.

1. Dual experience replay-based TD3 for single intersection signal control;The Journal of Supercomputing;2024-03-29

2. A Survey on Deep Reinforcement Learning Algorithms for Robotic Manipulation;Sensors;2023-04-05