Efficient Sample Reuse in Policy Gradients with Parameter-Based Exploration

Published:2013-06 Issue:6 Volume:25 Page:1512-1547
ISSN:0899-7667
Container-title:Neural Computation
language:en

Author:

Zhao Tingting¹,Hachiya Hirotaka¹,Tangkaratt Voot¹,Morimoto Jun²,Sugiyama Masashi¹

Affiliation:

1. Department of Computer Science, Tokyo Institute of Technology, Tokyo 152-8552, Japan

2. Department of Brain Robot Interface, ATR Computational Neuroscience Labs, Kyoto, 619-0288, Japan

Abstract

The policy gradient approach is a flexible and powerful reinforcement learning method particularly for problems with continuous actions such as robot control. A common challenge is how to reduce the variance of policy gradient estimates for reliable policy updates. In this letter, we combine the following three ideas and give a highly effective policy gradient method: (1) policy gradients with parameter-based exploration, a recently proposed policy search method with low variance of gradient estimates; (2) an importance sampling technique, which allows us to reuse previously gathered data in a consistent way; and (3) an optimal baseline, which minimizes the variance of gradient estimates with their unbiasedness being maintained. For the proposed method, we give a theoretical analysis of the variance of gradient estimates and show its usefulness through extensive experiments.
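As a rough illustration of how the three ideas named in the abstract could fit together, the following is a minimal NumPy sketch, not the authors' implementation. It assumes an independent Gaussian hyper-distribution over policy parameters with mean eta and standard deviation tau, reuses samples drawn under older hyper-parameters (eta_old, tau_old) by weighting them with the density ratio, and subtracts a baseline of the commonly stated variance-minimizing form. All function names and the toy return are hypothetical, and the formulas are assumptions that do not appear on this page.

```python
import numpy as np


def log_gaussian(theta, eta, tau):
    """Log-density of independent Gaussians N(eta, tau^2), summed over dimensions."""
    return np.sum(
        -0.5 * np.log(2.0 * np.pi * tau**2) - (theta - eta) ** 2 / (2.0 * tau**2),
        axis=-1,
    )


def grad_log_gaussian(theta, eta, tau):
    """Per-sample gradient of the log-density w.r.t. (eta, tau), concatenated."""
    d_eta = (theta - eta) / tau**2
    d_tau = ((theta - eta) ** 2 - tau**2) / tau**3
    return np.concatenate([d_eta, d_tau], axis=-1)


def iw_pgpe_gradient(thetas, returns, eta, tau, eta_old, tau_old):
    """Importance-weighted gradient estimate with a variance-minimizing baseline.

    thetas:  (N, D) policy parameters sampled under (eta_old, tau_old)
    returns: (N,)   return observed for each sampled parameter vector
    """
    # Importance weights: density under current hyper-parameters over old ones.
    w = np.exp(log_gaussian(thetas, eta, tau) - log_gaussian(thetas, eta_old, tau_old))
    g = grad_log_gaussian(thetas, eta, tau)  # shape (N, 2D)
    # Assumed baseline form: b = sum(R w^2 |g|^2) / sum(w^2 |g|^2).
    sq = w**2 * np.sum(g**2, axis=1)
    baseline = np.sum(returns * sq) / np.sum(sq)
    grad = np.mean((w * (returns - baseline))[:, None] * g, axis=0)
    return grad, baseline


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eta_old, tau_old = np.zeros(2), np.full(2, 1.5)
    eta, tau = np.zeros(2), np.ones(2)
    # Old samples are reused to estimate the gradient at the current (eta, tau).
    thetas = eta_old + tau_old * rng.standard_normal((1000, 2))
    # Hypothetical toy return: higher when parameters are close to (1, 1).
    returns = -np.sum((thetas - 1.0) ** 2, axis=1)
    grad, baseline = iw_pgpe_gradient(thetas, returns, eta, tau, eta_old, tau_old)
    print("gradient w.r.t. (eta, tau):", grad)
    print("baseline:", baseline)
```

In this sketch the old sampling distribution is chosen wider than the current one so that the importance weights stay bounded; with a narrower old distribution the weights can become very large and the estimate unreliable.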

Publisher

MIT Press - Journals

Subject

Cognitive Neuroscience,Arts and Humanities (miscellaneous)

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/NECO_a_00452

References: 30 articles.

1. Reinforcement Learning and Dynamic Programming Using Function Approximators

2. CB: a humanoid research platform for exploring neuroscience

Cited by 17 articles.

1. Two-Loop Acceleration Autopilot Design and Analysis Based on TD3 Strategy;International Journal of Aerospace Engineering;2023-03-23

2. Hierarchical Reinforcement Learning Integrating With Human Knowledge for Practical Robot Skill Learning in Complex Multi-Stage Manipulation;IEEE Transactions on Automation Science and Engineering;2023

3. Deep learning, reinforcement learning, and world models;Neural Networks;2022-08

4. Policy search for active fault diagnosis with partially observable state;International Journal of Adaptive Control and Signal Processing;2022-06-15

5. Variational Bayesian Parameter-Based Policy Exploration;2020 International Joint Conference on Neural Networks (IJCNN);2020-07