Realistic Actor-Critic: A framework for balance between value overestimation and underestimation

Published: 2023-01-09
Volume: 16
ISSN: 1662-5218
Container-title: Frontiers in Neurorobotics
Short-container-title: Front. Neurorobot.

Author:

Li Sicen, Tang Qinyun, Pang Yiming, Ma Xinmeng, Wang Gang

Abstract

Introduction: Value approximation bias is known to lead to suboptimal policies or to a catastrophic accumulation of overestimation bias, either of which prevents the agent from making the right decisions between exploration and exploitation. Algorithms have been proposed to mitigate this contradiction. However, we still lack an understanding of how value bias impacts performance, and a method for efficient exploration that keeps updates stable. This study aims to clarify the effect of value bias and to improve reinforcement learning algorithms to enhance sample efficiency.

Methods: This study designs a simple episodic tabular MDP to investigate value underestimation and overestimation in actor-critic methods. It proposes a unified framework called Realistic Actor-Critic (RAC), which employs Universal Value Function Approximators (UVFA) to simultaneously learn, with the same neural network, policies with different value confidence bounds, each with a different under-overestimation trade-off.

Results: This study highlights that agents could over-explore low-value states because of an inflexible under-overestimation trade-off when hyperparameters are fixed, which is a particular form of the exploration-exploitation dilemma. RAC performs directed exploration without over-exploration by using the upper bounds, while still avoiding overestimation by using the lower bounds. Through carefully designed experiments, this study empirically verifies that RAC achieves 10x sample efficiency and a 25% performance improvement compared to Soft Actor-Critic in the most challenging Humanoid environment. All the source code is available at https://github.com/ihuhuhu/RAC.

Discussion: This research not only provides valuable insights for research on the exploration-exploitation trade-off by studying how frequently policies access low-value states under the guidance of different value confidence bounds, but also proposes a new unified framework that can be combined with current actor-critic methods to improve sample efficiency in the continuous control domain.
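
The idea of conditioning one policy on different value confidence bounds can be illustrated with a minimal sketch. The abstract does not give the formula, so this assumes the bound is the ensemble mean of several Q estimates minus beta times their standard deviation; the names `confidence_bound`, `policy_input`, and `beta` are hypothetical and are not taken from the RAC source code.

```python
import random
import statistics


def confidence_bound(q_estimates, beta):
    """Assumed bound: mean(Q) - beta * std(Q) over an ensemble of Q estimates.

    beta > 0 yields a pessimistic (lower) bound that guards against
    overestimation; beta < 0 yields an optimistic (upper) bound that
    encourages directed exploration.
    """
    mean_q = statistics.fmean(q_estimates)
    std_q = statistics.pstdev(q_estimates)
    return mean_q - beta * std_q


def policy_input(state, beta):
    """UVFA-style conditioning (assumed form): append beta to the state
    so one network can represent a family of policies indexed by beta."""
    return list(state) + [beta]


# Hypothetical Q estimates from three critics for one state-action pair.
q_estimates = [1.0, 2.0, 3.0]

lower = confidence_bound(q_estimates, beta=1.0)   # pessimistic estimate
upper = confidence_bound(q_estimates, beta=-1.0)  # optimistic estimate

# During training, beta could be resampled so that every trade-off
# in the chosen range is learned by the same network (an assumption).
rng = random.Random(0)
beta = rng.uniform(-1.0, 1.0)
conditioned = policy_input([0.5, -0.2], beta)
```

In this sketch a single scalar selects the under-overestimation trade-off, so exploring with an optimistic bound and evaluating with a pessimistic one only requires changing the input `beta`, not training separate networks.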

Funder

National Natural Science Foundation of China

Publisher

Frontiers Media SA

Subject

Artificial Intelligence, Biomedical Engineering

Cited by 1 article.

1. Actor-Critic With Synthesis Loss for Solving Approximation Biases; IEEE Transactions on Cybernetics; 2024-09