Parameter-Free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients

Published: 2024-03-02
Journal: Neural Processing Letters (Neural Process Lett), Volume 56, Issue 2
ISSN: 1573-773X
Language: English

Author:

Saglam Baturay, Mutlu Furkan Burak, Cicek Dogan Can, Kozat Suleyman Serdar

Abstract

Approximation of the value functions in value-based deep reinforcement learning induces overestimation bias, resulting in suboptimal policies. We show that when the reinforcement signals received by the agents have a high variance, deep actor-critic approaches that overcome the overestimation bias lead to a substantial underestimation bias. We first address the detrimental issues in the existing approaches that aim to overcome such underestimation error. Then, through extensive statistical analysis, we introduce a novel, parameter-free Deep Q-learning variant to reduce this underestimation bias in deterministic policy gradients. By sampling the weights of a linear combination of two approximate critics from a highly shrunk estimation bias interval, our Q-value update rule is not affected by the variance of the rewards received by the agents throughout learning. We test the performance of the introduced improvement on a set of MuJoCo and Box2D continuous control tasks and demonstrate that it outperforms the existing approaches and improves the baseline actor-critic algorithm in most of the environments tested.
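The abstract describes the update rule only at a high level. As a rough illustration, the sketch below assumes a TD3-style agent with twin target critics, reads the "linear combination of two approximate critics" as a mix of their minimum and maximum, and draws the mixing weight uniformly from a placeholder interval `[beta_low, beta_high]`. The function `sampled_combination_target`, the min/max pairing, and the default bounds are hypothetical choices for this example; the paper derives its own interval, which is not reproduced here.

```python
import numpy as np


def sampled_combination_target(rewards, dones, q1_next, q2_next, gamma=0.99,
                               beta_low=0.5, beta_high=1.0, rng=None):
    """Hypothetical TD target that mixes two target-critic estimates.

    For every transition a weight beta is drawn uniformly from
    [beta_low, beta_high] and the bootstrap value is
        beta * min(Q1', Q2') + (1 - beta) * max(Q1', Q2').
    beta = 1 recovers TD3's clipped double-Q (min) target; smaller beta moves
    the target toward the max, which raises an underestimated target.
    NOTE: the default bounds are placeholders, not the interval from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    q1_next = np.asarray(q1_next, dtype=float)
    q2_next = np.asarray(q2_next, dtype=float)

    # Pessimistic and optimistic estimates from the twin target critics.
    q_min = np.minimum(q1_next, q2_next)
    q_max = np.maximum(q1_next, q2_next)

    # One sampled weight per transition (hypothetical sampling scheme).
    beta = rng.uniform(beta_low, beta_high, size=q_min.shape)
    q_mix = beta * q_min + (1.0 - beta) * q_max

    # Standard one-step TD target; terminal transitions do not bootstrap.
    return rewards + gamma * (1.0 - dones) * q_mix


# Usage with made-up numbers: a batch of two transitions, the second terminal.
rng = np.random.default_rng(0)
targets = sampled_combination_target(
    rewards=[1.0, 0.0], dones=[0.0, 1.0],
    q1_next=[2.0, 5.0], q2_next=[3.0, 4.0],
    gamma=0.5, rng=rng,
)
print(targets)  # first entry lies in [2.0, 2.25]; second is exactly 0.0
```

Setting `beta_low = beta_high = 1.0` reduces the target to TD3's clipped double-Q estimate, so the sketch makes it easy to compare that pessimistic baseline against targets that lean toward the larger critic.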

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s11063-024-11461-y.pdf
