Reinforcement Learning with Perturbed Rewards

Published: 2020-04-03 Volume: 34 Issue: 04 Pages: 6202-6209
ISSN: 2374-3468
Container-title: Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title: AAAI

Authors:

Wang Jingkang, Liu Yang, Li Bo

Abstract

Recent studies have shown that reinforcement learning (RL) models are vulnerable in various noisy scenarios. For instance, the observed reward channel is often subject to noise in practice (e.g., when rewards are collected through sensors) and is therefore not credible. In addition, for applications such as robotics, a deep reinforcement learning (DRL) algorithm can be manipulated into producing arbitrary errors when it receives corrupted rewards. In this paper, we consider noisy RL problems with perturbed rewards, where the perturbation can be approximated by a confusion matrix. We develop a robust RL framework that enables agents to learn in noisy environments where only perturbed rewards are observed. Our solution framework builds on existing RL/DRL algorithms and is the first to address the biased noisy-reward setting without any assumptions on the true distribution (e.g., the zero-mean Gaussian noise assumed in previous works). The core ideas of our solution are estimating a reward confusion matrix and defining a set of unbiased surrogate rewards. We prove the convergence and sample complexity of our approach. Extensive experiments on different DRL platforms show that policies trained on our estimated surrogate rewards achieve higher expected rewards and converge faster than existing baselines. For instance, the state-of-the-art PPO algorithm obtains 84.6% and 80.8% improvements in average score over five Atari games at error rates of 10% and 30%, respectively.
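The two core ideas named in the abstract, estimating a reward confusion matrix and defining unbiased surrogate rewards, can be illustrated with a small sketch. It assumes rewards take a finite set of levels, that the confusion matrix C[i][j] = P(observe level j | true level i) is invertible, and that the surrogate values r̂ are chosen so their expectation under the noise equals the true reward, which amounts to solving C · r̂ = r. The confusion-matrix estimator shown (a majority vote over repeated observations of the same state-action pair, used as a proxy for the true level) and all function names (`solve_linear`, `surrogate_rewards`, `estimate_confusion`) are hypothetical illustrations, not the authors' released code.

```python
from collections import Counter
from typing import Dict, Hashable, List

# Hypothetical sketch, not the paper's code. Assumes rewards take a finite set
# of levels and the confusion matrix C[i][j] = P(observed j | true i) is invertible.


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the caller's matrix is left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("confusion matrix is singular; surrogate rewards are undefined")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def surrogate_rewards(confusion: List[List[float]], levels: List[float]) -> List[float]:
    """Return r_hat with sum_j C[i][j] * r_hat[j] == levels[i] for every true level i.

    Replacing an observed level j by r_hat[j] gives a reward whose expectation
    under the noise equals the true reward, i.e. the surrogate is unbiased.
    """
    return solve_linear(confusion, levels)


def estimate_confusion(observed: Dict[Hashable, List[int]], n_levels: int) -> List[List[float]]:
    """Estimate C from repeated noisy observations (level indices) per (state, action) key.

    Assumption for illustration: the majority-voted level of each key is used
    as a proxy for its true reward level.
    """
    counts = [[0] * n_levels for _ in range(n_levels)]
    for samples in observed.values():
        if not samples:
            continue  # nothing observed for this key
        proxy_true = Counter(samples).most_common(1)[0][0]
        for level in samples:
            counts[proxy_true][level] += 1
    confusion = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total:
            confusion.append([c / total for c in row])
        else:
            # No data for this true level: fall back to "no noise" (identity row).
            confusion.append([1.0 if j == i else 0.0 for j in range(n_levels)])
    return confusion


if __name__ == "__main__":
    levels = [-1.0, 1.0]  # binary rewards r- and r+
    C = [[0.9, 0.1],  # true r-: observed as r+ with probability 0.1
         [0.3, 0.7]]  # true r+: observed as r- with probability 0.3
    r_hat = surrogate_rewards(C, levels)
    print([round(v, 4) for v in r_hat])  # prints [-1.3333, 2.0]
```

In this binary example, a true reward of -1 is observed correctly 90% of the time and a true reward of +1 only 70% of the time. The surrogate values -4/3 and 2 cancel this bias in expectation: 0.9·(-4/3) + 0.1·2 = -1 and 0.3·(-4/3) + 0.7·2 = 1. Note that the surrogates have a larger magnitude than the original rewards, so the price of unbiasedness is higher variance, which grows as C approaches singularity.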

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 29 articles.

1. A train trajectory optimization method based on the safety reinforcement learning with a relaxed dynamic reward; Discover Applied Sciences; 2024-08-30

2. Reinforcement learning with thermal fluctuations at the nanoscale; Physical Review E; 2024-08-30

3. A train trajectory optimization method based on the safety reinforcement learning with a relaxed dynamic reward; 2024-06-06

4. Self-Supervised Antipodal Grasp Learning With Fine-Grained Grasp Quality Feedback in Clutter; IEEE Transactions on Industrial Electronics; 2024-04

5. Security and Privacy Issues in Deep Reinforcement Learning: Threats and Countermeasures; ACM Computing Surveys; 2024-02-23