Uncertainty-Aware Action Advising for Deep Reinforcement Learning Agents


Da Silva Felipe Leno, Hernandez-Leal Pablo, Kartal Bilal, Taylor Matthew E.


Although Reinforcement Learning (RL) has been one of the most successful approaches for learning in sequential decision-making problems, the sample complexity of RL techniques still represents a major challenge for practical applications. To combat this challenge, whenever a competent policy (e.g., a legacy system or a human demonstrator) is available, the agent could leverage samples from this policy (advice) to improve sample efficiency. However, advice is normally limited, so it should ideally be directed to states where the agent is uncertain about the best action to execute. In this work, we propose Requesting Confidence-Moderated Policy advice (RCMP), an action-advising framework in which the agent asks for advice when its epistemic uncertainty is high for a certain state. RCMP takes into account that the advice is limited and might be suboptimal. We also describe a technique to estimate the agent's uncertainty by making minor modifications to standard value-function-based RL methods. Our empirical evaluations show that RCMP performs better than Importance Advising, receiving no advice, and receiving advice at random states in Gridworld and Atari Pong scenarios.
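The advice-request rule described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how uncertainty is estimated, so this sketch assumes several independent Q-value estimates ("heads") per state and uses their disagreement (variance across heads, averaged over actions) as a proxy for epistemic uncertainty. The names `epistemic_uncertainty`, `greedy_action`, `AdviceRequester`, `threshold`, and `budget` are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
from statistics import pvariance
from typing import Callable, Hashable, Sequence, Tuple

QHeads = Sequence[Sequence[float]]  # q_heads[h][a]: head h's estimate for action a


def epistemic_uncertainty(q_heads: QHeads) -> float:
    """Assumed proxy: variance across heads, averaged over actions."""
    num_actions = len(q_heads[0])
    per_action_var = [
        pvariance([head[a] for head in q_heads]) for a in range(num_actions)
    ]
    return sum(per_action_var) / num_actions


def greedy_action(q_heads: QHeads) -> int:
    """Pick the action with the highest mean Q-value across heads."""
    num_actions = len(q_heads[0])
    mean_q = [
        sum(head[a] for head in q_heads) / len(q_heads) for a in range(num_actions)
    ]
    return max(range(num_actions), key=mean_q.__getitem__)


class AdviceRequester:
    """Asks a (hypothetical) advisor while uncertainty is high and budget remains."""

    def __init__(
        self, advisor: Callable[[Hashable], int], threshold: float, budget: int
    ) -> None:
        self.advisor = advisor      # maps a state to a suggested action
        self.threshold = threshold  # uncertainty level above which advice is requested
        self.budget = budget        # advice is limited, so count remaining requests

    def choose_action(self, state: Hashable, q_heads: QHeads) -> Tuple[int, bool]:
        """Return (action, advised), where advised marks an advisor-supplied action."""
        if self.budget > 0 and epistemic_uncertainty(q_heads) > self.threshold:
            self.budget -= 1
            return self.advisor(state), True
        return greedy_action(q_heads), False
```

In this sketch the agent falls back to its own greedy action once the budget is exhausted or when its heads agree, which reflects the abstract's point that limited advice should be spent on uncertain states. Handling of suboptimal advice, which the abstract also mentions, is not modeled here.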


Association for the Advancement of Artificial Intelligence (AAAI)



Cited by 19 articles.

1. Automated design of action advising trigger conditions for multiagent reinforcement learning: A genetic programming-based approach;Swarm and Evolutionary Computation;2024-03

2. A location-based advising method in teacher–student frameworks;Knowledge-Based Systems;2024-02

3. Uncertainty Quantification for Efficient and Risk-Sensitive Reinforcement Learning;2023 IEEE Symposium Series on Computational Intelligence (SSCI);2023-12-05

4. Ask-AC: An Initiative Advisor-in-the-Loop Actor–Critic Framework;IEEE Transactions on Systems, Man, and Cybernetics: Systems;2023-12

5. Active Reward Learning from Online Preferences;2023 IEEE International Conference on Robotics and Automation (ICRA);2023-05-29







