Unobserved Is Not Equal to Non-existent: Using Gaussian Processes to Infer Immediate Rewards Across Contexts

Published: 2019-08
Container-title: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence

Author:

Azizsoltani Hamoon¹, Kim Yeo Jin¹, Ausin Markel Sanz¹, Barnes Tiffany¹, Chi Min¹

Affiliation:

1. North Carolina State University

Abstract

Learning optimal policies in real-world domains with delayed rewards is a major challenge in Reinforcement Learning. We address the credit assignment problem by proposing a Gaussian Process (GP)-based immediate reward approximation algorithm and evaluate its effectiveness in four contexts where rewards can be delayed over long trajectories. In one GridWorld game and 8 Atari games, where immediate rewards are available, the results show that on 7 out of 9 games the policy induced with GP-inferred rewards performed at least as well as the policy induced with the true immediate rewards and significantly outperformed the corresponding delayed-reward policy. In e-learning and healthcare applications, we combined GP-inferred immediate rewards with offline Deep Q-Network (DQN) policy induction and showed that the GP-inferred reward policies outperformed the policies induced using delayed rewards in both real-world contexts.
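The abstract does not spell out the model, so the following is only a minimal sketch of one common way to frame GP reward inference, not necessarily the authors' exact formulation. It assumes that the unobserved immediate reward f(x) of each visited state-action step x has a zero-mean GP prior, and that each trajectory's delayed reward D_j is the noisy sum of its per-step rewards, D = A f + ε, where A is a trajectory-to-step aggregation matrix. Under those assumptions the posterior mean of the per-step rewards is K Aᵀ (A K Aᵀ + σ² I)⁻¹ D. The function names, the RBF kernel, and the noise level are hypothetical choices for illustration.

```python
import numpy as np


def rbf_kernel(X1, X2, length_scale=1.0, variance=1.0):
    # Squared-exponential kernel over state-action feature vectors (assumed choice).
    sq = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def infer_immediate_rewards(X, traj_ids, delayed, noise=1e-2, **kern):
    """Hypothetical helper: posterior mean of per-step rewards given only
    trajectory-level (delayed) rewards.

    X:        (n, d) features of every visited (state, action) step
    traj_ids: (n,) integer trajectory index of each step, in 0..m-1
    delayed:  (m,) observed delayed reward of each trajectory
    """
    m, n = delayed.shape[0], X.shape[0]
    # Aggregation matrix: A[j, t] = 1 if step t belongs to trajectory j.
    A = np.zeros((m, n))
    A[traj_ids, np.arange(n)] = 1.0
    K = rbf_kernel(X, X, **kern)
    # Assumed observation model: D = A f + eps, f ~ GP(0, k), eps ~ N(0, noise * I).
    S = A @ K @ A.T + noise * np.eye(m)
    alpha = np.linalg.solve(S, delayed)
    # Posterior mean of f at the visited steps.
    return K @ A.T @ alpha
```

For example, a single trajectory of four steps with identical features and a delayed reward of 8 yields roughly 2.0 per step, since the prior gives no reason to favor any step. In a pipeline like the one the abstract describes, such inferred per-step rewards would replace the delayed reward in the transitions fed to an offline learner such as DQN.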

Publisher

International Joint Conferences on Artificial Intelligence Organization

Cited by 9 articles.

1. Time-aware deep reinforcement learning with multi-temporal abstraction; Applied Intelligence; 2023-03-25

2. The Impact of Batch Deep Reinforcement Learning on Student Performance: A Simple Act of Explanation Can Go A Long Way; International Journal of Artificial Intelligence in Education; 2022-11-28

3. InferNet for Delayed Reinforcement Tasks: Addressing the Temporal Credit Assignment Problem; 2021 IEEE International Conference on Big Data (Big Data); 2021-12-15

4. Multi-Temporal Abstraction with Time-Aware Deep Q-Learning for Septic Shock Prevention; 2021 IEEE International Conference on Big Data (Big Data); 2021-12-15

5. To Reduce Healthcare Workload: Identify Critical Sepsis Progression Moments through Deep Reinforcement Learning; 2021 IEEE International Conference on Big Data (Big Data); 2021-12-15