Model-Free Preference-Based Reinforcement Learning

Published: 2016-03-02 Issue: 1 Volume: 30
ISSN: 2374-3468
Container-title: Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title: AAAI

Author:

Christian Wirth, Johannes Fürnkranz, Gerhard Neumann

Abstract

Specifying a numeric reward function for reinforcement learning typically requires extensive hand-tuning by a human expert. In contrast, preference-based reinforcement learning (PBRL) uses only pairwise comparisons between trajectories as a feedback signal, which are often more intuitive to specify. Currently available approaches to PBRL for control problems with continuous state/action spaces require a known or estimated model, which is often unavailable and hard to learn. In this paper, we integrate preference-based estimation of the reward function into a model-free reinforcement learning (RL) algorithm, resulting in a model-free PBRL algorithm. Our new algorithm is based on Relative Entropy Policy Search (REPS), enabling us to use stochastic policies and to directly control the greediness of the policy update. REPS reduces the policy's exploration slowly by limiting the relative entropy of the policy update, which ensures that the algorithm is provided with a diverse set of trajectories and, consequently, with informative preferences. The preference-based estimation is computed using a sample-based Bayesian method, which can also estimate the uncertainty of the utility. We also compare against a linearly solvable approximation based on inverse RL. We show that both approaches compare favourably with the current state of the art. The overall result is an algorithm that can learn non-parametric continuous-action policies from a small number of preferences.
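
To illustrate what "limiting the relative entropy of the policy update" can mean in practice, the following is a minimal sketch of the sample reweighting step in a simplified, episodic form of REPS. It is not the paper's implementation. The function names, the ternary search over log(eta), and the example returns are assumptions made for illustration. In the setting the abstract describes, the returns would be replaced by utilities estimated from preferences.

```python
import math


def reps_weights(returns, epsilon, log_eta_bounds=(-12.0, 12.0), iters=200):
    """Reweight samples so that KL(weights || uniform) is bounded by epsilon.

    Minimizes the episodic REPS dual
        g(eta) = eta * epsilon + eta * log(mean(exp(R_i / eta)))
    over eta > 0 with a ternary search on log(eta) (g is convex in eta).
    Hypothetical helper, for illustration only.
    """
    r_max = max(returns)
    shifted = [r - r_max for r in returns]  # subtract the max for numerical stability
    n = len(returns)

    def dual(log_eta):
        eta = math.exp(log_eta)
        mean_exp = sum(math.exp(s / eta) for s in shifted) / n
        return eta * epsilon + r_max + eta * math.log(mean_exp)

    lo, hi = log_eta_bounds
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if dual(m1) < dual(m2):
            hi = m2
        else:
            lo = m1
    eta = math.exp(0.5 * (lo + hi))

    # Exponential weighting: a small eta gives a greedy update, a large eta a cautious one.
    unnorm = [math.exp(s / eta) for s in shifted]
    total = sum(unnorm)
    return [u / total for u in unnorm], eta


def kl_from_uniform(weights):
    """KL divergence between the weights and the uniform distribution over samples."""
    n = len(weights)
    return sum(w * math.log(n * w) for w in weights if w > 0.0)


# Example with made-up returns for four sampled trajectories.
weights, eta = reps_weights([0.0, 1.0, 2.0, 3.0], epsilon=0.5)
print(weights, eta, kl_from_uniform(weights))
```

In this sketch, a smaller `epsilon` keeps the weights closer to uniform, so the updated policy retains more exploration. A larger `epsilon` concentrates the weight on the best samples. When the bound is active, the resulting KL divergence should be close to `epsilon`.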

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 10 articles.

1. Self-learning and autonomously adapting manufacturing equipment for the circular factory;at - Automatisierungstechnik;2024-09-01

2. Adaptive Bitrate Algorithms via Deep Reinforcement Learning With Digital Twins Assisted Trajectory;IEEE Transactions on Network Science and Engineering;2024-07

3. Integrating Human Expertise in Continuous Spaces: A Novel Interactive Bayesian Optimization Framework with Preference Expected Improvement;2024 21st International Conference on Ubiquitous Robots (UR);2024-06-24

4. Hierarchical Reinforcement Learning from Demonstration via Reachability-Based Reward Shaping;Neural Processing Letters;2024-05-27

5. Personalizing Activity Selection in Assistive Social Robots from Explicit and Implicit User Feedback;International Journal of Social Robotics;2024-04-09