A robust policy bootstrapping algorithm for multi-objective reinforcement learning in non-stationary environments

Published: 2019-08-15 Issue: 4 Volume: 28 Page: 273-292
ISSN: 1059-7123
Container-title: Adaptive Behavior
Language: en

Author:

Abdelfattah Sherif¹, Kasmarik Kathryn¹, Hu Jiankun¹

Affiliation:

1. School of Engineering and Information Technology, UNSW Canberra, Canberra, ACT, Australia

Abstract

Multi-objective Markov decision processes are a special kind of multi-objective optimization problem that involves sequential decision making while satisfying the Markov property of stochastic processes. Multi-objective reinforcement learning methods address this kind of problem by fusing the reinforcement learning paradigm with multi-objective optimization techniques. One major drawback of these methods is the lack of adaptability to non-stationary dynamics in the environment. This is because they adopt optimization procedures that assume stationarity in order to evolve a coverage set of policies that can solve the problem. This article introduces a developmental optimization approach that can evolve the policy coverage set while exploring the preference space over the defined objectives in an online manner. We propose a novel multi-objective reinforcement learning algorithm that can robustly evolve a convex coverage set of policies in an online manner in non-stationary environments. We compare the proposed algorithm with two state-of-the-art multi-objective reinforcement learning algorithms in stationary and non-stationary environments. Results showed that the proposed algorithm significantly outperforms the existing algorithms in non-stationary environments while achieving comparable results in stationary environments.

Publisher

SAGE Publications

Subject

Behavioral Neuroscience, Experimental and Cognitive Psychology

Link

http://journals.sagepub.com/doi/pdf/10.1177/1059712319869313

References: 29 articles.

1. Evolutionary Multiobjective Optimization

2. Preference-Based Policy Learning

3. Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control

4. Robustness Against the Decision-Maker's Attitude to Risk in Problems With Conflicting Objectives

Cited by 3 articles.

1. Online reinforcement learning-based inventory control for intelligent E-Fulfilment dealing with nonstationary demand;Enterprise Information Systems;2023-11-26

2. Design of Cognitive Jamming Decision-Making System Against MFR Based on Reinforcement Learning;IEEE Transactions on Vehicular Technology;2023-08

3. The Need for MORE: Need Systems as Non-Linear Multi-Objective Reinforcement Learning;2020 Joint IEEE 10th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob);2020-10-26