Multi-objective ω-Regular Reinforcement Learning

Published:2023-06-30 Issue:2 Volume:35 Page:1-24
ISSN:0934-5043
Container-title:Formal Aspects of Computing
language:en
Short-container-title:Form. Asp. Comput.

Author:

Hahn Ernst Moritz¹, Perez Mateo², Schewe Sven³, Somenzi Fabio², Trivedi Ashutosh², Wojtczak Dominik³

Affiliation:

1. University of Twente, The Netherlands

2. University of Colorado Boulder, USA

3. University of Liverpool, UK

Abstract

The expanding role of reinforcement learning (RL) in safety-critical system design has promoted ω-automata as a way to express learning requirements, often non-Markovian, with greater ease of expression and interpretation than scalar reward signals. However, real-world sequential decision-making situations often involve multiple, potentially conflicting, objectives. Two dominant approaches to express relative preferences over multiple objectives are: (1) weighted preference, where the decision maker provides scalar weights for various objectives, and (2) lexicographic preference, where the decision maker provides an order over the objectives such that any amount of satisfaction of a higher-ordered objective is preferable to any amount of a lower-ordered one. In this article, we study and develop RL algorithms to compute optimal strategies in Markov decision processes against multiple ω-regular objectives under weighted and lexicographic preferences. We provide a translation from multiple ω-regular objectives to a scalar reward signal that is both faithful (maximising reward means maximising probability of achieving the objectives under the corresponding preference) and effective (RL quickly converges to optimal strategies). We have implemented the translations in a formal reinforcement learning tool, Mungojerrie, and we present an experimental evaluation of our technique on benchmark learning problems.
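
The difference between the two preference types named in the abstract can be illustrated with a minimal sketch. It is not the article's reward translation and not part of Mungojerrie; the function names, the two strategies, their satisfaction probabilities, and the weights are all hypothetical values chosen for illustration.

```python
from typing import Sequence, Tuple

def weighted_value(probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted preference: scalarise by a weighted sum of satisfaction probabilities."""
    return sum(w * p for w, p in zip(weights, probs))

def lexicographic_key(probs: Sequence[float]) -> Tuple[float, ...]:
    """Lexicographic preference: compare objectives in priority order.

    Python compares tuples position by position, so a later objective
    only matters when all earlier ones are tied.
    """
    return tuple(probs)

# Hypothetical satisfaction probabilities for two objectives,
# listed from highest to lowest priority.
strategy_a = (0.90, 0.10)
strategy_b = (0.89, 0.99)
candidates = [strategy_a, strategy_b]

weights = (0.5, 0.5)  # assumed equal weights
best_weighted = max(candidates, key=lambda p: weighted_value(p, weights))
best_lexicographic = max(candidates, key=lexicographic_key)

print("weighted choice:", best_weighted)
print("lexicographic choice:", best_lexicographic)
```

Under these assumed numbers the weighted preference picks strategy_b, because its large gain on the second objective outweighs a small loss on the first, while the lexicographic preference picks strategy_a, because any advantage on the first objective dominates.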

Funder

Engineering and Physical Sciences Research Council

National Science Foundation

European Union’s Horizon 2020 research and innovation programme

Publisher

Association for Computing Machinery (ACM)

Subject

Theoretical Computer Science, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3605950

Reference: 72 articles.

1. M. Alshiekh, R. Bloem, R. Ehlers, B. Könighofer, S. Niekum, and U. Topcu. 2018. Safe reinforcement learning via shielding. In Proceedings of the AAAI Conference on Artificial Intelligence. 2669–2678.

2. D. Andersson and P. B. Miltersen. 2009. The complexity of solving stochastic games on graphs. In Algorithms and Computation. 112–121.

3. T. Babiak, F. Blahoudek, A. Duret-Lutz, J. Klein, J. Křetínský, D. Müller, D. Parker, and J. Strejček. 2015. The Hanoi ω-automata format. In Proceedings of the International Conference on Computer Aided Verification (CAV’15). 479–486. LNCS 9206.

4. Ch. Baier and M. Größer. 2005. Recognizing ω-regular languages with probabilistic automata. In Proceedings of the Conference on Logic in Computer Science (LICS’05). 137–146.

5. Ch. Baier and J.-P. Katoen. 2008. Principles of Model Checking. MIT Press.

Cited by 3 articles.

1. Dynamic preference inference network: Improving sample efficiency for multi-objective reinforcement learning by preference estimation;Knowledge-Based Systems;2024-11

2. Multimodal multiscale dynamic graph convolution networks for stock price prediction;Pattern Recognition;2024-05

3. Auction-Based Scheduling;Lecture Notes in Computer Science;2024