Affiliation:
1. University of Twente, The Netherlands
2. University of Colorado Boulder, USA
3. University of Liverpool, UK
Abstract
The expanding role of reinforcement learning (RL) in safety-critical system design has promoted ω-automata as a way to express learning requirements—often non-Markovian—with greater ease of expression and interpretation than scalar reward signals. However, real-world sequential decision-making situations often involve multiple, potentially conflicting, objectives. Two dominant approaches to express relative preferences over multiple objectives are: (1) weighted preference, where the decision maker provides scalar weights for the various objectives, and (2) lexicographic preference, where the decision maker provides an order over the objectives such that any amount of satisfaction of a higher-ordered objective is preferable to any amount of a lower-ordered one. In this article, we study and develop RL algorithms to compute optimal strategies in Markov decision processes against multiple ω-regular objectives under weighted and lexicographic preferences. We provide a translation from multiple ω-regular objectives to a scalar reward signal that is both faithful (maximising reward means maximising the probability of achieving the objectives under the corresponding preference) and effective (RL quickly converges to optimal strategies). We have implemented the translations in a formal reinforcement learning tool, Mungojerrie, and we present an experimental evaluation of our technique on benchmark learning problems.
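To illustrate the difference between the two preference schemes described in the abstract, here is a minimal sketch (not taken from the paper; strategy names, probability vectors, and weights are hypothetical) comparing two strategies' objective-satisfaction probabilities under weighted and lexicographic preferences:

```python
# Illustrative sketch only: all strategies, probabilities, and weights below
# are hypothetical and chosen to show how the two preference schemes can
# rank the same pair of strategies differently.

def weighted_value(probs, weights):
    """Scalarise a vector of satisfaction probabilities using given weights."""
    return sum(w * p for w, p in zip(weights, probs))

def lexicographic_better(a, b):
    """True if probability vector `a` is preferred to `b` when objectives
    are ordered from most to least important: tuple comparison in Python
    is exactly lexicographic order."""
    return a > b

# Two hypothetical strategies evaluated against three objectives,
# listed from most to least important.
s1 = (0.9, 0.2, 0.8)
s2 = (0.8, 0.9, 0.9)

# Weighted preference: with these (hypothetical) weights, s2 is preferred.
w = (0.5, 0.3, 0.2)
print(weighted_value(s1, w) < weighted_value(s2, w))  # s2 wins

# Lexicographic preference: s1 is preferred, because it does better on the
# highest-ordered objective, regardless of the lower-ordered ones.
print(lexicographic_better(s1, s2))  # s1 wins
```

The example shows why the choice of preference scheme matters: the same pair of strategies is ranked in opposite orders by the two schemes.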
Funder
Engineering and Physical Sciences Research Council
National Science Foundation
European Union’s Horizon 2020 research and innovation programme
Publisher
Association for Computing Machinery (ACM)
Subject
Theoretical Computer Science, Software
Cited by: 3 articles.