Abstract
Partially-Observable Markov Decision Processes (POMDPs) are a well-known stochastic model for sequential decision making under limited information. We consider the EXPTIME-hard problem of synthesising policies that almost-surely reach some goal state without ever visiting a bad state. In particular, we are interested in computing the winning region, that is, the set of system configurations from which a policy exists that satisfies the reachability specification. A direct application of such a winning region is the safe exploration of POMDPs by, for instance, restricting the behavior of a reinforcement learning agent to the region. We present two algorithms: a novel SAT-based iterative approach and a decision-diagram-based alternative. The empirical evaluation demonstrates the feasibility and efficacy of the approaches.
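The safe-exploration application mentioned above can be illustrated with a small shield. The following Python sketch is not from the paper: the toy POMDP, the hard-coded WINNING set of belief supports, and the names shield and successor_supports are illustrative assumptions. It only shows how a precomputed winning region could be used to restrict an agent's actions to those that keep every possible successor belief support inside the region.

```python
ACTIONS = {"a", "b"}

# Transition supports of a hypothetical toy POMDP: POST[(state, action)] is the set
# of states reachable with positive probability.
POST = {
    ("s0", "a"): {"s1"},
    ("s0", "b"): {"s1", "bad"},
    ("s1", "a"): {"goal"},
    ("s1", "b"): {"s0", "bad"},
    ("bad", "a"): {"bad"}, ("bad", "b"): {"bad"},
    ("goal", "a"): {"goal"}, ("goal", "b"): {"goal"},
}

# Deterministic state-based observations (an illustrative assumption).
OBS = {"s0": "o0", "s1": "o1", "bad": "o_end", "goal": "o_end"}

# Precomputed winning region, represented as the set of belief supports from which
# almost-sure reach-avoid is achievable. In the paper this is the output of the
# SAT-based or decision-diagram-based algorithm; here it is simply hard-coded.
WINNING = {frozenset({"s0"}), frozenset({"s1"}), frozenset({"goal"})}


def successor_supports(support, action):
    """Belief supports reachable in one step under `action`, split by observation."""
    successors = set()
    for s in support:
        successors |= POST[(s, action)]
    by_obs = {}
    for s in successors:
        by_obs.setdefault(OBS[s], set()).add(s)
    return [frozenset(states) for states in by_obs.values()]


def shield(support):
    """Actions whose every possible successor support stays inside the winning region."""
    return {a for a in ACTIONS
            if all(succ in WINNING for succ in successor_supports(support, a))}


if __name__ == "__main__":
    # From belief support {s0}, only action 'a' avoids any risk of reaching 'bad'.
    print(shield(frozenset({"s0"})))  # -> {'a'}
```

A learning agent restricted in this way never leaves the winning region, so a policy achieving the almost-sure reach-avoid specification remains available from every configuration it can encounter.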
Publisher
Springer International Publishing
Cited by
9 articles.