Affiliation:
1. Saarland University, Saarland Informatics Campus, Germany
2. Saarland University and German Research Center for Artificial Intelligence (DFKI), Saarland Informatics Campus, Saarbrücken, Germany
3. Saarland University and German Research Center for Artificial Intelligence (DFKI), Saarland Informatics Campus, Germany
Abstract
Neural networks (NN) are gaining importance in sequential decision-making. Deep reinforcement learning (DRL), in particular, is extremely successful in learning action policies in complex and dynamic environments. Despite this success, DRL technology is not without its failures, especially in safety-critical applications: (i) the training objective maximizes average rewards, which may disregard rare but critical situations and hence lack local robustness; (ii) optimization objectives targeting safety typically yield degenerate reward structures, which, for DRL to work, must be replaced with proxy objectives. Here, we introduce a methodology that can help to address both deficiencies. We incorporate evaluation stages (ES) into DRL, leveraging recent work on deep statistical model checking (DSMC), which verifies NN policies in Markov decision processes. Our ES apply DSMC at regular intervals to determine state-space regions with weak performance. We adapt the subsequent DRL training priorities based on the outcome, (i) focusing DRL on critical situations and (ii) making it possible to foster arbitrary objectives.
We run case studies on two benchmarks. One of them is the Racetrack, an abstraction of autonomous driving that requires navigating a map without crashing into a wall. The other is MiniGrid, a widely used benchmark in the AI community. Our results show that DSMC-based ES can significantly improve both (i) and (ii).
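The evaluation-stage loop described above can be pictured as: train, estimate per-region goal-reaching probabilities with Monte Carlo runs (the statistical-model-checking step), then bias the next training interval toward the weakly performing regions. The following Python sketch is purely illustrative and hypothetical; the start-state "regions", the toy stochastic policy, and all function names are stand-ins for the NN policy and the MDP benchmarks (Racetrack, MiniGrid) used in the paper, not the authors' implementation.

import random
from collections import Counter

# Toy stand-ins: a handful of start-state regions and a fixed per-region
# success probability that plays the role of rollouts under the current policy.
START_STATES = list(range(10))
TRUE_SUCCESS_PROB = {s: 0.95 if s < 7 else 0.30 for s in START_STATES}

def rollout_succeeds(state: int) -> bool:
    """Stand-in for one policy rollout in the MDP: goal reached or not."""
    return random.random() < TRUE_SUCCESS_PROB[state]

def evaluation_stage(states, runs_per_state: int = 200) -> dict:
    """DSMC-style evaluation stage: Monte Carlo estimate of the
    goal-reaching probability for every start-state region."""
    return {
        s: sum(rollout_succeeds(s) for _ in range(runs_per_state)) / runs_per_state
        for s in states
    }

def training_priorities(estimates: dict, eps: float = 0.01) -> dict:
    """Turn the estimates into sampling weights for the next training phase:
    the weaker a region performs, the more often it is trained on."""
    weights = {s: (1.0 - p) + eps for s, p in estimates.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

if __name__ == "__main__":
    random.seed(0)
    estimates = evaluation_stage(START_STATES)
    priorities = training_priorities(estimates)
    # Sample the start states used in the subsequent DRL training interval.
    batch = random.choices(START_STATES,
                           weights=[priorities[s] for s in START_STATES],
                           k=1000)
    print("estimated success probabilities:",
          {s: round(p, 2) for s, p in estimates.items()})
    print("training-state frequencies:", Counter(batch))

Under these assumptions, the regions with low estimated goal-reaching probability (states 7-9 in the toy setup) dominate the sampled training batch, which is the intended effect of the evaluation stages: subsequent training effort concentrates on the critical situations.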
Funder
German Research Foundation
European Regional Development Fund
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Science Applications, Modeling and Simulation
Cited by
1 article.
1. Introduction to the Special Issue on QEST 2021. ACM Transactions on Modeling and Computer Simulation, 2023-10-31.