DSMC Evaluation Stages: Fostering Robust and Safe Behavior in Deep Reinforcement Learning – Extended Version

Authors:

Timo P. Gros (1), Joschka Groß (1), Daniel Höller (1), Jörg Hoffmann (2), Michaela Klauck (1), Hendrik Meerkamp (1), Nicola J. Müller (1), Lukas Schaller (1), Verena Wolf (3)

Affiliation:

1. Saarland University, Saarland Informatics Campus, Germany

2. Saarland University and German Research Center for Artificial Intelligence (DFKI), Saarland Informatics Campus, Saarbrücken, Germany

3. Saarland University and German Research Center for Artificial Intelligence (DFKI), Saarland Informatics Campus, Germany

Abstract

Neural networks (NN) are gaining importance in sequential decision-making. Deep reinforcement learning (DRL), in particular, is extremely successful in learning action policies in complex and dynamic environments. Despite this success, however, DRL technology is not without its failures, especially in safety-critical applications: (i) the training objective maximizes average rewards, which may disregard rare but critical situations and hence lack local robustness; (ii) optimization objectives targeting safety typically yield degenerate reward structures, which, for DRL to work, must be replaced with proxy objectives. Here, we introduce a methodology that can help to address both deficiencies. We incorporate evaluation stages (ES) into DRL, leveraging recent work on deep statistical model checking (DSMC), which verifies NN policies in Markov decision processes. Our ES apply DSMC at regular intervals to determine state-space regions with weak performance. We adapt the subsequent DRL training priorities based on the outcome, (i) focusing DRL on critical situations and (ii) allowing us to foster arbitrary objectives. We run case studies on two benchmarks. One of them is the Racetrack, an abstraction of autonomous driving that requires navigating a map without crashing into a wall. The other is MiniGrid, a widely used benchmark in the AI community. Our results show that DSMC-based ES can significantly improve both (i) and (ii).
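To make the approach concrete, the following is a minimal sketch (not the authors' implementation) of how DSMC-based evaluation stages could be interleaved with a DRL training loop: at regular intervals, the current policy is evaluated by Monte Carlo simulation from a set of candidate start states, and states with a low estimated goal-reaching probability are prioritized as start states in the next training block. The environment and agent interfaces used here (reset(start_state=...), train_episode, greedy_policy, the "goal_reached" flag) are hypothetical placeholders.

import random

def dsmc_estimate(env, policy, state, runs=200, max_steps=400):
    # Statistical estimate of the probability that `policy` reaches the
    # goal when started in `state` (Monte Carlo simulation, as in DSMC).
    successes = 0
    for _ in range(runs):
        s = env.reset(start_state=state)
        for _ in range(max_steps):
            s, _, done, info = env.step(policy(s))
            if done:
                successes += bool(info.get("goal_reached", False))
                break
    return successes / runs

def train_with_evaluation_stages(env, agent, start_states,
                                 cycles=10, episodes_per_cycle=5000):
    # DRL training interleaved with evaluation stages (ES): after each
    # training block, start states where the policy performs poorly are
    # sampled more often in the next block.
    weights = {s: 1.0 for s in start_states}
    for _ in range(cycles):
        for _ in range(episodes_per_cycle):
            s0 = random.choices(start_states,
                                weights=[weights[s] for s in start_states])[0]
            agent.train_episode(env, start_state=s0)
        # Evaluation stage: estimate per-state goal probability via DSMC.
        for s in start_states:
            p_goal = dsmc_estimate(env, agent.greedy_policy, s)
            weights[s] = (1.0 - p_goal) + 1e-3  # weak states get priority

Weighting start states by one minus the estimated goal probability is only one possible priority scheme; the point of the sketch is that the DSMC estimates, rather than the training reward alone, decide where subsequent training effort is concentrated.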

Funder

German Research Foundation

European Regional Development Fund

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications, Modeling and Simulation


Cited by 1 article:

1. Introduction to the Special Issue on QEST 2021. ACM Transactions on Modeling and Computer Simulation, 2023-10-31.
