Abstract
Partially Observable Markov Decision Processes (POMDPs) can model complex sequential decision-making problems under stochastic and uncertain environments. A main obstacle to their broad adoption in real-world applications is the unavailability of a suitable POMDP model or a simulator thereof. Available solution algorithms, such as Reinforcement Learning (RL), typically benefit from knowledge of the transition dynamics and the observation-generating process, which are often unknown and non-trivial to infer. In this work, we propose a combined framework for inference and robust solution of POMDPs via deep RL. First, all transition and observation model parameters are jointly inferred via Markov Chain Monte Carlo sampling of a hidden Markov model, which is conditioned on actions, in order to recover full posterior distributions from the available data. The POMDP with uncertain parameters is then solved via deep RL techniques, with the parameter distributions incorporated via domain randomization, in order to obtain solutions that are robust to model uncertainty. As a further contribution, we compare the use of Transformers and long short-term memory networks, which constitute model-free RL solutions that work directly on the observation space, with an approach termed the belief-input method, which works on the belief space by exploiting the learned POMDP model for belief inference. We apply these methods to the real-world problem of optimal maintenance planning for railway assets and compare the results with the current real-life policy. We show that the RL policy learned by the belief-input method outperforms the real-life policy, yielding significantly reduced life-cycle costs.
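As a minimal illustration of the belief-input method described in the abstract, the following Python/NumPy sketch shows a generic discrete-state POMDP belief update; the transition tensor T, observation tensor O, and the posterior-sampling step are hypothetical stand-ins for the inferred model, not the authors' implementation.

```python
import numpy as np

def belief_update(belief, action, observation, T, O):
    """Generic Bayesian belief update for a discrete-state POMDP.

    belief      : (S,)  current belief over hidden states
    action      : int   index of the executed action
    observation : int   index of the received observation
    T           : (A, S, S) transition probabilities, T[a, s, s'] = P(s' | s, a)
    O           : (A, S, Z) observation probabilities, O[a, s', z] = P(z | s', a)
    """
    predicted = belief @ T[action]                        # predict: propagate action-conditioned dynamics
    unnormalized = predicted * O[action][:, observation]  # correct: weight by observation likelihood
    return unnormalized / unnormalized.sum()              # renormalize to a valid distribution

# Hypothetical usage: draw one model per training episode (here emulated with
# random Dirichlet draws in place of actual MCMC posterior samples), which is
# the essence of domain randomization over the inferred parameter posteriors.
rng = np.random.default_rng(0)
S, A, Z = 4, 2, 3                            # numbers of states, actions, observations
T = rng.dirichlet(np.ones(S), size=(A, S))   # stand-in for one posterior draw of T
O = rng.dirichlet(np.ones(Z), size=(A, S))   # stand-in for one posterior draw of O
b = np.full(S, 1.0 / S)                      # uninformative initial belief
b = belief_update(b, action=0, observation=1, T=T, O=O)
```

Under such a scheme, the policy network would receive the belief b as input, whereas the model-free Transformer/LSTM baselines would instead consume the raw observation history.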
Funder
ETH Mobility Initiative
Swiss Federal Institute of Technology Zurich
Publisher
Springer Science and Business Media LLC
Cited by
1 article.