Explainable reinforcement learning (XRL): a systematic literature review and taxonomy-Reference-Cited by-同舟云学术

Explainable reinforcement learning (XRL): a systematic literature review and taxonomy

Published:2023-11-29 Issue:1 Volume:113 Page:355-441
ISSN:0885-6125
Container-title:Machine Learning
language:en
Short-container-title:Mach Learn

Author:

Bekkemoen Yanzhe^ORCID

Abstract

AbstractIn recent years, reinforcement learning (RL) systems have shown impressive performance and remarkable achievements. Many achievements can be attributed to combining RL with deep learning. However, those systems lack explainability, which refers to our understanding of the system’s decision-making process. In response to this challenge, the new explainable RL (XRL) field has emerged and grown rapidly to help us understand RL systems. This systematic literature review aims to give a unified view of the field by reviewing ten existing XRL literature reviews and 189 XRL studies from the past five years. Furthermore, we seek to organize these studies into a new taxonomy, discuss each area in detail, and draw connections between methods and stakeholder questions (e.g., “how can I get the agent to do _?”). Finally, we look at the research trends in XRL, recommend XRL methods, and present some exciting research directions for future research. We hope stakeholders, such as RL researchers and practitioners, will utilize this literature review as a comprehensive resource to overview existing state-of-the-art XRL methods. Additionally, we strive to help find research gaps and quickly identify methods that answer stakeholder questions.

Funder

Norges Teknisk-Naturvitenskapelige Universitet

NTNU Norwegian University of Science and Technology

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence,Software

Link

https://link.springer.com/content/pdf/10.1007/s10994-023-06479-7.pdf

Reference345 articles.

1. Abbeel, P., & Ng, AY. (2004). Apprenticeship learning via inverse reinforcement learning. In: C. E. Brodley (Ed.), Machine learning, Proceedings of the twenty-first international conference (ICML 2004), ACM International Conference Proceeding Series, vol 69. ACM https://doi.org/10.1145/1015330.1015430,

2. Acharya, A., Russell, R.L., & Ahmed, N.R. (2020). Explaining conditions for reinforcement learning behaviors from real and imagined data. NeurIPS Workshop on Challenges of Real-World RL https://doi.org/10.48550/ARXIV.2011.09004

3. Achiam, J. (2018). Spinning up in deep reinforcement learning. https://spinningup.openai.com/en/latest/index.html

4. Adebayo, J., Gilmer, J., Muelly, M., et al. (2018). Sanity checks for saliency maps. In S. Bengio , H. M. Wallach, H. Larochelle et al. (Eds.), Advances in neural information processing systems 31: Annual conference on neural information processing systems NeurIPS 2018, Montréal, pp 9525–9536, https://proceedings.neurips.cc/paper/2018/hash/294a8ed24b1ad22ec2e7efea049b8737-Abstract.html

5. Adebayo, J., Muelly, M., Abelson, H., et al. (2022). Post hoc explanations may be ineffective for detecting unknown spurious correlation. In The tenth international conference on learning representations, ICLR 2022, Virtual Event. OpenReview.net, https://openreview.net/forum?id=xNOVfCCvDpM