Authors:
Tom Bewley, Jonathan Lawry
Abstract
In explainable artificial intelligence, there is increasing interest in understanding the behaviour of autonomous agents to build trust and validate performance. Modern agent architectures, such as those trained by deep reinforcement learning, are currently so lacking in interpretable structure as to effectively be black boxes, but insights may still be gained from an external, behaviourist perspective. Inspired by conceptual spaces theory, we suggest that a versatile first step towards general understanding is to discretise the state space into convex regions, jointly capturing similarities over the agent's action, value function and temporal dynamics within a dataset of observations. We create such a representation using a novel variant of the CART decision tree algorithm, and demonstrate how it facilitates practical understanding of black box agents through prediction, visualisation and rule-based explanation.
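The abstract's core idea, partitioning the state space into convex regions whose splits jointly account for the agent's action, value function and temporal dynamics, can be illustrated with a minimal sketch. The code below is not the paper's novel CART variant; it stands in scikit-learn's standard multi-output CART regressor, and the dataset, policy, value function and dynamics are all hypothetical toy constructions for illustration only.

```python
# Minimal sketch (NOT the paper's algorithm): approximating the idea of
# discretising the state space into convex regions that jointly capture
# action, value and temporal dynamics, using a standard multi-output
# CART regressor as a stand-in for the paper's novel tree variant.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)

# Hypothetical dataset of observations of a black box agent: states s_t,
# plus the quantities the tree should jointly explain -- the action a_t,
# the value V(s_t), and the next state s_{t+1}.
states      = rng.uniform(-1.0, 1.0, size=(1000, 2))   # toy 2-D state space
actions     = (states[:, 0] > 0).astype(float)         # toy policy
values      = -np.linalg.norm(states, axis=1)          # toy value function
next_states = states * 0.9                             # toy dynamics

# Stacking the targets means every axis-aligned split must trade off
# fidelity to action, value and dynamics simultaneously; each resulting
# leaf is a convex (hyperrectangular) region of the state space.
targets = np.column_stack([actions, values, next_states])

tree = DecisionTreeRegressor(max_leaf_nodes=8).fit(states, targets)

# The learned split structure doubles as a rule-based explanation.
print(export_text(tree, feature_names=["s0", "s1"]))
```

Each printed path from root to leaf reads as an if-then rule over state features, which is the kind of prediction and rule-based explanation of the black box agent that the abstract describes.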
Publisher
Association for the Advancement of Artificial Intelligence (AAAI)
Cited by
7 articles.
1. A survey on interpretable reinforcement learning. Machine Learning, 2024-04-19.
2. Unraveling Explainable Reinforcement Learning Using Behavior Tree Structures. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024-04-14.
3. Transparency of Task Dependencies of Reinforcement Learning in Unmanned Systems. 2024 IEEE International Conference on Industrial Technology (ICIT), 2024-03-25.
4. Aligning Human and Robot Representations. Proceedings of the 2024 ACM/IEEE International Conference on Human-Robot Interaction, 2024-03-11.
5. Interpretable Imitation Learning with Symbolic Rewards. ACM Transactions on Intelligent Systems and Technology, 2023-12-19.