Abstract
The same method that creates adversarial examples (AEs) to fool image classifiers can be used to generate counterfactual explanations (CEs) that explain algorithmic decisions. This observation has led researchers to consider CEs as AEs by another name. We argue that the relationship to the true label and the tolerance with respect to proximity are two properties that formally distinguish CEs and AEs. Based on these arguments, we introduce CEs, AEs, and related concepts mathematically in a common framework. Furthermore, we show connections between current methods for generating CEs and AEs, and anticipate that the fields will increasingly converge as the number of shared use cases grows.
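The following is a minimal sketch, not taken from the paper, of the shared optimization the abstract alludes to: a gradient-based search for a nearby input that flips the classifier's output. The function `perturb`, the model `f`, and the weight `lam` are hypothetical placeholders.

```python
# Hypothetical sketch: the same loss reads as AE generation or as
# Wachter-style CE generation, depending on interpretation.
import torch


def perturb(f, x, target_class, lam=0.1, steps=100, lr=0.05):
    """Find x' close to x such that f classifies x' as `target_class`.

    Read as an AE generator, x' fools f while staying close to x;
    read as a CE generator, x' is the minimal change that flips the
    decision. Either way the loss is a classification term plus a
    proximity penalty weighted by `lam`.
    """
    x_prime = x.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([x_prime], lr=lr)
    target = torch.tensor([target_class])
    for _ in range(steps):
        optimizer.zero_grad()
        loss = (torch.nn.functional.cross_entropy(f(x_prime), target)
                + lam * torch.norm(x_prime - x, p=1))  # proximity term
        loss.backward()
        optimizer.step()
    return x_prime.detach()


# Usage (hypothetical): f is a classifier returning logits, x has a
# batch dimension, e.g. x.shape == (1, 3, 224, 224).
# x_new = perturb(f, x, target_class=1)
```

The paper's two distinguishing properties map directly onto this sketch: for an AE, the target class deliberately differs from the true label and a small `lam`-weighted penalty keeps the perturbation imperceptible; for a CE, the target is the desired decision outcome and a larger deviation from `x` is tolerated.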
Funder
Munich Center for Neurosciences – Brain and Mind, Ludwig-Maximilians-Universität München
Ludwig-Maximilians-Universität München
Publisher
Springer Science and Business Media LLC
Subject
Artificial Intelligence, Philosophy
Cited by
27 articles.