The Dangers of Post-hoc Interpretability: Unjustified Counterfactual Explanations

Published: 2019-08
Container-title: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence

Author:

Laugel Thibault¹, Lesot Marie-Jeanne¹, Marsala Christophe¹, Renard Xavier², Detyniecki Marcin¹,²,³

Affiliation:

1. Sorbonne Université, CNRS, Laboratoire d’Informatique de Paris 6, LIP6, F-75005 Paris, France

2. AXA, Paris, France

3. Polish Academy of Science, IBS PAN, Warsaw, Poland

Abstract

Post-hoc interpretability approaches have been proven to be powerful tools to generate explanations for the predictions made by a trained black-box model. However, they create the risk of having explanations that are a result of some artifacts learned by the model instead of actual knowledge from the data. This paper focuses on the case of counterfactual explanations and asks whether the generated instances can be justified, i.e., continuously connected to some ground-truth data. We evaluate the risk of generating unjustified counterfactual examples by investigating the local neighborhoods of instances whose predictions are to be explained and show that this risk is quite high for several datasets. Furthermore, we show that most state-of-the-art approaches do not differentiate justified from unjustified counterfactual examples, leading to less useful explanations.

Publisher

International Joint Conferences on Artificial Intelligence Organization

Cited by 70 articles.

1. A novel tree-based method for interpretable reinforcement learning;ACM Transactions on Knowledge Discovery from Data;2024-09-09

2. Counterfactual Explanations and Algorithmic Recourses for Machine Learning: A Review;ACM Computing Surveys;2024-07-09

3. What Does a Model Really Look at?: Extracting Model-Oriented Concepts for Explaining Deep Neural Networks;IEEE Transactions on Pattern Analysis and Machine Intelligence;2024-07

4. Enhancing Reliability Through Interpretability: A Comprehensive Survey of Interpretable Intelligent Fault Diagnosis in Rotating Machinery;IEEE Access;2024

5. Towards Non-adversarial Algorithmic Recourse;Communications in Computer and Information Science;2024