Interpretation of Neural Networks Is Fragile

Published: 2019-07-17 Volume: 33 Pages: 3681-3688
ISSN: 2374-3468
Container-title: Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title: AAAI

Author:

Ghorbani Amirata, Abid Abubakar, Zou James

Abstract

For machine learning to be trusted in many applications, it is critical to be able to reliably explain why a machine learning algorithm makes certain predictions. For this reason, a variety of methods have recently been developed to interpret neural network predictions by providing, for example, feature importance maps. For both scientific robustness and security reasons, it is important to know to what extent interpretations can be altered by small systematic perturbations to the input data, which might be generated by adversaries or by measurement biases. In this paper, we demonstrate how to generate adversarial perturbations that produce perceptually indistinguishable inputs that are assigned the same predicted label, yet have very different interpretations. We systematically characterize the robustness of interpretations generated by several widely used feature importance interpretation methods (feature importance maps, integrated gradients, and DeepLIFT) on ImageNet and CIFAR-10. In all cases, our experiments show that systematic perturbations can lead to dramatically different interpretations without changing the label. We extend these results to show that interpretations based on exemplars (e.g. influence functions) are similarly susceptible to adversarial attack. Our analysis of the geometry of the Hessian matrix gives insight into why robustness is a general challenge for current interpretation approaches.
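The kind of fragility described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: it assumes a toy two-layer softplus network with random weights, uses a random sign perturbation as a stand-in for an optimized adversarial attack, and scores the change with a top-k overlap between gradient-based importance maps. All names (`logit`, `saliency`, `top_k_overlap`, `random_sign_perturbation`) and all sizes are hypothetical choices made for illustration.

```python
import numpy as np


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.logaddexp(0.0, z)


def sigmoid(z):
    # Stable form of 1 / (1 + exp(-z)); this is the derivative of softplus.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def logit(x, W1, b1, w2):
    # Toy scalar-output network; the predicted label is the sign of the logit.
    return float(w2 @ softplus(W1 @ x + b1))


def saliency(x, W1, b1, w2):
    # Simple gradient importance map: |d logit / d x|, computed analytically.
    return np.abs(W1.T @ (w2 * sigmoid(W1 @ x + b1)))


def top_k_overlap(s_a, s_b, k):
    # Fraction of the k most important features shared by two maps.
    top_a = set(np.argsort(s_a)[-k:].tolist())
    top_b = set(np.argsort(s_b)[-k:].tolist())
    return len(top_a & top_b) / k


def random_sign_perturbation(x, eps, rng):
    # Moves every feature by exactly +eps or -eps (an L-infinity ball of radius eps).
    return x + eps * rng.choice([-1.0, 1.0], size=x.shape)


# Hypothetical toy setup: random weights, not a trained model.
rng = np.random.default_rng(0)
d, h = 50, 20
W1 = rng.normal(size=(h, d))
b1 = rng.normal(size=h)
w2 = rng.normal(size=h)
x = rng.normal(size=d)

x_pert = random_sign_perturbation(x, eps=0.05, rng=rng)
same_label = (logit(x, W1, b1, w2) > 0) == (logit(x_pert, W1, b1, w2) > 0)
overlap = top_k_overlap(saliency(x, W1, b1, w2), saliency(x_pert, W1, b1, w2), k=10)

print("label unchanged:", same_label)
print("top-10 overlap of importance maps:", overlap)
```

A low overlap together with an unchanged label would be the symptom the abstract describes. An optimized attack would search for the perturbation that minimizes such an overlap rather than drawing it at random, which requires differentiating the importance map itself with respect to the input.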

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 239 articles.

1. EvaluateXAI: A framework to evaluate the reliability and consistency of rule-based XAI techniques for software analytics tasks;Journal of Systems and Software;2024-11

2. Beyond model interpretability: socio-structural explanations in machine learning;AI & SOCIETY;2024-09-05

3. Toward Learning Model-Agnostic Explanations for Deep Learning-Based Signal Modulation Classifiers;IEEE Transactions on Reliability;2024-09

4. A propagation path-based interpretable neural network model for fault detection and diagnosis in chemical process systems;Control Engineering Practice;2024-09

5. On the interpretability of quantum neural networks;Quantum Machine Intelligence;2024-08-28