Fooling Partial Dependence via Data Poisoning

Published: 2023 Page: 121-136
ISSN:0302-9743
Container-title:Machine Learning and Knowledge Discovery in Databases

Author:

Baniecki Hubert, Kretowicz Wojciech, Biecek Przemyslaw

Abstract

Many methods have been developed to understand complex predictive models, and high expectations are placed on post-hoc model explainability. It turns out that such explanations are neither robust nor trustworthy, and they can be fooled. This paper presents techniques for attacking Partial Dependence (plots, profiles, PDP), which are among the most popular methods of explaining any predictive model trained on tabular data. We showcase that PD can be manipulated in an adversarial manner, which is alarming, especially in financial or medical applications where auditability has become a must-have trait supporting black-box machine learning. The fooling is performed via poisoning the data to bend and shift explanations in the desired direction using genetic and gradient algorithms. We believe this to be the first work using a genetic algorithm for manipulating explanations, which is transferable as it generalizes both ways: in a model-agnostic and an explanation-agnostic manner.
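To make the idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It computes a partial dependence profile as the average prediction with one feature fixed to each grid value, then runs a very simple elitist genetic search that perturbs the dataset (never the model) so that the profile moves toward a chosen target. The names `toy_model`, `poison_genetic`, and all parameter values are hypothetical, and leaving the explained feature's own column untouched is an assumption made here for illustration.

```python
import numpy as np


def partial_dependence(predict, X, feature, grid):
    """Average prediction over X with column `feature` set to each grid value."""
    values = []
    for g in grid:
        X_mod = X.copy()
        X_mod[:, feature] = g
        values.append(predict(X_mod).mean())
    return np.array(values)


def toy_model(X):
    # Hypothetical fixed "black box" with an interaction: f(x) = x0 * x1.
    return X[:, 0] * X[:, 1]


def pd_loss(predict, X, feature, grid, target):
    """Mean squared distance between the PD profile on X and the target profile."""
    return float(np.mean((partial_dependence(predict, X, feature, grid) - target) ** 2))


def poison_genetic(predict, X, feature, grid, target,
                   n_generations=50, pop_size=20, sigma=0.3, seed=0):
    """Illustrative genetic search: mutate the data, keep the fittest datasets.

    Assumption: only the columns other than `feature` are perturbed.
    """
    rng = np.random.default_rng(seed)
    other = [j for j in range(X.shape[1]) if j != feature]
    population = [X.copy() for _ in range(pop_size)]
    for _ in range(n_generations):
        children = []
        for parent in population:
            child = parent.copy()
            # Mutation: Gaussian noise on the non-explained columns.
            child[:, other] += rng.normal(0.0, sigma, size=(X.shape[0], len(other)))
            children.append(child)
        # Elitist selection: parents compete with children, so the best
        # loss in the population never increases.
        candidates = population + children
        candidates.sort(key=lambda Xc: pd_loss(predict, Xc, feature, grid, target))
        population = candidates[:pop_size]
    return population[0]


# Usage: try to flip the slope of the PD profile of feature 0.
X = np.random.default_rng(1).normal(1.0, 1.0, size=(100, 2))
grid = np.linspace(-1.0, 1.0, 5)
target = -grid
X_poisoned = poison_genetic(toy_model, X, 0, grid, target)
print("loss before:", pd_loss(toy_model, X, 0, grid, target))
print("loss after: ", pd_loss(toy_model, X_poisoned, 0, grid, target))
```

In this toy setting the profile of feature 0 equals the grid value times the mean of column 1, so the search only has to shift that mean. The model itself stays fixed throughout, and only the data used to compute the explanation changes.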

Publisher

Springer Nature Switzerland

Link

https://link.springer.com/content/pdf/10.1007/978-3-031-26409-2_8

References: 54 articles.

1. Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., Kim, B.: Sanity checks for saliency maps. In: NeurIPS (2018)

2. Adebayo, J., Muelly, M., Liccardi, I., Kim, B.: Debugging tests for model explanations. In: NeurIPS (2020)

3. Aivodji, U., Arai, H., Fortineau, O., Gambs, S., Hara, S., Tapp, A.: Fairwashing: the risk of rationalization. In: ICML (2019)

4. Alber, M., Lapuschkin, S., Seegerer, P., Hägele, M., Schütt, K.T., et al.: iNNvestigate neural networks! J. Mach. Learn. Res. 20(93), 1–8 (2019)

5. Apley, D.W., Zhu, J.: Visualizing the effects of predictor variables in black box supervised learning models. J. Roy. Stat. Soc. Ser. B (Stat. Methodol.) 82(4), 1059–1086 (2020)

Cited by 3 articles.

1. Adversarial attacks and defenses in explainable artificial intelligence: A survey;Information Fusion;2024-07

2. SoK: Unintended Interactions among Machine Learning Defenses and Risks;2024 IEEE Symposium on Security and Privacy (SP);2024-05-19

3. On the Robustness of Global Feature Effect Explanations;Lecture Notes in Computer Science;2024