Author:
Schneider Johannes, Meske Christian, Vlachos Michalis
Abstract
Providing rationales for decisions can enhance transparency and cultivate trust. Nevertheless, in light of economic incentives and other factors that may encourage manipulation, the reliability of such explanations comes into question. This manuscript builds upon a previous conference paper* by introducing a conceptual framework for deceptive explanations and constructing a typology grounded in interdisciplinary literature. Our work focuses on how AI models can generate and detect deceptive explanations. In our empirical evaluation, we focus on text classification and introduce modifications to the explanations generated by GradCAM, a well-established method for explaining neural networks. Through a user study comprising 200 participants, we show that these deceptive explanations can mislead individuals. However, we also demonstrate that machine learning (ML) techniques can discern even subtle deceptive tactics with an accuracy exceeding 80%, given sufficient domain expertise. Furthermore, even in the absence of domain knowledge, unsupervised learning can be employed to identify inconsistencies in the explanations, provided that fundamental information about the underlying predictive model is accessible.
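The abstract only summarizes the approach. As a rough illustration of the kind of pipeline it describes, the sketch below computes a Grad-CAM-style per-token relevance for a toy PyTorch text classifier and then applies a hypothetical manipulation that shifts relevance mass away from the truly important token. The toy model, the `deceptive_relevance` function, and the `shift` parameter are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only (not the paper's code): Grad-CAM-style token
# attributions for a toy text classifier, plus a hypothetical "deceptive"
# variant that redistributes attribution mass to an unimportant token.
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyTextClassifier(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=32, num_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, 16, kernel_size=3, padding=1)
        self.fc = nn.Linear(16, num_classes)

    def forward(self, token_ids):
        x = self.emb(token_ids).transpose(1, 2)   # (B, emb_dim, seq_len)
        feats = torch.relu(self.conv(x))          # (B, 16, seq_len): "last conv" features
        logits = self.fc(feats.mean(dim=2))       # (B, num_classes)
        return logits, feats

def gradcam_token_relevance(model, token_ids, target_class):
    """Grad-CAM idea: weight feature channels by the mean gradient of the
    target class score, then aggregate to one relevance score per token."""
    logits, feats = model(token_ids)
    feats.retain_grad()
    logits[0, target_class].backward()
    weights = feats.grad.mean(dim=2, keepdim=True)   # (B, 16, 1) channel importance
    cam = torch.relu((weights * feats).sum(dim=1))   # (B, seq_len) token relevance
    return cam / (cam.max() + 1e-8)                  # normalize to [0, 1]

def deceptive_relevance(cam, shift=0.7):
    """Hypothetical manipulation: move a fraction of the attribution mass
    from the most relevant token to the least relevant one."""
    fake = cam.detach().clone()
    hi = int(fake.argmax(dim=1))
    lo = int(fake.argmin(dim=1))
    moved = shift * fake[0, hi]
    fake[0, hi] -= moved
    fake[0, lo] += moved
    return fake

model = ToyTextClassifier()
tokens = torch.randint(0, 1000, (1, 8))            # one toy "sentence" of 8 token ids
honest = gradcam_token_relevance(model, tokens, target_class=1)
print("honest   :", honest.detach())
print("deceptive:", deceptive_relevance(honest))
```

A detector along the lines sketched in the abstract could then, for example, be trained on pairs of such honest and manipulated relevance vectors, or flag explanations whose scores are inconsistent with the model's actual sensitivity to removing the highlighted tokens.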
Funder
University of Liechtenstein
Publisher
Springer Science and Business Media LLC
Subject
Computer Science Applications, Computer Networks and Communications, Computer Graphics and Computer-Aided Design, Computational Theory and Mathematics, Artificial Intelligence, General Computer Science
References
65 articles.
Cited by
2 articles.