Abstract
Improving trust in the decisions made by classification models is becoming crucial for the acceptance of automated systems, and an important way to achieve this is to provide explanations for the models' behaviour. Different explainers have been proposed in the recent literature for that purpose; however, their formal properties remain under-studied.
This paper theoretically investigates explainers that provide reasons behind decisions independently of instances. Its contributions are fourfold. The first is to lay the foundations of such explainers by proposing key axioms, i.e.,
desirable properties they should satisfy. Two of the axioms are incompatible, leading to two subsets of compatible axioms. The second contribution consists of demonstrating that the first subset of axioms characterizes a family of explainers that return sufficient reasons, while the second characterizes a family that provides necessary reasons. This sheds light on the axioms that distinguish the two types of reasons. As a third contribution, the paper introduces various explainers of both families and fully characterizes some of them. These explainers make use of the whole feature space. The fourth contribution is a family of explainers that generate explanations from finite datasets (subsets of the feature space). This family, seen as an abstraction of Anchors and LIME, violates some axioms, including one which prevents incorrect explanations.
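To make the two kinds of reasons concrete, here is a minimal sketch, not taken from the paper, using a toy Boolean classifier over three binary features. A sufficient reason is taken as a minimal set of features whose fixed values force the predicted class for every completion of the remaining features; as one common reading, the features shared by all such minimal sufficient reasons are treated as necessary. The `classifier` function and all helper names are illustrative assumptions.

```python
from itertools import combinations, product

def classifier(x):
    # Toy model (an assumption for illustration): class 1 iff x0 AND x1.
    return int(x[0] and x[1])

def is_sufficient(x, subset, f, n=3):
    # A subset of features is sufficient for x if fixing x's values on it
    # yields f(x) for every assignment of the remaining features.
    target = f(x)
    free = [i for i in range(n) if i not in subset]
    for values in product([0, 1], repeat=len(free)):
        y = list(x)
        for i, v in zip(free, values):
            y[i] = v
        if f(tuple(y)) != target:
            return False
    return True

def sufficient_reasons(x, f, n=3):
    # Brute-force enumeration of minimal sufficient subsets (feasible
    # only for small n; the paper's explainers are defined axiomatically).
    reasons = []
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            if any(set(r) <= set(subset) for r in reasons):
                continue  # a smaller sufficient reason already covers it
            if is_sufficient(x, set(subset), f, n):
                reasons.append(subset)
    return reasons

def necessary_features(x, f, n=3):
    # One reading of necessity: features occurring in every minimal
    # sufficient reason for the decision on x.
    rs = sufficient_reasons(x, f, n)
    return set.intersection(*map(set, rs)) if rs else set()
```

For the instance `(1, 1, 0)` classified as 1, the only minimal sufficient reason is `{x0, x1}`, so both features are also necessary; for `(0, 1, 0)` classified as 0, fixing `x0 = 0` alone suffices.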
Publisher
International Joint Conferences on Artificial Intelligence Organization
Cited by
7 articles.
1. Counterfactual-Integrated Gradients: Counterfactual Feature Attribution for Medical Records;2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM);2023-12-05
2. A Formal Introduction to Batch-Integrated Gradients for Temporal Explanations;2023 IEEE 35th International Conference on Tools with Artificial Intelligence (ICTAI);2023-11-06
3. Disproving XAI Myths with Formal Methods – Initial Results;2023 27th International Conference on Engineering of Complex Computer Systems (ICECCS);2023-06-14
4. A unified logical framework for explanations in classifier systems;Journal of Logic and Computation;2023-01-28
5. A New Class of Explanations for Classifiers with Non-binary Features;Logics in Artificial Intelligence;2023