1. Ulrich Aïvodji, Hiromi Arai, Olivier Fortineau, Sébastien Gambs, Satoshi Hara, and Alain Tapp. 2019. Fairwashing: the risk of rationalization. In International Conference on Machine Learning. PMLR, 161–170.
2. David Alvarez-Melis and Tommi S Jaakkola. 2018. On the robustness of interpretability methods. arXiv preprint arXiv:1806.08049 (2018).
3. Marco Ancona, Enea Ceolini, Cengiz Öztireli, and Markus Gross. 2017. Towards better understanding of gradient-based attribution methods for deep neural networks. arXiv preprint arXiv:1711.06104 (2017).
4. The Road to Explainability is Paved with Bias: Measuring the Fairness of Explanations
5. The hidden assumptions behind counterfactual explanations and principal reasons