Assessing systematic weaknesses of DNNs using counterfactuals-Reference-Cited by-同舟云学术

Assessing systematic weaknesses of DNNs using counterfactuals

Published:2024-01-08 Issue:1 Volume:4 Page:27-35
ISSN:2730-5953
Container-title:AI and Ethics
language:en
Short-container-title:AI Ethics

Author:

Gannamaneni Sujan Sai,Mock Michael,Akila Maram

Abstract

AbstractWith the advancement of DNNs into safety-critical applications, testing approaches for such models have gained more attention. A current direction is the search for and identification of systematic weaknesses that put safety assumptions based on average performance values at risk. Such weaknesses can take on the form of (semantically coherent) subsets or areas in the input space where a DNN performs systematically worse than its expected average. However, it is non-trivial to attribute the reason for such observed low performances to the specific semantic features that describe the subset. For instance, inhomogeneities within the data w.r.t. other (non-considered) attributes might distort results. However, taking into account all (available) attributes and their interaction is often computationally highly expensive. Inspired by counterfactual explanations, we propose an effective and computationally cheap algorithm to validate the semantic attribution of existing subsets, i.e., to check whether the identified attribute is likely to have caused the degraded performance. We demonstrate this approach on an example from the autonomous driving domain using highly annotated simulated data, where we show for a semantic segmentation model that (i) performance differences among the different pedestrian assets exist, but (ii) only in some cases is the asset type itself the reason for this reduction in the performance.

Funder

Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s43681-023-00407-0.pdf

Reference43 articles.

1. Akhtar, N., Mian, A.: Threat of adversarial attacks on deep learning in computer vision: A survey. IEEE Access 6, 14410–14430 (2018)

2. Atzmueller, M.: Subgroup discovery. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 5(1), 35–49 (2015)

3. Buolamwini, J., Gebru, T.: Gender shades: Intersectional accuracy disparities in commercial gender classification. In: Conference on fairness, accountability and transparency, 77–91. PMLR 2018

4. Chakraborty, A., Alam, M., Dey, V., Chattopadhyay, A., Mukhopadhyay, D.: Adversarial attacks and defences: A survey. arXiv preprint arXiv:1810.00069 (2018)

5. Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFS. IEEE Trans. Pattern. Anal. Mach. Intell. 40(4), 834–848 (2017)