Evaluating the quality of visual explanations on chest X-ray images for thorax diseases classification

Published: 2024-03-10 Issue: 17 Volume: 36 Page: 10239-10255
ISSN: 0941-0643
Container-title: Neural Computing and Applications
Language: en
Short-container-title: Neural Comput & Applic

Author:

Rahimiaghdam Shakiba, Alemdar Hande

Abstract

Deep learning models are extensively used but often lack transparency due to their complex internal mechanics. To bridge this gap, the field of explainable AI (XAI) strives to make these models more interpretable. However, a significant obstacle in XAI is the absence of quantifiable metrics for evaluating explanation quality. Existing techniques, reliant on manual assessment or inadequate metrics, face limitations in scalability, reproducibility, and trustworthiness. Recognizing these issues, the current study specifically addresses the quality assessment of visual explanations in medical imaging, where interpretability profoundly influences diagnostic accuracy and trust in AI-assisted decisions. Introducing novel criteria such as informativeness, localization, coverage, multi-target capturing, and proportionality, this work presents a comprehensive method for the objective assessment of various explainability algorithms. These newly introduced criteria aid in identifying optimal evaluation metrics. The study expands the domain’s analytical toolkit by examining existing metrics, which have been prevalent in recent works for similar applications, and proposing new ones. Rigorous analysis led to selecting Jensen–Shannon divergence (JS_DIV) as the most effective metric for visual explanation quality. Applied to the multi-label, multi-class diagnosis of thoracic diseases using a trained classifier on the CheXpert dataset, local interpretable model-agnostic explanations (LIME) with diverse segmentation strategies interpret the classifier’s decisions. A qualitative analysis on an unseen subset of the VinDr-CXR dataset evaluates these metrics, confirming JS_DIV’s superiority. The subsequent quantitative analysis optimizes LIME’s hyper-parameters and benchmarks its performance across various segmentation algorithms, underscoring the utility of an objective assessment metric in practical applications.
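The abstract names Jensen–Shannon divergence as the selected metric but does not give its exact formulation. As a rough illustration only, the sketch below compares an explanation heatmap with a ground-truth lesion mask by normalizing both into probability distributions over pixels and computing the base-2 JS divergence between them. The function name `js_divergence`, the 8x8 toy arrays, and the normalization step are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def js_divergence(explanation, target_mask):
    """Base-2 Jensen-Shannon divergence between two non-negative 2-D maps.

    Hypothetical helper: each map is flattened and normalized to sum to 1,
    so the result lies in [0, 1], where 0 means identical distributions.
    """
    p = np.asarray(explanation, dtype=float).ravel()
    q = np.asarray(target_mask, dtype=float).ravel()
    if p.sum() <= 0 or q.sum() <= 0:
        raise ValueError("both maps need positive total mass")
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        # Kullback-Leibler divergence, skipping zero-probability pixels of a.
        # Wherever a > 0 the mixture b is also > 0, so the ratio is finite.
        nz = a > 0
        return float(np.sum(a[nz] * np.log2(a[nz] / b[nz])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy ground-truth mask: a 3x3 "lesion" box inside an 8x8 image (assumed data).
mask = np.zeros((8, 8))
mask[2:5, 2:5] = 1.0

# Three toy explanations: exact match, shifted by one pixel, fully disjoint.
exact = mask.copy()
shifted = np.zeros((8, 8))
shifted[3:6, 3:6] = 1.0
disjoint = np.zeros((8, 8))
disjoint[5:8, 5:8] = 1.0

score_exact = js_divergence(exact, mask)
score_shifted = js_divergence(shifted, mask)
score_disjoint = js_divergence(disjoint, mask)
print(score_exact, score_shifted, score_disjoint)
```

Under these assumptions a lower score indicates an explanation that concentrates its mass on the annotated region: the exact match scores approximately 0, the disjoint map scores approximately 1, and the shifted map falls in between.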

Funder

Middle East Technical University

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s00521-024-09587-0.pdf
