Abstract
Saliency methods, which “explain” deep neural networks by producing heat maps that highlight the areas of a medical image that influence model prediction, are often presented to clinicians as an aid in diagnostic decision-making. Although many saliency methods have been proposed for medical imaging interpretation, rigorous investigation of their accuracy and reliability is necessary before they are integrated into the clinical setting. In this work, we quantitatively evaluate three saliency methods (Grad-CAM, Grad-CAM++, and Integrated Gradients) across multiple neural network architectures using two evaluation metrics. We establish the first human benchmark for chest X-ray interpretation in a multilabel classification setup, and examine under what clinical conditions saliency maps are more prone to failing to localize important pathologies relative to a human expert benchmark. We find that (i) while Grad-CAM generally localized pathologies better than the other two saliency methods, all three performed significantly worse than the human benchmark; (ii) the gap in localization performance between Grad-CAM and the human benchmark was largest for pathologies that had multiple instances, were smaller in size, and had more complex shapes; (iii) model confidence was positively correlated with Grad-CAM localization performance. Our work demonstrates that several important limitations of saliency methods must be addressed before we can rely on them for deep learning explainability in medical imaging.
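For readers unfamiliar with the kind of evaluation the abstract describes, the sketch below shows (in plain PyTorch, not the authors' code) how a Grad-CAM heat map can be computed for one class on a single image and compared against an expert-drawn mask. The DenseNet backbone, the chosen target layer, the dummy image and mask, and the IoU-style overlap metric with a 0.5 threshold are all illustrative assumptions; the paper's own models, annotations, and evaluation metrics are not reproduced here.

```python
# Minimal Grad-CAM + localization-overlap sketch (assumptions noted in comments).
import torch
import torch.nn.functional as F
from torchvision import models


def grad_cam(model, target_layer, image, class_idx):
    """Compute a normalized Grad-CAM heat map for `class_idx` on one image tensor."""
    acts = {}
    handle = target_layer.register_forward_hook(lambda m, i, o: acts.update(value=o))
    score = model(image.unsqueeze(0))[0, class_idx]
    handle.remove()
    # Gradients of the class score w.r.t. the target layer's activations.
    grads = torch.autograd.grad(score, acts["value"])[0]
    # Weight each feature map by its spatially averaged gradient, ReLU, upsample.
    weights = grads.mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * acts["value"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[1:], mode="bilinear", align_corners=False)
    cam = cam.squeeze()
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)


def iou(saliency, expert_mask, threshold=0.5):
    """Overlap between a thresholded saliency map and a binary expert mask."""
    pred = saliency > threshold
    inter = (pred & expert_mask).sum().float()
    union = (pred | expert_mask).sum().float()
    return (inter / (union + 1e-8)).item()


if __name__ == "__main__":
    model = models.densenet121(weights=None).eval()   # stand-in for a CXR classifier
    image = torch.rand(3, 224, 224)                   # stand-in for a chest X-ray
    expert_mask = torch.zeros(224, 224, dtype=torch.bool)
    expert_mask[60:160, 80:180] = True                # stand-in expert annotation
    cam = grad_cam(model, model.features.denseblock4, image, class_idx=0)
    print(f"IoU vs. expert mask: {iou(cam, expert_mask):.3f}")
```

In a study like the one summarized above, such a per-image overlap score would be aggregated across images and pathologies and compared against the corresponding scores for human expert annotations.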
Publisher
Cold Spring Harbor Laboratory
Cited by
18 articles.