Benchmarking saliency methods for chest X-ray interpretation

Published: 2022-10-10 Issue: 10 Volume: 4 Page: 867-878
ISSN: 2522-5839
Container-title: Nature Machine Intelligence
Language: en
Short-container-title: Nat Mach Intell

Author:

Saporta Adriel, Gui Xiaotong, Agrawal Ashwin, Pareek Anuj, Truong Steven Q. H., Nguyen Chanh D. T., Ngo Van-Doan, Seekins Jayne, Blankenberg Francis G., Ng Andrew Y., Lungren Matthew P., Rajpurkar Pranav

Abstract

Saliency methods, which produce heat maps that highlight the areas of the medical image that influence model prediction, are often presented to clinicians as an aid in diagnostic decision-making. However, rigorous investigation of the accuracy and reliability of these strategies is necessary before they are integrated into the clinical setting. In this work, we quantitatively evaluate seven saliency methods, including Grad-CAM, across multiple neural network architectures using two evaluation metrics. We establish the first human benchmark for chest X-ray segmentation in a multilabel classification set-up, and examine under what clinical conditions saliency maps might be more prone to failure in localizing important pathologies compared with a human expert benchmark. We find that (1) while Grad-CAM generally localized pathologies better than the other evaluated saliency methods, all seven performed significantly worse compared with the human benchmark, (2) the gap in localization performance between Grad-CAM and the human benchmark was largest for pathologies that were smaller in size and had shapes that were more complex, and (3) model confidence was positively correlated with Grad-CAM localization performance. Our work demonstrates that several important limitations of saliency methods must be addressed before we can rely on them for deep learning explainability in medical imaging.

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence, Computer Networks and Communications, Computer Vision and Pattern Recognition, Human-Computer Interaction, Software

Link

https://www.nature.com/articles/s42256-022-00536-x.pdf

References: 65 articles.

1. Rajpurkar, P. et al. Deep learning for chest radiograph diagnosis: a retrospective comparison of the CheXNeXt algorithm to practicing radiologists. PLoS Med. 15, e1002686 (2018).

2. Rajpurkar, P. et al. CheXNet: radiologist-level pneumonia detection on chest X-rays with deep learning. Preprint at: https://arxiv.org/abs/1711.05225 (2017).

3. Bien, N. et al. Deep-learning-assisted diagnosis for knee magnetic resonance imaging: development and retrospective validation of MRNet. PLoS Med. 15, e1002699 (2018).

4. Baselli, G., Codari, M. & Sardanelli, F. Opening the black box of machine learning in radiology: can the proximity of annotated cases be a way? Eur. Radiol. Exp. 4, 30 (2020).

5. Litjens, G. et al. A survey on deep learning in medical image analysis. Med. Image Anal. 42, 60–88 (2017).

Cited by 76 articles.

1. Reporting radiographers’ interaction with Artificial Intelligence—How do different forms of AI feedback impact trust and decision switching?;PLOS Digital Health;2024-08-07

2. Computational approaches to the figurative and plastic dimensions of images;Visual Communication;2024-08

3. Clinical domain knowledge-derived template improves post hoc AI explanations in pneumothorax classification;Journal of Biomedical Informatics;2024-08

4. LitefusionNet: Boosting the performance for medical image classification with an intelligent and lightweight feature fusion network;Journal of Computational Science;2024-08

5. Challenges for augmenting intelligence in cardiac imaging;The Lancet Digital Health;2024-08