Uncovering and Correcting Shortcut Learning in Machine Learning Models for Skin Cancer Diagnosis

Published: 2021-12-24
Volume: 12, Issue: 1, Page: 40
ISSN: 2075-4418
Journal: Diagnostics
Language: en

Authors:

Meike Nauta, Ricky Walsh, Adam Dubowski, Christin Seifert

Abstract

Machine learning models have been successfully applied to the analysis of skin images. However, because these deep learning models are black boxes, it is difficult to understand their underlying reasoning. This prevents a human from validating whether the model is right for the right reasons. Spurious correlations and other biases in the data can cause a model to base its predictions on such artefacts rather than on the truly relevant information. These learned shortcuts can in turn produce incorrect performance estimates and unexpected outcomes when the model is applied in clinical practice. This study presents a method to detect and quantify such shortcut learning in trained classifiers for skin cancer diagnosis, since dermoscopy images are known to contain artefacts. Specifically, we train a standard VGG16-based skin cancer classifier on the public ISIC dataset, in which colour calibration charts (elliptical, coloured patches) occur only in benign images and not in malignant ones. Our methodology artificially inserts such patches into images and uses inpainting to automatically remove them, in order to assess the resulting changes in predictions. We find that our standard classifier partly bases its predictions of benign images on the presence of such a coloured patch. More importantly, by artificially inserting coloured patches into malignant images, we show that shortcut learning results in a significant increase in misdiagnoses, making the classifier unreliable when used in clinical practice. With these results, we therefore want to raise awareness of the risks of using black-box machine learning models trained on potentially biased datasets. Finally, we present a model-agnostic method to neutralise shortcut learning by removing the bias from the training dataset: coloured patches are replaced with benign skin tissue using image inpainting, and the classifier is re-trained on this de-biased dataset.
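
The detection step described above compares a classifier's predictions before and after a controlled edit: insert a coloured patch into malignant images, or inpaint it away in benign ones, and count how often the predicted label flips. The following is a minimal, stdlib-only Python sketch of that measurement under loudly stated assumptions. Images are tiny RGB grids. `shortcut_classifier` is a hypothetical toy stand-in for the paper's VGG16 model. `naive_inpaint` is a crude mean-colour fill substituted for real image inpainting. The patch colours, the 0.5 threshold and every function name are illustrative and not taken from the paper.

```python
from typing import Callable, List, Sequence, Set, Tuple

Pixel = Tuple[int, int, int]   # RGB, each channel 0-255
Image = List[List[Pixel]]      # row-major H x W grid of pixels
Mask = Set[Tuple[int, int]]    # (row, col) coordinates to inpaint


def insert_patch(image: Image, patch: Image, top: int, left: int) -> Image:
    """Copy `image` and paste `patch` with its top-left corner at (top, left), clipped to bounds."""
    out = [row[:] for row in image]
    for i, patch_row in enumerate(patch):
        for j, pixel in enumerate(patch_row):
            y, x = top + i, left + j
            if 0 <= y < len(out) and 0 <= x < len(out[y]):
                out[y][x] = pixel
    return out


def naive_inpaint(image: Image, mask: Mask) -> Image:
    """Hypothetical stand-in for real inpainting: fill masked pixels with the mean unmasked colour."""
    known = [p for y, row in enumerate(image) for x, p in enumerate(row) if (y, x) not in mask]
    if not known:
        return [row[:] for row in image]
    fill = tuple(round(sum(p[c] for p in known) / len(known)) for c in range(3))
    return [[fill if (y, x) in mask else p for x, p in enumerate(row)] for y, row in enumerate(image)]


def flip_rate(predict: Callable[[Image], float], images: Sequence[Image],
              transform: Callable[[Image], Image], threshold: float = 0.5) -> float:
    """Fraction of images whose predicted label (1 = malignant) changes after `transform`."""
    if not images:
        return 0.0

    def label(img: Image) -> int:
        return int(predict(img) >= threshold)

    flips = sum(label(img) != label(transform(img)) for img in images)
    return flips / len(images)


# Assumed calibration-chart colours; a real chart is an elliptical multi-colour patch.
PATCH_COLOURS = {(255, 0, 0), (0, 255, 0), (0, 0, 255)}
PATCH: Image = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 0, 0)]]


def shortcut_classifier(image: Image) -> float:
    """Toy model returning P(malignant). It has 'learned' the shortcut that any calibration
    colour means benign; otherwise it scores the mean darkness of the image."""
    pixels = [p for row in image for p in row]
    if any(p in PATCH_COLOURS for p in pixels):
        return 0.1
    return sum(255 - sum(p) / 3 for p in pixels) / (255 * len(pixels))


def solid(colour: Pixel, h: int = 8, w: int = 8) -> Image:
    """Uniform toy 'lesion' image of a single colour."""
    return [[colour] * w for _ in range(h)]


# Insertion test: adding a patch to malignant images should not change a sound model's output.
malignant = [solid((40, 30, 30)), solid((60, 40, 40))]
insert_rate = flip_rate(shortcut_classifier, malignant, lambda im: insert_patch(im, PATCH, 0, 0))

# Removal test: inpainting the patch away from benign images exposes reliance on it.
benign_with_patch = [insert_patch(solid(c), PATCH, 0, 0) for c in [(200, 180, 170), (120, 100, 100)]]
patch_mask = {(y, x) for y in range(2) for x in range(2)}
remove_rate = flip_rate(shortcut_classifier, benign_with_patch, lambda im: naive_inpaint(im, patch_mask))

print(insert_rate, remove_rate)  # 1.0 0.5
```

With a real model, `predict` would wrap the network's malignant-class probability. Applying the same removal step to every training image that contains a patch mirrors the de-biasing idea of re-training on patch-free data, although the abstract describes proper image inpainting rather than a mean-colour fill.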

Publisher

MDPI AG

Subject

Clinical Biochemistry

Link

https://www.mdpi.com/2075-4418/12/1/40/pdf

Cited by 33 articles.

1. A survey of recent advances in analysis of skin images; Evolutionary Intelligence; 2024-08-25

2. Spuriousness-Aware Meta-Learning for Learning Robust Classifiers; Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; 2024-08-24

3. Feature Selection-driven Bias Deduction in Histopathology Images: Tackling Site-Specific Influences; 2024 IEEE Congress on Evolutionary Computation (CEC); 2024-06-30

4. T-cell receptor structures and predictive models reveal comparable alpha and beta chain structural diversity despite differing genetic complexity; 2024-05-21

5. A survey on computer vision approaches for automated classification of skin diseases; Multimedia Tools and Applications; 2024-05-03