Consistency of convolutional neural networks in dermoscopic melanoma recognition: A prospective real‐world study about the pitfalls of augmented intelligence

Published:2023-12-29
ISSN:0926-9959
Container-title:Journal of the European Academy of Dermatology and Venereology
language:en
Short-container-title:J Eur Acad Dermatol Venereol

Author:

Goessinger E. V.¹, Cerminara S. E.¹, Mueller A. M.¹, Gottfrois P.¹, Huber S.¹, Amaral M.¹, Wenz F.¹, Kostner L.¹, Weiss L.¹, Kunz M.¹, Maul J.‐T.², Wespi S.¹, Broman E.¹, Kaufmann S.¹, Patpanathapillai V.¹, Treyer I.¹, Navarini A. A.¹, Maul L. V.¹,²

Affiliation:

1. Department of Dermatology, University Hospital Basel, Basel, Switzerland

2. Department of Dermatology, University Hospital Zurich, Zurich, Switzerland

Abstract

Background: Deep‐learning convolutional neural networks (CNNs) have outperformed even experienced dermatologists in dermoscopic melanoma detection under controlled conditions. It remains unexplored how real‐world dermoscopic image transformations affect CNN robustness.

Objectives: To investigate the consistency of melanoma risk assessment by two commercially available CNNs to help formulate recommendations for current clinical use.

Methods: A comparative cohort study was conducted from January to July 2022 at the Department of Dermatology, University Hospital Basel. Five dermoscopic images of 116 different lesions on the torso of 66 patients were captured consecutively by the same operator without deliberate rotation. Classification was performed by two CNNs (CNN‐1/CNN‐2). Lesions were divided into four subgroups based on their initial risk scoring and clinical dignity assessment. Reliability was assessed by variation and intraclass correlation coefficients. Excisions were performed for melanoma suspicion or two consecutively elevated CNN risk scores, and benign lesions were confirmed by expert consensus (n = 3).

Results: 117 repeated image series of 116 melanocytic lesions (2 melanomas, 16 dysplastic naevi, 29 naevi, 1 solar lentigo, 1 suspicious and 67 benign) were classified. CNN‐1 demonstrated superior measurement repeatability for clinically benign lesions with an initial malignant risk score (mean variation coefficient (mvc): CNN‐1: 49.5 (±34.3)%; CNN‐2: 71.4 (±22.5)%; p = 0.03), while CNN‐2 outperformed for clinically benign lesions with benign scoring (mvc: CNN‐1: 49.7 (±22.7)%; CNN‐2: 23.8 (±29.3)%; p = 0.002). Both systems exhibited lowest score consistency for lesions with an initial malignant risk score and benign assessment. In this context, averaging three initial risk scores achieved highest sensitivity of dignity assessment (CNN‐1: 94%; CNN‐2: 89%). Intraclass correlation coefficients indicated ‘moderate’‐to‐‘good’ reliability for both systems (CNN‐1: 0.80, 95% CI: 0.71–0.87, p < 0.001; CNN‐2: 0.67, 95% CI: 0.55–0.77, p < 0.001).

Conclusions: Potential user‐induced image changes can significantly influence CNN classification. For clinical application, we recommend using the average of three initial risk scores. Furthermore, we advocate for CNN robustness optimization by cross‐validation with repeated image sets.

Trial Registration: ClinicalTrials.gov (NCT04605822).
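
The two per-lesion quantities named in the abstract, the variation coefficient of a repeated score series and the average of the first three risk scores, can be illustrated with a minimal sketch. It is not the authors' analysis code. The abstract does not state whether the sample or population standard deviation was used, what scale the CNN risk scores are on, or which cutoff marks a score as elevated. The sample standard deviation, the 0 to 1 scale, the 0.5 cutoff, the example values and the function names below are therefore all assumptions made for illustration.

```python
from statistics import mean, stdev


def variation_coefficient(scores):
    """Coefficient of variation in percent: sample SD divided by the mean.

    Assumption: sample SD (n - 1 denominator); the abstract does not say
    which variant the study used.
    """
    m = mean(scores)
    if m == 0:
        raise ValueError("mean score is zero; variation coefficient is undefined")
    return 100.0 * stdev(scores) / m


def averaged_initial_score(scores, n=3):
    """Mean of the first n risk scores in a repeated image series."""
    if len(scores) < n:
        raise ValueError(f"need at least {n} scores, got {len(scores)}")
    return mean(scores[:n])


# Hypothetical series of five risk scores for one lesion (not study data).
series = [0.62, 0.35, 0.48, 0.71, 0.29]

cv_percent = variation_coefficient(series)
avg3 = averaged_initial_score(series)

# Assumed cutoff of 0.5 on an assumed 0-1 scale; real systems may differ.
ASSUMED_CUTOFF = 0.5
is_elevated = avg3 >= ASSUMED_CUTOFF

print(f"variation coefficient: {cv_percent:.1f}%")
print(f"mean of first three scores: {avg3:.3f}, elevated: {is_elevated}")
```

In this made-up series the first score alone would exceed the assumed cutoff while the three-score average would not, which is the kind of single-image variability the averaging recommendation is meant to dampen.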

Publisher

Wiley

Subject

Infectious Diseases, Dermatology

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1111/jdv.19777

References: 21 articles.

1. Melanoma staging: Evidence-based changes in the American Joint Committee on Cancer eighth edition cancer staging manual

2. Global Burden of Cutaneous Melanoma in 2020 and Projections to 2040

3. Robustness of convolutional neural networks in recognition of pigmented skin lesions

4. Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists

5. Man against machine reloaded: performance of a market-approved convolutional neural network in classifying a broad spectrum of skin lesions in comparison with 96 dermatologists working under less artificial conditions

Cited by 4 articles.

1. Image-Based Artificial Intelligence in Psoriasis Assessment: The Beginning of a New Diagnostic Era?;American Journal of Clinical Dermatology;2024-09-11

2. Melanomscreening;TumorDiagnostik & Therapie;2024-08

3. Exploring the analysis capabilities and clinical application potential of the Claude 3 Opus in different dermatologic images: the development of a large-scale multimodal model to assist in dermatology clinical practice (Preprint);2024-06-12

4. Melanocytic lesions: How to navigate variations in human and artificial intelligence;Journal of the European Academy of Dermatology and Venereology;2024-04-25