Consistency of convolutional neural networks in dermoscopic melanoma recognition: A prospective real‐world study about the pitfalls of augmented intelligence

Author:

Goessinger E. V.¹, Cerminara S. E.¹, Mueller A. M.¹, Gottfrois P.¹, Huber S.¹, Amaral M.¹, Wenz F.¹, Kostner L.¹, Weiss L.¹, Kunz M.¹, Maul J.‐T.², Wespi S.¹, Broman E.¹, Kaufmann S.¹, Patpanathapillai V.¹, Treyer I.¹, Navarini A. A.¹, Maul L. V.¹,²

Affiliation:

1. Department of Dermatology, University Hospital Basel, Basel, Switzerland

2. Department of Dermatology, University Hospital Zurich, Zurich, Switzerland

Abstract

Background: Deep‐learning convolutional neural networks (CNNs) have outperformed even experienced dermatologists in dermoscopic melanoma detection under controlled conditions. It remains unexplored how real‐world dermoscopic image transformations affect CNN robustness.

Objectives: To investigate the consistency of melanoma risk assessment by two commercially available CNNs to help formulate recommendations for current clinical use.

Methods: A comparative cohort study was conducted from January to July 2022 at the Department of Dermatology, University Hospital Basel. Five dermoscopic images of 116 different lesions on the torso of 66 patients were captured consecutively by the same operator without deliberate rotation. Classification was performed by two CNNs (CNN‐1/CNN‐2). Lesions were divided into four subgroups based on their initial risk scoring and clinical dignity assessment. Reliability was assessed by variation coefficients and intraclass correlation coefficients. Excisions were performed for melanoma suspicion or two consecutively elevated CNN risk scores, and benign lesions were confirmed by expert consensus (n = 3).

Results: 117 repeated image series of 116 melanocytic lesions (2 melanomas, 16 dysplastic naevi, 29 naevi, 1 solar lentigo, 1 suspicious and 67 benign) were classified. CNN‐1 demonstrated superior measurement repeatability for clinically benign lesions with an initial malignant risk score (mean variation coefficient (mvc): CNN‐1: 49.5 (±34.3)%; CNN‐2: 71.4 (±22.5)%; p = 0.03), while CNN‐2 performed better for clinically benign lesions with benign scoring (mvc: CNN‐1: 49.7 (±22.7)%; CNN‐2: 23.8 (±29.3)%; p = 0.002). Both systems exhibited the lowest score consistency for lesions with an initial malignant risk score and benign assessment. In this context, averaging three initial risk scores achieved the highest sensitivity of dignity assessment (CNN‐1: 94%; CNN‐2: 89%). Intraclass correlation coefficients indicated ‘moderate’‐to‐‘good’ reliability for both systems (CNN‐1: 0.80, 95% CI: 0.71–0.87, p < 0.001; CNN‐2: 0.67, 95% CI: 0.55–0.77, p < 0.001).

Conclusions: Potential user‐induced image changes can significantly influence CNN classification. For clinical application, we recommend using the average of three initial risk scores. Furthermore, we advocate for CNN robustness optimization by cross‐validation with repeated image sets.

Trial Registration: ClinicalTrials.gov (NCT04605822).
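The two per-lesion quantities named in the abstract, the variation coefficient of repeated risk scores and the average of the first three scores, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the use of the sample standard deviation, the 0–1 score scale, the 0.5 cut-off and the example values are all assumptions made for illustration, since the abstract does not specify them.

```python
from statistics import mean, stdev


def variation_coefficient(scores):
    """Coefficient of variation in percent: sample SD / mean * 100.

    Assumption: the sample (n - 1) standard deviation is used; the
    abstract does not state which estimator the study applied.
    """
    m = mean(scores)
    if m == 0:
        raise ValueError("mean score is zero; variation coefficient undefined")
    return stdev(scores) / m * 100


def averaged_risk(scores, n=3):
    """Mean of the first n risk scores of one repeated image series."""
    return mean(scores[:n])


def is_elevated(scores, threshold=0.5, n=3):
    """Flag a lesion when the averaged score reaches a cut-off.

    Assumption: the 0.5 threshold and the 0-1 scale are hypothetical;
    each commercial CNN defines its own scale and cut-off.
    """
    return averaged_risk(scores, n) >= threshold


if __name__ == "__main__":
    # Hypothetical risk scores for five consecutive images of one lesion.
    series = [0.62, 0.35, 0.48, 0.71, 0.40]
    print(f"variation coefficient: {variation_coefficient(series):.1f}%")
    print(f"average of first three scores: {averaged_risk(series):.2f}")
    print(f"flagged as elevated: {is_elevated(series)}")
```

In this sketch a single score above the cut-off does not flag the lesion on its own; only the averaged value does, which mirrors the recommendation to rely on the mean of three initial risk scores rather than on one image.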

Publisher

Wiley

Subject

Infectious Diseases, Dermatology
