Agreement Between Experts and an Untrained Crowd for Identifying Dermoscopic Features Using a Gamified App: Reader Feasibility Study

Authors:

Jonathan Kentley, Jochen Weber, Konstantinos Liopyris, Ralph P Braun, Ashfaq A Marghoob, Elizabeth A Quigley, Kelly Nelson, Kira Prentice, Erik Duhaime, Allan C Halpern, Veronica Rotemberg

Abstract

Background: Dermoscopy is commonly used for the evaluation of pigmented lesions, but agreement between experts for the identification of dermoscopic structures is known to be relatively poor. Expert labeling of medical data is a bottleneck in the development of machine learning (ML) tools, and crowdsourcing has been demonstrated to be a cost- and time-efficient method for annotating medical images.

Objective: The aim of this study was to demonstrate that a crowd of nonexperts can label basic dermoscopic structures in images of pigmented lesions with reliability similar to that of a group of experts.

Methods: First, we obtained 248 images of melanocytic lesions, each annotated for 31 dermoscopic "subfeatures" by 20 dermoscopy experts. Because interrater reliability (IRR) for the subfeatures was low, they were collapsed into 6 dermoscopic "superfeatures" based on structural similarity: dots, globules, lines, network structures, regression structures, and vessels. These images served as the gold standard for the crowd study. The commercial platform DiagnosUs was used to obtain annotations from a nonexpert crowd for the presence or absence of the 6 superfeatures in each of the 248 images. We replicated this methodology with a group of 7 dermatologists to allow direct comparison with the nonexpert crowd. The Cohen κ value was used to measure agreement across raters.

Results: In total, we obtained 139,731 ratings of the 6 dermoscopic superfeatures from the crowd. Agreement was relatively low for the identification of dots and globules (median κ values of 0.526 and 0.395, respectively), whereas network structures and vessels showed the highest agreement (median κ values of 0.581 and 0.798, respectively). The same pattern was seen among the expert raters, whose median κ values were 0.483 and 0.517 for dots and globules, respectively, and 0.758 and 0.790 for network structures and vessels. The median κ values between nonexperts and thresholded average–expert readers were 0.709 for dots, 0.719 for globules, 0.714 for lines, 0.838 for network structures, 0.818 for regression structures, and 0.728 for vessels.

Conclusions: This study confirmed that IRR varies across dermoscopic features among a group of experts, and a similar pattern was observed in a nonexpert crowd. Agreement between the crowd and the experts was good or excellent for each of the 6 superfeatures, indicating that the crowd can label dermoscopic images with reliability similar to that of experts. These findings confirm the feasibility and dependability of crowdsourcing as a scalable solution for annotating large sets of dermoscopic images, with several potential clinical and educational applications, including the development of novel, explainable ML tools.
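For readers unfamiliar with the agreement statistic used throughout, the following minimal Python sketch shows how Cohen's κ is computed from two raters' binary presence/absence labels. The rater sequences and the feature in the example are hypothetical, and the code is illustrative only, not the authors' analysis pipeline.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length label sequences."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Observed agreement: fraction of items both raters label identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement, from each rater's marginal label frequencies.
    p_e = sum(
        (sum(a == lbl for a in rater_a) / n) *
        (sum(b == lbl for b in rater_b) / n)
        for lbl in labels
    )
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: presence (1) / absence (0) of "network structures"
# in 10 images, rated by one expert and by the crowd consensus.
expert = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
crowd  = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
print(f"kappa = {cohen_kappa(expert, crowd):.3f}")  # kappa = 0.800
```

A κ of 0 indicates chance-level agreement and 1 indicates perfect agreement; by the commonly used Landis and Koch benchmarks, values of 0.61-0.80 are read as substantial and 0.81-1.00 as almost perfect, which is the sense in which the crowd-expert agreement above is described as good or excellent.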

Publisher

JMIR Publications Inc.

Subject

Health Information Management, Health Informatics
