Abstract
OBJECTIVE
Contouring Collaborative for Consensus in Radiation Oncology (C3RO) is a crowdsourced challenge engaging radiation oncologists across various expertise levels in segmentation. A challenge in artificial intelligence (AI) development is the paucity of multi-expert datasets; consequently, we sought to characterize whether aggregate segmentations generated from multiple non-experts could meet or exceed recognized expert agreement.
MATERIALS AND METHODS
Participants who contoured ≥1 region of interest (ROI) for the breast, sarcoma, head and neck (H&N), gynecologic (GYN), or gastrointestinal (GI) challenge were identified as non-experts or recognized experts. Cohort-specific ROIs were combined into single simultaneous truth and performance level estimation (STAPLE) consensus segmentations. STAPLE_non-expert ROIs were evaluated against STAPLE_expert contours using the Dice Similarity Coefficient (DSC). The expert interobserver DSC (IODSC_expert) was calculated for each ROI and used as the acceptability threshold for the STAPLE_non-expert versus STAPLE_expert comparison. To determine the number of non-experts required to match the IODSC_expert for each ROI, a single consensus contour was generated from variable numbers of non-experts and then compared against the IODSC_expert.
RESULTS
For all cases, the DSC values for STAPLE_non-expert versus STAPLE_expert were higher than the comparator IODSC_expert for most ROIs. The minimum number of non-expert segmentations needed for a consensus ROI to meet the IODSC_expert acceptability criterion ranged from 2 to 4 for breast, 3 to 5 for sarcoma, 3 to 5 for H&N, 3 to 5 for GYN ROIs, and 3 for GI ROIs.
DISCUSSION AND CONCLUSION
Multiple non-expert-generated consensus ROIs met or exceeded expert-derived acceptability thresholds. Five non-experts could potentially generate consensus segmentations for most ROIs with performance approximating that of experts, suggesting non-expert segmentations are feasible, cost-effective AI inputs.
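As an illustration of the evaluation metric, the sketch below computes the Dice Similarity Coefficient (DSC = 2|A∩B| / (|A| + |B|)) between two binary masks. It is not the authors' pipeline: the study fused contours with STAPLE, an expectation-maximization algorithm, whereas this sketch substitutes a simple majority vote as a hypothetical stand-in, and the function names and mask arrays are assumptions for demonstration only.

```python
import numpy as np

def dice_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Dice Similarity Coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    a = a.astype(bool)
    b = b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom

def majority_vote_consensus(masks: list[np.ndarray]) -> np.ndarray:
    """Majority-vote fusion of binary masks (a simplified stand-in for STAPLE)."""
    stacked = np.stack([m.astype(bool) for m in masks])
    return stacked.mean(axis=0) >= 0.5

# Hypothetical usage (arrays are illustrative, not from the study):
# consensus_nonexpert = majority_vote_consensus(nonexpert_masks)
# consensus_expert = majority_vote_consensus(expert_masks)
# dsc = dice_similarity(consensus_nonexpert, consensus_expert)
```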
Publisher
Cold Spring Harbor Laboratory