Validation pipeline for machine learning algorithm assessment for multiple vendors

Published: 2022-04-29 Issue: 4 Volume: 17 Page: e0267213
ISSN: 1932-6203
Container-title: PLOS ONE
language: en
Short-container-title: PLoS ONE

Author:

Bizzo Bernardo C., Ebrahimian Shadi, Walters Mark E., Michalski Mark H., Andriole Katherine P., Dreyer Keith J., Kalra Mannudeep K., Alkasab Tarik, Digumarthy Subba R.

Abstract

A standardized, objective evaluation method is needed to compare machine learning (ML) algorithms as these tools become available for clinical use. We therefore designed, built, and tested an evaluation pipeline with the goal of normalizing performance measurement of independently developed algorithms, using a common test dataset of our clinical imaging. Three vendor applications for detecting solid, part-solid, and groundglass lung nodules in chest CT examinations were assessed in this retrospective study using our data-preprocessing and algorithm assessment chain. The pipeline included tools for image cohort creation and de-identification; report and image annotation for ground-truth labeling; server partitioning to receive vendor “black box” algorithms and to enable model testing on our internal clinical data (100 chest CTs with 243 nodules) from within our security firewall; model validation and result visualization; and performance assessment calculating algorithm recall, precision, and receiver operating characteristic (ROC) curves. Algorithm true positives, false positives, false negatives, recall, and precision for detecting lung nodules were as follows: Vendor-1 (194, 23, 49, 0.80, 0.89); Vendor-2 (182, 270, 61, 0.75, 0.40); Vendor-3 (75, 120, 168, 0.32, 0.39). The AUCs for detection of solid (0.61–0.74), groundglass (0.66–0.86), and part-solid (0.52–0.86) nodules varied between the three vendors. Our ML model validation pipeline enabled testing of multi-vendor algorithms within the institutional firewall. Wide variations in algorithm performance for detection as well as classification of lung nodules justify the premise for a standardized, objective ML algorithm evaluation process.
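
The recall and precision figures in the abstract follow from the reported true-positive, false-positive, and false-negative counts. The sketch below recomputes them using the standard definitions, recall = TP / (TP + FN) and precision = TP / (TP + FP). It is an illustration only, not the authors' pipeline code: the names `DetectionCounts`, `VENDOR_COUNTS`, and `summarize` are hypothetical, and rounding to two decimals is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionCounts:
    """Per-vendor nodule detection counts (hypothetical container type)."""
    true_positives: int
    false_positives: int
    false_negatives: int

    def recall(self) -> float:
        # Fraction of ground-truth nodules that the algorithm found.
        total = self.true_positives + self.false_negatives
        return self.true_positives / total if total else 0.0

    def precision(self) -> float:
        # Fraction of the algorithm's detections that were real nodules.
        total = self.true_positives + self.false_positives
        return self.true_positives / total if total else 0.0


# Counts as reported in the abstract: (TP, FP, FN) per vendor.
VENDOR_COUNTS = {
    "Vendor-1": DetectionCounts(194, 23, 49),
    "Vendor-2": DetectionCounts(182, 270, 61),
    "Vendor-3": DetectionCounts(75, 120, 168),
}


def summarize(counts_by_vendor):
    """Return {vendor: (recall, precision)} rounded to two decimals (assumed)."""
    return {
        name: (round(c.recall(), 2), round(c.precision(), 2))
        for name, c in counts_by_vendor.items()
    }


if __name__ == "__main__":
    for name, (rec, prec) in summarize(VENDOR_COUNTS).items():
        print(f"{name}: recall={rec:.2f}, precision={prec:.2f}")
```

For every vendor, TP + FN equals 243, matching the number of annotated nodules in the test set. Recomputed this way, the Vendor-1 and Vendor-2 values agree with the abstract, while Vendor-3 comes out at 0.31 and 0.38 rather than the reported 0.32 and 0.39. The source does not state how its figures were rounded or derived, so this small difference may simply reflect a different rounding convention.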

Publisher

Public Library of Science (PLoS)

Subject

Multidisciplinary

Cited by 3 articles.

1. Absolute ground truth-based validation of computer-aided nodule detection and volumetry in low-dose CT imaging;Physica Medica;2024-05

2. Ant: a process aware annotation software for regulatory compliance;Artificial Intelligence and Law;2023-08-09

3. Preventing Artificial Intelligence in Medical Imaging From Perpetuating Health Care Biases and Disparities;Journal of the American College of Radiology;2022-12