Role of sureness in evaluating AI/CADx: Lesion‐based repeatability of machine learning classification performance on breast MRI

Author:

Whitney Heather M.1, Drukker Karen1, Vieceli Michael2, Van Dusen Amy2, de Oliveira Michelle2, Abe Hiroyuki1, Giger Maryellen L.1

Affiliation:

1. Department of Radiology, The University of Chicago, Chicago, Illinois, USA

2. Department of Physics, Wheaton College, Wheaton, Illinois, USA

Abstract

Background

Artificial intelligence/computer‐aided diagnosis (AI/CADx) and its use of radiomics have shown potential in the diagnosis and prognosis of breast cancer. Performance metrics such as the area under the receiver operating characteristic (ROC) curve (AUC) are frequently used as figures of merit for evaluating CADx. Methods that evaluate performance on a per‐lesion basis may enhance the assessment of AI/CADx pipelines, particularly when comparing the performance of different classifiers.

Purpose

The purpose of this study was to investigate the use case of two standard classifiers to (1) compare the overall classification performance of the classifiers in the task of distinguishing between benign and malignant breast lesions using radiomic features extracted from dynamic contrast‐enhanced magnetic resonance (DCE‐MR) images, (2) define a new repeatability metric (termed sureness), and (3) use sureness to examine whether one classifier provides a lesion‐based advantage in AI diagnostic performance when using radiomic features.

Methods

Images of 1052 breast lesions (201 benign, 851 cancers) were retrospectively collected under HIPAA/IRB compliance. The lesions were segmented automatically using a fuzzy c‐means method, and thirty‐two radiomic features were extracted. The classification task was malignant lesions (81% of the dataset) versus benign lesions (19%). Two classifiers, linear discriminant analysis (LDA) and support vector machines (SVM), were trained and tested within 0.632 bootstrap analyses (2000 iterations). Whole‐set classification performance was evaluated at two levels: (1) the 0.632+ bias‐corrected AUC and (2) performance metric curves, which show the variability in operating sensitivity and specificity at a target operating point (95% target sensitivity). Sureness was defined, for each lesion and each classifier, as 1 minus the width of the 95% confidence interval (CI) of the classifier output. Lesion‐based repeatability was evaluated at two levels: (1) repeatability profiles, which represent the distribution of sureness across the decision threshold, and (2) the sureness of each lesion. The latter was used to identify lesions with better sureness under one classifier than under the other while maintaining lesion‐based performance across the bootstrap iterations.

Results

In the classification performance assessment, the median difference in AUC between the two classifiers and its 95% CI showed no evidence of a difference (ΔAUC = −0.003 [−0.031, 0.018]). Both classifiers achieved the target sensitivity. Sureness was more consistent across the range of classifier output for the SVM classifier than for the LDA classifier. The SVM resulted in a net gain of 33 benign lesions and 307 cancers with higher sureness and maintained lesion‐based performance. However, with the LDA, a notable percentage of benign lesions (42%) had better sureness but lower lesion‐based performance.

Conclusions

When there is no evidence of a difference in performance between classifiers using AUC or other performance summary measures, a lesion‐based sureness metric may provide additional insight into AI pipeline design. These findings highlight the utility of lesion‐based repeatability via sureness in AI/CADx as a complement to other evaluation measures.
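The 0.632+ bias‐corrected AUC blends the optimistic resubstitution AUC (testing on the same bootstrap sample used for training) with the pessimistic out‐of‐bag AUC; 0.632 ≈ 1 − e^(−1) is the expected fraction of distinct cases drawn into a bootstrap sample. A minimal sketch, assuming one common form of the Efron–Tibshirani 0.632+ rule with error rates replaced by AUC and 0.5 taken as the no‐information AUC. The function name `auc_632plus` is hypothetical, and the paper's exact handling of edge cases is not stated in the abstract.

```python
def auc_632plus(auc_resub: float, auc_oob: float, auc_noinfo: float = 0.5) -> float:
    """Sketch of a 0.632+ bias-corrected AUC (hypothetical helper).

    auc_resub:  AUC when tested on the same bootstrap sample used for training
    auc_oob:    AUC on the out-of-bag cases left out of that sample
    auc_noinfo: AUC of a no-information classifier (assumed 0.5)
    """
    # Relative overfitting rate R: 0 means no optimism, 1 means the
    # out-of-bag AUC has dropped all the way to the no-information level.
    denom = auc_resub - auc_noinfo
    if auc_resub > auc_oob and denom > 0:
        r = min(1.0, (auc_resub - auc_oob) / denom)
    else:
        r = 0.0
    # The weight on the out-of-bag AUC grows from 0.632 (R = 0) to 1 (R = 1).
    w = 0.632 / (1.0 - 0.368 * r)
    return (1.0 - w) * auc_resub + w * auc_oob


print(round(auc_632plus(0.95, 0.85), 4))  # 0.8812
```

Whether the correction is applied within each bootstrap iteration or to AUCs averaged over iterations is a design choice the abstract does not specify.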
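The performance metric curves describe how operating sensitivity and specificity vary when a threshold aimed at 95% sensitivity is applied across bootstrap iterations. A minimal sketch of a single iteration, assuming the threshold is set on the bootstrap training sample and then applied to the out‐of‐bag test cases (the abstract does not say where the threshold is chosen), and that higher classifier output indicates malignancy. `threshold_for_sensitivity` and `sens_spec` are hypothetical helper names.

```python
import math
from typing import Sequence, Tuple


def threshold_for_sensitivity(scores: Sequence[float], labels: Sequence[int],
                              target: float = 0.95) -> float:
    """Largest observed threshold t such that at least `target` of positives score >= t."""
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not pos:
        raise ValueError("need at least one positive case")
    # Number of positives that must be called positive; the small epsilon guards
    # against floating-point error in target * len(pos).
    k = max(1, math.ceil(target * len(pos) - 1e-9))
    return pos[k - 1]


def sens_spec(scores: Sequence[float], labels: Sequence[int], t: float) -> Tuple[float, float]:
    """Sensitivity and specificity when cases with score >= t are called malignant."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg


# Toy example: threshold picked on "training" outputs, evaluated on "test" outputs.
train_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
train_labels = [1, 1, 1, 0, 1, 0, 0, 0]
t = threshold_for_sensitivity(train_scores, train_labels, target=0.95)
test_scores = [0.85, 0.5, 0.35, 0.45, 0.15]
test_labels = [1, 1, 1, 0, 0]
print(t, sens_spec(test_scores, test_labels, t))  # 0.4 (0.6666666666666666, 0.5)
```

Repeating this over all bootstrap iterations and collecting the (sensitivity, specificity) pairs gives the kind of spread the performance metric curves summarize; in the toy case the test sensitivity falls short of the training target, which is exactly the variability such curves expose.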
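Sureness is defined in the abstract as 1 minus the 95% CI of the classifier output for each lesion; read as 1 minus the CI width, it can be sketched as below. This assumes the CI is the 2.5th to 97.5th percentile range of the outputs a lesion receives in the bootstrap iterations where it is a test case, and that outputs lie in [0, 1] so sureness also lies in [0, 1]. The helper name `sureness` and the toy outputs are illustrative only.

```python
import statistics
from typing import Sequence


def sureness(outputs: Sequence[float]) -> float:
    """1 minus the width of the percentile-based 95% CI of one lesion's classifier outputs.

    `outputs` holds the classifier output for a single lesion from each bootstrap
    iteration in which that lesion was a test (out-of-bag) case.
    """
    if len(outputs) < 2:
        raise ValueError("need outputs from at least two iterations")
    # n=40 splits the data into 40 groups; the 39 cut points fall at
    # 2.5%, 5%, ..., 97.5%, and 'inclusive' interpolates linearly.
    cuts = statistics.quantiles(outputs, n=40, method="inclusive")
    lower, upper = cuts[0], cuts[-1]
    return 1.0 - (upper - lower)


# Toy outputs for two hypothetical lesions (a real run would have many more iterations).
outputs_by_lesion = {
    "lesion_A": [0.80, 0.82, 0.81, 0.79, 0.80],  # tightly clustered: high sureness
    "lesion_B": [0.20, 0.90, 0.50, 0.60, 0.40],  # widely spread: low sureness
}
for name, outs in outputs_by_lesion.items():
    print(name, f"{sureness(outs):.3f}")
# lesion_A 0.972
# lesion_B 0.350
```

Sureness says nothing about whether a lesion is classified correctly, only how repeatable its output is; that is why the abstract pairs it with lesion‐based performance when comparing classifiers.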
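A repeatability profile summarizes how sureness is distributed across the range of classifier output, that is, across possible decision thresholds. A minimal sketch, assuming each lesion is placed on the output axis by its median classifier output over bootstrap iterations, outputs lie in [0, 1], and the profile is reported as the median sureness in equal‐width bins. The binning scheme and the name `repeatability_profile` are assumptions, not details given in the abstract.

```python
import statistics
from typing import List, Optional, Sequence, Tuple

# (low edge, high edge, median sureness or None if empty, lesion count)
Bin = Tuple[float, float, Optional[float], int]


def repeatability_profile(median_outputs: Sequence[float], sureness_values: Sequence[float],
                          n_bins: int = 10) -> List[Bin]:
    """Median sureness of lesions grouped by their median classifier output."""
    bins: List[List[float]] = [[] for _ in range(n_bins)]
    for out, s in zip(median_outputs, sureness_values):
        # Map an output in [0, 1] to a bin index; an output of exactly 1.0 goes in the last bin.
        idx = min(max(int(out * n_bins), 0), n_bins - 1)
        bins[idx].append(s)
    return [
        (i / n_bins, (i + 1) / n_bins, statistics.median(b) if b else None, len(b))
        for i, b in enumerate(bins)
    ]


# Toy data for four hypothetical lesions.
profile = repeatability_profile([0.05, 0.07, 0.55, 0.95], [0.90, 0.80, 0.40, 0.95])
for low, high, med, count in profile:
    if count:
        print(f"[{low:.1f}, {high:.1f}): median sureness {med:.2f} (n={count})")
```

Plotting such per‐bin summaries (or full box plots of sureness per bin) for two classifiers side by side is one way to see whether sureness is more uniform across the output range for one of them, which is the kind of comparison the Results report for SVM versus LDA.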
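The Results summarize the difference in AUC between the two classifiers by its median and 95% CI over the bootstrap iterations. A minimal sketch, assuming both classifiers are evaluated on the same bootstrap samples so that per‐iteration AUCs can be paired, and using percentile bounds for the CI. The sign convention (classifier A minus classifier B), the name `paired_delta_auc`, and the toy AUC values are illustrative assumptions.

```python
import statistics
from typing import Sequence, Tuple


def paired_delta_auc(auc_a: Sequence[float], auc_b: Sequence[float]) -> Tuple[float, float, float]:
    """Median and percentile-based 95% CI of per-iteration AUC differences (A minus B)."""
    if len(auc_a) != len(auc_b) or len(auc_a) < 2:
        raise ValueError("need paired AUCs from at least two bootstrap iterations")
    deltas = [a - b for a, b in zip(auc_a, auc_b)]
    cuts = statistics.quantiles(deltas, n=40, method="inclusive")  # 2.5%, ..., 97.5%
    return statistics.median(deltas), cuts[0], cuts[-1]


# Toy per-iteration AUCs (a real analysis would have one pair per bootstrap iteration).
auc_lda = [0.80, 0.82, 0.81, 0.83, 0.79]
auc_svm = [0.81, 0.80, 0.82, 0.82, 0.80]
median, lower, upper = paired_delta_auc(auc_lda, auc_svm)
# A CI that contains 0 is read as "no evidence of a difference".
print(f"ΔAUC = {median:.3f} [{lower:.3f}, {upper:.3f}]", "contains 0:", lower <= 0.0 <= upper)
```

Pairing by iteration matters: because both classifiers see the same training and test cases in each iteration, case‐sampling noise largely cancels in the difference, giving a tighter CI than comparing two independently resampled AUC distributions.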

Publisher

Wiley

Subject

General Medicine

Cited by 1 article:

1. Sureness of classification of breast cancers as pure DCIS or with invasive components on DCE-MRI; 17th International Workshop on Breast Imaging (IWBI 2024); 2024-05-29
