Uncertainty quantification in multi‐class segmentation: Comparison between Bayesian and non‐Bayesian approaches in a clinical perspective

Author:

Scalco Elisa (1), Pozzi Silvia (2), Rizzo Giovanna (3), Lanzarone Ettore (2)

Affiliation:

1. Institute of Biomedical Technologies (ITB), National Research Council (CNR), Segrate, Milan, Italy

2. Department of Management, Information and Production Engineering, University of Bergamo, Bergamo, Italy

3. Institute of Intelligent Industrial Technologies and Systems (STIIMA), National Research Council (CNR), Milan, Italy

Abstract

Background: Automatic segmentation techniques based on Convolutional Neural Networks (CNNs) are widely adopted to identify structures of interest in medical images, as they are not time‐consuming and are not subject to high intra‐ and inter‐operator variability. However, their adoption in clinical practice is slowed by several factors, such as the difficulty of accurately quantifying their uncertainty.

Purpose: This work aims to evaluate the uncertainty quantification provided by two Bayesian and two non‐Bayesian approaches for a multi‐class segmentation problem, and to compare the risk propensity of these approaches, considering CT images of patients affected by renal cancer (RC).

Methods: Four uncertainty quantification approaches were implemented, all based on a benchmark CNN currently employed in medical image segmentation: two Bayesian CNNs with different regularizations (Dropout and DropConnect), named BDR and BDC, an ensemble method (Ens), and a test‐time augmentation (TTA) method. They were compared in terms of segmentation accuracy, using the Dice score; in terms of uncertainty quantification, using the ratio of correct‐certain pixels (RCC) and the ratio of incorrect‐uncertain pixels (RIU); and with respect to inter‐observer variability in manual segmentation. They were trained on the dataset of the Kidney and Kidney Tumor Segmentation Challenge launched in 2021 (Kits21), for which multi‐class segmentations of kidney, RC, and cyst on 300 CT volumes are available. They were tested on this dataset and on two other public renal CT datasets.

Results: Segmentation accuracy differed widely across the structures of interest for all approaches, with an average Dice score of 0.92, 0.58, and 0.21 for kidney, tumor, and cyst, respectively. In terms of uncertainty, TTA provided the highest uncertainty, followed by Ens and BDC, whereas BDR provided the lowest and was less effective than the other approaches at minimizing the number of incorrect‐certain pixels. Again, large differences in RCC and RIU were seen across the three structures. These metrics were associated with different risk propensities: BDR was the most risk‐taking approach, providing higher accuracy in its predictions but not assigning uncertainty to incorrect segmentations in every case. The other three approaches were more conservative, providing large uncertainty regions, with the drawback of also flagging correct areas. Finally, the analysis of inter‐observer segmentation variability showed a significant variation among the four approaches on the external dataset, with BDR reporting the lowest agreement (Dice = 0.82) and TTA obtaining the highest (Dice = 0.94).

Conclusions: Our results highlight the importance of quantifying segmentation uncertainty and show that decision‐makers can choose the approach most in line with the degree of risk propensity required by the application and their policy.
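The evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so the following rest on assumptions rather than on the authors' implementation: uncertainty is taken as the entropy of the class probabilities averaged over several stochastic predictions (for example Dropout samples, ensemble members, or augmented inputs), a pixel counts as "certain" when its entropy is at or below a hypothetical threshold, and RCC and RIU are normalized by the total number of pixels. Only the Dice score follows its standard definition. All function names, the toy probabilities, and the threshold value are illustrative.

```python
import math

def dice_score(pred, truth, label):
    """Standard Dice = 2|A∩B| / (|A| + |B|) for one class over flattened label maps."""
    a = [p == label for p in pred]
    b = [t == label for t in truth]
    inter = sum(x and y for x, y in zip(a, b))
    size = sum(a) + sum(b)
    return 2.0 * inter / size if size else 1.0  # both masks empty: perfect agreement

def mean_probs(samples):
    """Average per-pixel class probabilities over N predictions; samples[n][pixel][class]."""
    n = len(samples)
    n_pixels, n_classes = len(samples[0]), len(samples[0][0])
    return [[sum(s[i][c] for s in samples) / n for c in range(n_classes)]
            for i in range(n_pixels)]

def entropy(p):
    """Shannon entropy (natural log) of one probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0)

def certainty_ratios(pred, truth, uncertainty, threshold):
    """Assumed definitions: RCC and RIU as fractions of all pixels."""
    n = len(pred)
    cc = sum(p == t and u <= threshold for p, t, u in zip(pred, truth, uncertainty))
    iu = sum(p != t and u > threshold for p, t, u in zip(pred, truth, uncertainty))
    return cc / n, iu / n

# Toy data: 2 stochastic predictions, 4 pixels, 3 classes (0 background, 1 kidney, 2 tumor).
samples = [
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.2, 0.4, 0.4], [0.1, 0.1, 0.8]],
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6], [0.5, 0.1, 0.4]],
]
truth = [0, 1, 1, 2]  # hypothetical manual segmentation

probs = mean_probs(samples)
pred = [max(range(len(p)), key=p.__getitem__) for p in probs]  # per-pixel argmax
uncertainty = [entropy(p) for p in probs]

kidney_dice = dice_score(pred, truth, label=1)
rcc, riu = certainty_ratios(pred, truth, uncertainty, threshold=0.7)  # threshold is hypothetical
print(kidney_dice, rcc, riu)
```

Under these assumed definitions, lowering the threshold marks more pixels as uncertain, which tends to raise RIU at the cost of RCC. This mirrors the trade-off the abstract describes between conservative approaches, which flag large regions including correct ones, and risk-taking approaches, which leave some incorrect pixels unflagged.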

Publisher

Wiley
