A quality assurance framework for routine monitoring of deep learning cardiac substructure computed tomography segmentation models in radiotherapy

Author:

Jin Xiyao (1), Hao Yao (1), Hilliard Jessica (1), Zhang Zhehao (1), Thomas Maria A. (1), Li Hua (1), Jha Abhinav K. (2, 3), Hugo Geoffrey D. (1)

Affiliation:

1. Department of Radiation Oncology, Washington University in St. Louis School of Medicine, St. Louis, Missouri, USA

2. Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, Missouri, USA

3. Mallinckrodt Institute of Radiology, Washington University in St. Louis, St. Louis, Missouri, USA

Abstract

Background: For autosegmentation models, the data used to train the model (e.g., public datasets and/or vendor-collected data) and the data on which the model is deployed in the clinic are typically not the same, which can degrade model performance through a process called domain shift. Tools to routinely monitor and predict segmentation performance are needed for quality assurance. Here, we develop an approach to perform such monitoring and performance prediction for cardiac substructure segmentation.

Purpose: To develop a quality assurance (QA) framework for routine or continuous monitoring of domain shift and the performance of cardiac substructure autosegmentation algorithms.

Methods: A benchmark dataset of computed tomography (CT) images with manual cardiac substructure delineations was collected from 241 breast cancer radiotherapy patients. It included one "normal" image domain of clean images and five "abnormal" domains containing images with artifacts (metal, contrast), pathology, or quality variations due to scanner protocol differences (field of view, noise, reconstruction kernel, and slice thickness). The QA framework consisted of an image domain shift detector operating on the input CT images, a shape quality detector operating on the output of an autosegmentation model, and a regression model for predicting autosegmentation model performance. The image domain shift detector combined a trained denoising autoencoder (DAE) with two hand-engineered image quality features to distinguish normal from abnormal domains in the input CT images. The shape quality detector was a variational autoencoder (VAE) trained to estimate the shape quality of the autosegmentation results. The outputs of the image domain shift and shape quality detectors were used to train a regression model to predict the per-patient segmentation accuracy, measured by the Dice similarity coefficient (DSC) relative to physician contours. Several regression techniques were investigated, including linear regression, bagging, Gaussian process regression, random forest, and gradient boosting regression. Of the 241 patients, 60 were used to train the autosegmentation models, 120 to train the QA framework, and the remaining 61 to test the QA framework. A total of 19 autosegmentation models were used to evaluate QA framework performance: 18 convolutional neural network (CNN)-based models and one transformer-based model.

Results: When tested on the benchmark dataset, all abnormal domains resulted in a significant DSC decrease relative to the normal domain for the CNN models (), but only some domains did so for the transformer model. No significant relationship was found between the performance of an autosegmentation model and scanner protocol parameters () except noise (). CNN-based autosegmentation models showed DSC decreases ranging from 0.07 to 0.41 with added noise, while the transformer-based model was not significantly affected (ANOVA, ). For the QA framework, linear regression models with bootstrap aggregation achieved the lowest mean absolute error (MAE) in predicted DSC, 0.041 ± 0.002, relative to the true DSC between autosegmentation and physician contours. MAE was lower when input (image) and output (shape) detectors were combined than when output detectors were used alone.

Conclusions: A QA framework was able to predict cardiac substructure autosegmentation model performance for clinically anticipated "abnormal" domain shifts.
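The quantities named in the abstract (DSC, bootstrap-aggregated linear regression, and MAE of the predicted DSC) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a single hypothetical scalar detector score per patient in place of the DAE, image quality, and VAE outputs, it uses synthetic numbers rather than study data, and all function names are illustrative.

```python
import random
from statistics import mean


def dice(a, b):
    """Dice similarity coefficient between two collections of voxel indices."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my  # degenerate resample: fall back to the mean
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def bagged_fit(xs, ys, n_models=25, seed=0):
    """Fit one linear model per bootstrap resample (bootstrap aggregation)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagged_predict(models, x):
    """Average the predictions of all bootstrap models."""
    return mean(slope * x + intercept for slope, intercept in models)


def mae(predicted, true):
    """Mean absolute error between predicted and true DSC values."""
    return mean(abs(p - t) for p, t in zip(predicted, true))


if __name__ == "__main__":
    # Synthetic training data (hypothetical): higher detector score -> lower DSC.
    scores = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
    true_dsc = [0.90, 0.85, 0.78, 0.74, 0.66, 0.61]
    models = bagged_fit(scores, true_dsc)
    predicted = [bagged_predict(models, s) for s in scores]
    print("MAE of predicted DSC:", round(mae(predicted, true_dsc), 3))
```

In this simplified form, the per-patient detector score plays the role of the QA features and the physician-referenced DSC is the regression target; a multi-feature version would replace `fit_line` with a multivariate least squares fit.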

Publisher

Wiley

Subject

General Medicine
