Are We There Yet? The Value of Deep Learning in a Multicenter Setting for Response Prediction of Locally Advanced Rectal Cancer to Neoadjuvant Chemoradiotherapy-Reference-Cited by-同舟云学术

Are We There Yet? The Value of Deep Learning in a Multicenter Setting for Response Prediction of Locally Advanced Rectal Cancer to Neoadjuvant Chemoradiotherapy

Published:2022-06-30 Issue:7 Volume:12 Page:1601
ISSN:2075-4418
Container-title:Diagnostics
language:en
Short-container-title:Diagnostics

Author:

Wichtmann Barbara D.^ORCID,Albert Steffen^ORCID,Zhao Wenzhao,Maurer Angelika^ORCID,Rödel Claus,Hofheinz Ralf-Dieter,Hesser Jürgen,Zöllner Frank G.,Attenberger Ulrike I.

Abstract

This retrospective study aims to evaluate the generalizability of a promising state-of-the-art multitask deep learning (DL) model for predicting the response of locally advanced rectal cancer (LARC) to neoadjuvant chemoradiotherapy (nCRT) using a multicenter dataset. To this end, we retrained and validated a Siamese network with two U-Nets joined at multiple layers using pre- and post-therapeutic T2-weighted (T2w), diffusion-weighted (DW) images and apparent diffusion coefficient (ADC) maps of 83 LARC patients acquired under study conditions at four different medical centers. To assess the predictive performance of the model, the trained network was then applied to an external clinical routine dataset of 46 LARC patients imaged without study conditions. The training and test datasets differed significantly in terms of their composition, e.g., T-/N-staging, the time interval between initial staging/nCRT/re-staging and surgery, as well as with respect to acquisition parameters, such as resolution, echo/repetition time, flip angle and field strength. We found that even after dedicated data pre-processing, the predictive performance dropped significantly in this multicenter setting compared to a previously published single- or two-center setting. Testing the network on the external clinical routine dataset yielded an area under the receiver operating characteristic curve of 0.54 (95% confidence interval [CI]: 0.41, 0.65), when using only pre- and post-therapeutic T2w images as input, and 0.60 (95% CI: 0.48, 0.71), when using the combination of pre- and post-therapeutic T2w, DW images, and ADC maps as input. Our study highlights the importance of data quality and harmonization in clinical trials using machine learning. Only in a joint, cross-center effort, involving a multidisciplinary team can we generate large enough curated and annotated datasets and develop the necessary pre-processing pipelines for data harmonization to successfully apply DL models clinically.

Funder

Deutsche Forschungsgemeinschaft

Ministry of Science, Research and the Arts Baden-Württemberg

Publisher

MDPI AG

Subject

Clinical Biochemistry

Link

https://www.mdpi.com/2075-4418/12/7/1601/pdf

Reference52 articles.

1. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries

2. Krebs in Deutschland für 2017/2018,2021