Analyzing the Impact of Oncological Data at Different Time Points and Tumor Biomarkers on Artificial Intelligence Predictions for Five-Year Survival in Esophageal Cancer

Published: 2024-03-19
Volume: 6, Issue: 1, Pages: 679-698
ISSN: 2504-4990
Journal: Machine Learning and Knowledge Extraction (MAKE)
Language: en

Author:

Lukomski Leandra¹, Pisula Juan², Wirsik Naita¹, Damanakis Alexander¹, Jung Jin-On¹, Knipper Karl¹, Datta Rabi¹, Schröder Wolfgang¹, Gebauer Florian³, Schmidt Thomas¹, Quaas Alexander⁴, Bozek Katarzyna², Bruns Christiane¹, Popp Felix¹

Affiliation:

1. Department of General, Visceral and Cancer Surgery, Faculty of Medicine and University Hospital of Cologne, Kerpener Straße 62, 50937 Cologne, Germany

2. Data Science of Bioimages Lab, Center for Molecular Medicine Cologne (CMMC), Faculty of Medicine and University Hospital of Cologne, Robert-Koch-Straße 21, 50937 Cologne, Germany

3. Department of General, Visceral and Cancer Surgery, Helios University Hospital Wuppertal, University Witten/Herdecke, Heusnerstraße 40, 42283 Wuppertal, Germany

4. Institute of Pathology, Faculty of Medicine and University Hospital of Cologne, Kerpener Straße 62, 50937 Cologne, Germany

Abstract

AIM: In this study, we use Artificial Intelligence (AI), including Machine Learning (ML) and Deep Learning (DL), to predict the long-term survival of patients with resectable esophageal cancer (EC) treated at a high-volume surgical center. Our objective is to evaluate how well AI methods predict survival at different time points of oncological treatment. To this end, we compare models trained on clinical data that integrate either the Tumor, Node, Metastasis (TNM) classification or tumor biomarker analysis for long-term survival prediction.

METHODS: In this retrospective study, 1002 patients diagnosed with EC between 1996 and 2021 were analyzed. The original dataset comprised 55 pre- and postoperative patient characteristics and 55 immunohistochemically evaluated biomarkers obtained after surgery. To predict five-year survival status, four AI methods (Random Forest, RF; XGBoost, XG; Artificial Neural Network, ANN; TabNet, TN) and Logistic Regression (LR) were employed. The models were trained on three predefined subsets of the training dataset: (I) the baseline dataset (BL), consisting of pre-, intra-, and postoperative data, including the TNM status but excluding tumor biomarkers; (II) the primary staging dataset (PS), containing the clinical data accessible at the time of the initial diagnostic workup; and (III) the PS dataset plus tumor biomarkers from tissue microarrays (PS + biomarkers), excluding the TNM status. We used permutation feature importance to select only the important variables, yielding AI-driven reduced datasets on which the models were subsequently retrained.

RESULTS: Model training on the BL dataset yielded similar predictive performance across all models (accuracy, ACC: 0.73/0.74/0.76/0.75/0.73; area under the receiver operating characteristic curve, AUC: 0.78/0.82/0.83/0.80/0.79 for RF/XG/ANN/TN/LR, respectively). Predictive performance and generalizability declined when the models were trained on the PS dataset. Surprisingly, adding biomarkers to the PS dataset improved predictions (PS dataset vs. PS dataset + biomarkers; ACC: 0.70 vs. 0.77/0.73 vs. 0.79/0.71 vs. 0.75/0.69 vs. 0.72/0.63 vs. 0.66; AUC: 0.77 vs. 0.83/0.80 vs. 0.85/0.76 vs. 0.86/0.70 vs. 0.76/0.70 vs. 0.69 for RF/XG/ANN/TN/LR, respectively). The AI models outperformed LR when trained on the PS datasets. After AI-driven feature selection, the important features shared by all models trained on the BL dataset included histopathological lymph node status (pN), histopathological tumor size (pT), clinical tumor size (cT), age at the time of surgery, and postoperative tracheostomy. After training on the PS dataset with biomarkers, the important predictive features included patient age at the time of surgery, TP53 gene mutation, mesothelin expression, thymidine phosphorylase (TYMP) expression, NANOG homeobox protein expression, indoleamine 2,3-dioxygenase (IDO) expression on tumor-infiltrating lymphocytes, and tumor-infiltrating mast cells and natural killer cells.

CONCLUSION: Different AI methods predict the long-term survival status of patients with EC with similar performance and outperform LR, the state-of-the-art classification model. When additional biomarker analysis is used, survival status can be predicted with similar performance from patient data available at an early stage of treatment. This suggests that individual survival predictions can be made early in cancer treatment by using biomarkers, reducing reliance on the postoperative pathological TNM status. This study identifies important features for survival prediction that vary with the timing of oncological treatment.
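The permutation-feature-importance step from the Methods (score each variable, keep only the important ones, retrain on the reduced dataset) can be illustrated with a minimal, dependency-free Python sketch. It assumes a trained model exposed as a function that returns the predicted probability of five-year survival, scores each feature by the mean drop in AUC when that feature's column is shuffled across patients, and keeps features with a positive mean drop. The function names, the choice of AUC as the scoring metric, `n_repeats=10`, the zero threshold, and the toy model and data are all illustrative assumptions, not the authors' implementation.

```python
import math
import random
from typing import Callable, List, Sequence

Row = List[float]
Model = Callable[[Row], float]  # maps one patient's features to P(alive at 5 years)


def accuracy(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Share of patients whose thresholded prediction matches the true label."""
    hits = sum(int(s >= threshold) == y for y, s in zip(y_true, scores))
    return hits / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC: probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(
    model: Model,
    X: List[Row],
    y: Sequence[int],
    metric: Callable[[Sequence[int], Sequence[float]], float] = roc_auc,
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean drop in `metric` when a single feature column is shuffled across patients."""
    rng = random.Random(seed)
    baseline = metric(y, [model(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - metric(y, [model(r) for r in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


def select_features(importances: Sequence[float], threshold: float = 0.0) -> List[int]:
    """Indices of features kept for the reduced dataset and subsequent retraining."""
    return [j for j, imp in enumerate(importances) if imp > threshold]


def toy_model(row: Row) -> float:
    """Hypothetical stand-in for a fitted RF/XG/ANN/TN/LR model; it uses feature 0 only."""
    return 1.0 / (1.0 + math.exp(-row[0]))


if __name__ == "__main__":
    # Hypothetical toy data: feature 0 determines the label, feature 1 is ignored by the model.
    X = [[float(i), float(i % 3)] for i in range(-5, 5)]
    y = [1 if row[0] >= 0 else 0 for row in X]
    imps = permutation_importance(toy_model, X, y)
    print("importances:", imps)
    print("kept feature indices:", select_features(imps))
```

Because the toy model ignores feature 1, shuffling that column leaves every prediction unchanged, so its importance is exactly zero and only feature 0 survives selection. In a real pipeline the importances would be computed on held-out data rather than the training set; scikit-learn's `sklearn.inspection.permutation_importance` offers an equivalent routine for fitted estimators.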

Publisher: MDPI AG

Link: https://www.mdpi.com/2504-4990/6/1/32/pdf

References (67 articles)

1. Topol. High-performance medicine: The convergence of human and artificial intelligence. Nat. Med., 2019.

2. Janiesch. Machine learning and deep learning. Electron. Mark., 2021.

3. Gong. Application of machine learning approaches to predict the 5-year survival status of patients with esophageal cancer. J. Thorac. Dis., 2021.

4. Jung. Machine learning for optimized individual survival prediction in resectable upper gastrointestinal cancer. J. Cancer Res. Clin. Oncol., 2022.

5. Sato. Prediction of survival in patients with esophageal carcinoma using artificial neural networks. Cancer, 2005.