Automated data preparation for in vivo tumor characterization with machine learning

Published: 2022-10-11 Volume: 12
ISSN: 2234-943X
Container-title: Frontiers in Oncology
Short-container-title: Front. Oncol.

Author:

Krajnc Denis, Spielvogel Clemens P., Grahovac Marko, Ecsedi Boglarka, Rasul Sazan, Poetsch Nina, Traub-Weidinger Tatjana, Haug Alexander R., Ritter Zsombor, Alizadeh Hussain, Hacker Marcus, Beyer Thomas, Papp Laszlo

Abstract

Background: This study proposes machine learning-driven data preparation (MLDP) for optimal data preparation (DP) prior to building prediction models for cancer cohorts.

Methods: A collection of well-established DP methods was incorporated to build DP pipelines for various clinical cohorts prior to machine learning. Evolutionary algorithm principles combined with hyperparameter optimization were employed to iteratively select the best-fitting subset of data preparation algorithms for the given dataset. The proposed method was validated for glioma and prostate single-center cohorts using a 100-fold Monte Carlo (MC) cross-validation scheme with an 80-20% training-validation split ratio. In addition, a dual-center diffuse large B-cell lymphoma (DLBCL) cohort was utilized, with Center 1 as the training dataset and Center 2 as the independent validation dataset, to predict cohort-specific clinical endpoints. Five machine learning (ML) classifiers were employed for building prediction models across all analyzed cohorts. Predictive performance was estimated by confusion matrix analytics over the validation sets of each cohort. The performance of each model with MLDP, without MLDP, and with manually defined DP was compared in each of the four cohorts.

Results: Sixteen of twenty established predictive models demonstrated an increase in area under the receiver operating characteristic curve (AUC) when utilizing MLDP. MLDP resulted in the highest performance increase for the random forest (RF) (+0.16 AUC) and support vector machine (SVM) (+0.13 AUC) model schemes for predicting 36-month survival in the glioma cohort. Single-center cohorts resulted in complex DP pipelines (6-7 DP steps), with a high occurrence of outlier detection, feature selection and synthetic minority oversampling technique (SMOTE). In contrast, the optimal DP pipeline for the dual-center DLBCL cohort included only the outlier detection and SMOTE DP steps.

Conclusions: This study demonstrates that data preparation prior to ML prediction model building in cancer cohorts should itself be ML-driven, yielding optimal prediction models in both single and multi-centric settings.
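The validation scheme named in the abstract (100-fold Monte Carlo cross-validation with an 80-20% split, scored by AUC) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names (`monte_carlo_splits`, `auc`, `centroid_scores`), the synthetic one-feature dataset, the unstratified shuffling, and the toy nearest-centroid scorer that stands in for the five ML classifiers.

```python
import random
from statistics import mean


def monte_carlo_splits(n_samples, n_folds=100, train_fraction=0.8, seed=0):
    """Yield (train_idx, val_idx) pairs from repeated random shuffles.

    Unlike k-fold cross-validation, every fold is an independent random
    split, so validation sets of different folds may overlap.
    """
    rng = random.Random(seed)
    n_train = round(n_samples * train_fraction)
    indices = list(range(n_samples))
    for _ in range(n_folds):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]


def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties count as half a correct pair. Returns None when one class is
    missing, because AUC is undefined in that case.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid_scores(train_x, train_y, val_x):
    """Toy stand-in model (hypothetical): higher score = closer to the positive class mean."""
    pos_mean = mean(x for x, y in zip(train_x, train_y) if y == 1)
    neg_mean = mean(x for x, y in zip(train_x, train_y) if y == 0)
    return [abs(x - neg_mean) - abs(x - pos_mean) for x in val_x]


# Synthetic one-feature cohort (illustrative only): 25 positives, 25 negatives.
data_rng = random.Random(42)
features = [data_rng.gauss(1.0, 1.0) for _ in range(25)] + [data_rng.gauss(0.0, 1.0) for _ in range(25)]
labels = [1] * 25 + [0] * 25

fold_aucs = []
for train_idx, val_idx in monte_carlo_splits(len(features)):
    scores = centroid_scores(
        [features[i] for i in train_idx],
        [labels[i] for i in train_idx],
        [features[i] for i in val_idx],
    )
    fold_auc = auc([labels[i] for i in val_idx], scores)
    if fold_auc is not None:  # skip folds whose validation set has one class only
        fold_aucs.append(fold_auc)

mean_auc = mean(fold_aucs)
print(f"Mean validation AUC over {len(fold_aucs)} folds: {mean_auc:.3f}")
```

In a setting like the one the abstract describes, any data preparation step (outlier detection, feature selection, SMOTE) would presumably be fitted on the training indices of each fold only and then applied to the validation indices, so that the validation estimate is not contaminated by the held-out samples.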

Publisher

Frontiers Media SA

Subject

Cancer Research, Oncology

References: 56 articles.

1. WHO. Cancer; 2021

2. Molecular imaging for personalized cancer care;Kircher;Mol Oncol,2012

3. Staging PET–CT scanning provides superior detection of lymph nodes and distant metastases than traditional imaging in locally advanced breast cancer;Garg;World J Surg,2016

4. Personalizing medicine through hybrid imaging and medical big data analysis;Papp;Front Phys,2018

5. Introduction to radiomics;Mayerhoefer;J Nucl Med,2020

Cited by 5 articles.

1. Machine learning-based analysis of 68Ga-PSMA-11 PET/CT images for estimation of prostate tumor grade;Physical and Engineering Sciences in Medicine;2024-03-25

2. Incremental Role of Radiomics and Artificial Intelligence;Advanced Imaging and Therapy in Neuro-Oncology;2024

3. DEBI-NN: Distance-encoding biomorphic-informational neural networks for minimizing the number of trainable parameters;Neural Networks;2023-10

4. Error mitigation enables PET radiomic cancer characterization on quantum computers;European Journal of Nuclear Medicine and Molecular Imaging;2023-08-04

5. Machine Learning of Multi-Modal Tumor Imaging Reveals Trajectories of Response to Precision Treatment;Cancers;2023-03-14