Sample size and predictive performance of machine learning methods with survival data: A simulation study

Published:2023-11-10 Issue:30 Volume:42 Page:5657-5675
ISSN:0277-6715
Container-title:Statistics in Medicine
language:en
Short-container-title:Statistics in Medicine

Author:

Infante Gabriele¹², Miceli Rosalba², Ambrogi Federico¹³

Affiliation:

1. Department of Clinical Sciences and Community Health, University of Milan, Milan, Italy

2. Unit of Biostatistics for Clinical Research, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy

3. Scientific Directorate, IRCCS Policlinico San Donato, San Donato Milanese, Italy

Abstract

Prediction models are increasingly developed and used in diagnostic and prognostic studies, where machine learning (ML) methods are gaining popularity over traditional regression techniques. For survival outcomes, the Cox proportional hazards model is generally used, and it has been shown to achieve good predictive performance with few strong covariates. The possibility of improving model performance by including nonlinearities, covariate interactions, and time-varying effects while controlling for overfitting must be carefully considered during the model building phase. ML techniques, on the other hand, can learn such complexities from the data, at the cost of hyper-parameter tuning and reduced interpretability. One aspect of special interest is the sample size needed to develop a survival prediction model. While guidance exists for traditional statistical models, the same is not true for ML techniques. This work develops a time-to-event simulation framework to evaluate the performance of Cox regression compared with, among others, tuned random survival forests, gradient boosting, and neural networks at varying sample sizes. Simulations were based on replications of subjects from publicly available databases, with event times simulated according to a Cox model with nonlinearities on continuous variables and time-varying effects, and on SEER registry data.
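
As an illustration only, and not the authors' simulation code, the idea of simulating event times from a Cox model with a nonlinear covariate effect and checking discrimination at several sample sizes can be sketched as follows. Everything here is an assumption made for the example: the Weibull baseline hazard, the coefficients, the censoring mechanism, and the function names `simulate_cox_weibull` and `harrell_c` are hypothetical. The sketch also omits the time-varying effects and the replication of real subjects described in the abstract.

```python
import numpy as np


def simulate_cox_weibull(n, rng, shape=1.5, scale=0.1, censor_rate=0.05):
    """Simulate right-censored data from a Cox model (illustrative assumptions).

    Baseline cumulative hazard: H0(t) = scale * t**shape (Weibull).
    The linear predictor contains a quadratic term, i.e. a nonlinear effect.
    """
    x1 = rng.normal(size=n)
    x2 = rng.binomial(1, 0.5, size=n)
    lp = 0.5 * x1**2 - 0.7 * x2  # hypothetical coefficients

    # Inverse-transform sampling: H0(T) * exp(lp) follows a unit exponential.
    unit_exp = rng.exponential(size=n)
    event_time = (unit_exp / (scale * np.exp(lp))) ** (1.0 / shape)

    # Independent exponential censoring (an assumption of this sketch).
    censor_time = rng.exponential(1.0 / censor_rate, size=n)
    time = np.minimum(event_time, censor_time)
    event = (event_time <= censor_time).astype(int)
    return np.column_stack([x1, x2]), lp, time, event


def harrell_c(risk, time, event):
    """Harrell's concordance index for a risk score (higher risk = earlier event)."""
    concordant = 0.0
    comparable = 0
    for i in np.flatnonzero(event):
        later = time > time[i]  # subjects still at risk after subject i's event
        comparable += int(later.sum())
        concordant += (risk[i] > risk[later]).sum()
        concordant += 0.5 * (risk[i] == risk[later]).sum()
    return concordant / comparable


if __name__ == "__main__":
    rng = np.random.default_rng(2023)
    for n in (100, 500, 2000):
        X, lp, time, event = simulate_cox_weibull(n, rng)
        c_oracle = harrell_c(lp, time, event)       # true nonlinear predictor
        c_linear = harrell_c(X[:, 0], time, event)  # ignores the quadratic term
        print(f"n={n:5d}  C(oracle)={c_oracle:.3f}  C(linear x1)={c_linear:.3f}")
```

In a fuller study, the oracle score would be replaced by models fitted on a training sample of size n and evaluated on independent test data. The gap between a score that captures the nonlinearity and one that ignores it is the kind of contrast such a framework is designed to expose.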

Funder

Ministero dell’Istruzione, dell’Università e della Ricerca

Publisher

Wiley

Subject

Statistics and Probability, Epidemiology

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1002/sim.9931

Reference 59 articles.

1. Medical Risk Prediction

2. Sperr E. PubMed by Year [Internet]; 2016. http://esperr.github.io/pubmed-by-year/

3. Multivariate prediction of coronary heart disease in the Western Collaborative Group Study compared to the findings of the Framingham study.

4. Big Data and Machine Learning in Health Care

5. A scoping methodological review of simulation studies comparing statistical and machine learning approaches to risk prediction for time-to-event data

Cited by 4 articles.

1. Machine Learning for High Sigma Analog Designs (Invited);Proceedings of the 2024 ACM/IEEE International Symposium on Machine Learning for CAD;2024-09-09

2. Developing clinical prediction models: a step-by-step guide;BMJ;2024-09-03

3. Concerns Over Prediction of Kidney Discard and Nonrecovery;JAMA Surgery;2024-06-01

4. Predictive Modeling of Drug‐Related Adverse Events with Real‐World Data: A Case Study of Linezolid Hematologic Outcomes;Clinical Pharmacology & Therapeutics;2024-02-12