Performance Drift in Machine Learning Models for Cardiac Surgery Risk Prediction: Retrospective Analysis

Published: 2024-06-12
Volume: 5
Page: e45973
ISSN: 2563-6316
Container-title: JMIRx Med
Language: en

Author:

Dong Tim, Sinha Shubhra, Zhai Ben, Fudulu Daniel, Chan Jeremy, Narayan Pradeep, Judge Andy, Caputo Massimo, Dimagli Arnaldo, Benedetto Umberto, Angelini Gianni D

Abstract

Background: The Society of Thoracic Surgeons and European System for Cardiac Operative Risk Evaluation (EuroSCORE) II risk scores are the most commonly used risk prediction models for in-hospital mortality after adult cardiac surgery. However, they are prone to miscalibration over time and poor generalization across data sets; thus, their use remains controversial. Despite increased interest, a gap in understanding the effect of data set drift on the performance of machine learning (ML) over time remains a barrier to its wider use in clinical practice. Data set drift occurs when an ML system underperforms because of a mismatch between the data it was developed from and the data on which it is deployed.

Objective: In this study, we analyzed the extent of performance drift using models built on a large UK cardiac surgery database. The objectives were to (1) rank and assess the extent of performance drift in cardiac surgery risk ML models over time and (2) investigate any potential influence of data set drift and variable importance drift on performance drift.

Methods: We conducted a retrospective analysis of prospectively, routinely gathered data on adult patients undergoing cardiac surgery in the United Kingdom between 2012 and 2019. We temporally split the data 70:30 into a training and validation set and a holdout set. Five novel ML mortality prediction models were developed and assessed, along with EuroSCORE II, for relationships between and within variable importance drift, performance drift, and actual data set drift. Performance was assessed using a consensus metric.

Results: A total of 227,087 adults underwent cardiac surgery during the study period, with a mortality rate of 2.76% (n=6258). There was strong evidence of a decrease in overall performance across all models (P<.0001). Extreme gradient boosting (clinical effectiveness metric [CEM] 0.728, 95% CI 0.728-0.729) and random forest (CEM 0.727, 95% CI 0.727-0.728) were the overall best-performing models, both temporally and nontemporally. EuroSCORE II performed the worst across all comparisons. Sharp changes in variable importance and data set drift from October to December 2017, from June to July 2018, and from December 2018 to February 2019 mirrored the effects of performance decrease across models.

Conclusions: All models show a decrease in at least 3 of the 5 individual metrics. CEM and variable importance drift detection demonstrate the limitation of logistic regression methods used for cardiac surgery risk prediction and the effects of data set drift. Future work will be required to determine the interplay between ML models and whether ensemble models could improve on their respective performance advantages.
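The abstract describes a 70:30 temporal split and the tracking of performance over time, but does not give implementation details. The following is a minimal sketch of that general idea, not the authors' code. The field names (`surgery_date`, `died`, `predicted_risk`), the toy records, and both helper functions are hypothetical. The per-month Brier score is used here only as a simple stand-in, since the abstract does not define the consensus metric (CEM).

```python
from datetime import date


def temporal_split(records, date_key="surgery_date", train_fraction=0.7):
    """Sort records chronologically; the earliest share is used for
    development (training and validation), the remainder is the holdout."""
    ordered = sorted(records, key=lambda r: r[date_key])
    cut = round(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def brier_by_month(records, date_key="surgery_date",
                   y_key="died", p_key="predicted_risk"):
    """Mean squared difference between predicted risk and outcome,
    grouped by (year, month). Lower is better; a rising value over
    successive months is one possible sign of performance drift."""
    sums = {}
    for r in records:
        month = (r[date_key].year, r[date_key].month)
        total, n = sums.get(month, (0.0, 0))
        sums[month] = (total + (r[p_key] - r[y_key]) ** 2, n + 1)
    return {m: total / n for m, (total, n) in sorted(sums.items())}


# Synthetic toy data (assumption): ten operations between 2012 and 2019,
# deliberately listed out of order to show that the split sorts by date.
dates = [date(y, 6, 15) for y in range(2012, 2020)]
dates += [date(2019, 9, 1), date(2019, 11, 1)]
records = [
    {"surgery_date": d,
     "died": 1 if d == date(2019, 11, 1) else 0,
     "predicted_risk": 0.1}
    for d in reversed(dates)
]

development, holdout = temporal_split(records)
monthly = brier_by_month(holdout)
for month, score in monthly.items():
    print(month, round(score, 3))
```

Splitting by date rather than at random keeps every holdout operation later than every development operation, so any degradation seen on the holdout reflects changes over time rather than sampling noise alone.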

Publisher

JMIR Publications Inc.

Cited by 3 articles.

1. Peer Review of “Performance Drift in Machine Learning Models for Cardiac Surgery Risk Prediction: Retrospective Analysis”;JMIRx Med;2024-06-12

2. Authors’ Response to Peer Reviews of “Performance Drift in Machine Learning Models for Cardiac Surgery Risk Prediction: Retrospective Analysis”;JMIRx Med;2024-06-12

3. Peer Review of “Performance Drift in Machine Learning Models for Cardiac Surgery Risk Prediction: Retrospective Analysis”;JMIRx Med;2024-06-12