Susceptibility of AutoML mortality prediction algorithms to model drift caused by the COVID pandemic

Published:2024-02-02 Issue:1 Volume:24
ISSN:1472-6947
Container-title:BMC Medical Informatics and Decision Making
language:en
Short-container-title:BMC Med Inform Decis Mak

Author:

Kagerbauer Simone Maria, Ulm Bernhard, Podtschaske Armin Horst, Andonov Dimislav Ivanov, Blobner Manfred, Jungwirth Bettina, Graessner Martin

Abstract

Background: Concept drift and covariate shift degrade the performance of machine learning (ML) models. The objective of our study was to characterize sudden data drift caused by the COVID pandemic. Furthermore, we investigated whether certain methods applied during model training can prevent the model degradation caused by data drift.

Methods: We trained different ML models with the H2O AutoML method on a dataset comprising 102,666 cases of surgical patients collected in the years 2014–2019 to predict postoperative mortality using preoperatively available data. The models applied were Generalized Linear Model with regularization, Default Random Forest, Gradient Boosting Machine, eXtreme Gradient Boosting, Deep Learning, and Stacked Ensembles comprising all base models. Further, we modified the original models by applying three different methods when training on the original pre-pandemic dataset: (1) we gave older data less weight, (2) we used only the most recent data for model training, and (3) we performed a z-transformation of the numerical input parameters. Afterwards, we tested model performance on a pre-pandemic and an in-pandemic dataset not used in the training process, and analysed common features.

Results: The models showed excellent areas under the receiver operating characteristic curve and acceptable precision-recall curves when tested on a dataset from January–March 2020, but significant degradation when tested on a dataset collected in the first wave of the COVID pandemic from April–May 2020. A comparison of the probability distributions of the input parameters revealed significant differences between pre-pandemic and in-pandemic data. The endpoint of our models, in-hospital mortality after surgery, did not differ significantly between pre- and in-pandemic data and was about 1% in each case. However, the models varied considerably in the composition of their input parameters. None of our applied modifications prevented a loss of performance, although they produced very different models that used a wide variety of parameters.

Conclusions: Our results show that none of the easy-to-implement measures we tested in model training can prevent deterioration in the case of sudden external events. Therefore, we conclude that, in the presence of concept drift and covariate shift, close monitoring and critical review of model predictions are necessary.
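The three training modifications named in the abstract, together with a simple comparison of input distributions, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the exponential decay scheme, the two-year window, the choice of training-set statistics for the z-transformation, and the use of a two-sample Kolmogorov-Smirnov statistic as the drift measure are all assumptions made for illustration. The abstract does not state which weighting scheme, window length, or distribution test was used.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def recency_weights(years, decay=0.5):
    """Modification (1): give older cases less weight.

    Hypothetical scheme: weight = decay ** (newest_year - case_year).
    """
    newest = max(years)
    return [decay ** (newest - y) for y in years]


def recent_only(rows, years, n_years=2):
    """Modification (2): keep only cases from the last n_years (assumed window)."""
    cutoff = max(years) - n_years + 1
    return [row for row, y in zip(rows, years) if y >= cutoff]


def z_transform(train_values, new_values):
    """Modification (3): z-transform a numerical input.

    Assumption: mean and standard deviation come from the training data
    and are then applied to the new data.
    """
    mu = mean(train_values)
    sigma = pstdev(train_values)
    if sigma == 0:
        raise ValueError("training values have zero variance")
    return [(v - mu) / sigma for v in new_values]


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest gap between the two empirical CDFs; larger values
    indicate a bigger difference between the two distributions.
    """
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )
```

In a pipeline of this kind, the weights from `recency_weights` would typically be passed to the learner as per-row observation weights, and `ks_statistic` would be computed per input parameter between a pre-pandemic and an in-pandemic sample to flag covariate shift.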

Funder

German Federal Ministry for Economic Affairs and Energy

Universität Ulm

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1186/s12911-024-02428-z.pdf

Cited by 1 article.

1. Correction: Susceptibility of AutoML mortality prediction algorithms to model drift caused by the COVID pandemic. BMC Medical Informatics and Decision Making. 2024-02-19.