Admission blood tests predicting survival of SARS-CoV-2 infected patients: a practical implementation of graph convolution network in imbalance dataset

Published:2024-08-09 Issue:1 Volume:24
ISSN:1471-2334
Container-title:BMC Infectious Diseases
language:en
Short-container-title:BMC Infect Dis

Author:

Lian Jie, Huang Fan, Huang Xinhai, Lau Kitty Yu-Yeung, Ng Kei Shing, Chu Carlin Chun Fai, Lam Simon Ching, Koohli-Moghadam Mohamad, Vardhanabhuti Varut

Abstract

Background: Predicting an individual’s risk of death from COVID-19 is essential for planning and optimising resources. However, since the real-world mortality rate is relatively low, particularly in places like Hong Kong, this makes building an accurate prediction model difficult due to the imbalanced nature of the dataset. This study introduces an innovative application of graph convolutional networks (GCNs) to predict COVID-19 patient survival using a highly imbalanced dataset. Unlike traditional models, GCNs leverage structural relationships within the data, enhancing predictive accuracy and robustness. By integrating demographic and laboratory data into a GCN framework, our approach addresses class imbalance and demonstrates significant improvements in prediction accuracy.

Methods: The cohort included all consecutive positive COVID-19 patients fulfilling study criteria admitted to 42 public hospitals in Hong Kong between January 23 and December 31, 2020 (n = 7,606). We proposed the population-based graph convolutional neural network (GCN) model which took blood test results, age and sex as inputs to predict the survival outcomes. Furthermore, we compared our proposed model to the Cox Proportional Hazard (CPH) model, conventional machine learning models, and oversampling machine learning models. Additionally, a subgroup analysis was performed on the test set in order to acquire a deeper understanding of the relationship between each patient node and its neighbours, revealing possible underlying causes of the inaccurate predictions.

Results: The GCN model was the top-performing model, with an AUC of 0.944, considerably outperforming all other models (p < 0.05), including the oversampled CPH model (0.708), linear regression (0.877), Linear Discriminant Analysis (0.860), K-nearest neighbours (0.834), Gaussian predictor (0.745) and support vector machine (0.847). With Kaplan-Meier estimates, the GCN model demonstrated good discriminability between low- and high-risk individuals (p < 0.0001). Based on subanalysis using the weighted-in score, although the GCN model was able to discriminate well between different predicted groups, the separation was inadequate between false negative (FN) and true negative (TN) groups.

Conclusion: The GCN model considerably outperformed all other machine learning methods and baseline CPH models. Thus, when applied to this imbalanced COVID survival dataset, adopting a population graph representation may be an approach to achieving good prediction.
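
The abstract does not describe how the population graph is built or how the network is structured, so the following is only a minimal sketch of what one graph convolution over a patient graph might look like. It assumes (hypothetically) that patients are linked to their nearest neighbours in feature space and that features are propagated with the symmetric adjacency normalisation commonly used in GCNs. The toy feature values, the choice of k, the weight matrix and all function names are illustrative assumptions, not the authors' implementation, and the sketch does not cover training or the handling of class imbalance.

```python
import math

def knn_adjacency(features, k):
    """Symmetric k-nearest-neighbour adjacency (Euclidean) with self-loops."""
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (math.dist(features[i], features[j]), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            adj[i][j] = adj[j][i] = 1.0  # edge if either patient selects the other
        adj[i][i] = 1.0  # self-loop so each node keeps its own features
    return adj

def normalise(adj):
    """Return D^{-1/2} A D^{-1/2}, where D is the degree matrix of A."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [
        [adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
        for i in range(n)
    ]

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(a_hat, h, w):
    """One graph convolution: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]

# Hypothetical toy cohort: [scaled age, sex, scaled lab value] per patient.
patients = [
    [0.20, 0.0, 0.10],
    [0.25, 1.0, 0.15],
    [0.80, 1.0, 0.90],
    [0.85, 0.0, 0.95],
]
# Hypothetical, untrained 3x2 weight matrix (a real model would learn this).
weights = [
    [0.5, -0.2],
    [0.1, 0.3],
    [0.7, -0.4],
]

a_hat = normalise(knn_adjacency(patients, k=1))
hidden = gcn_layer(a_hat, patients, weights)
for row in hidden:
    print([round(v, 3) for v in row])
```

Each output row is a hidden representation of one patient that mixes that patient's own features with those of its graph neighbours, which is the sense in which a population graph lets the model use relationships between patients rather than treating each record in isolation.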

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1186/s12879-024-09699-x.pdf
