Affiliation
1. Department of Computer Science and Engineering, Gyeongsang National University, Jinju 52828, Republic of Korea
2. Department of Applied Artificial Intelligence, Hanyang University, Ansan 15588, Republic of Korea
3. Division of Infectious Disease, Department of Internal Medicine, Korea University College of Medicine, Korea University Ansan Hospital, Ansan 15355, Republic of Korea
Abstract
During outbreaks of infectious diseases such as COVID-19, it is critical to rapidly determine treatment priorities and identify patients who require hospitalization based on clinical severity. Although various machine learning models have been developed to predict COVID-19 severity, most have limitations, such as small dataset sizes, limited availability of clinical variables, or a restricted classification of severity levels by a single classifier. In this paper, we propose an adaptive stacking ensemble technique that predicts COVID-19 patient severity under three classification schemes: Type 1 (low or high severity), Type 2 (mild, severe, critical), and Type 3 (asymptomatic, mild, moderate, severe, fatal). To enhance the model’s generalizability, we used a nationwide dataset from the South Korean government, comprising data from 5644 patients at more than 100 hospitals. To address the limited availability of clinical variables, our technique employs data-driven strategies and a proposed feature selection method, so that the selected clinical variables are available across diverse hospital environments. To construct optimal stacking ensemble models, our technique adaptively selects candidate base classifiers by analyzing both the correlation between their predicted outcomes and their performance. It then automatically determines the optimal multi-layer combination of base and meta-classifiers using a greedy search algorithm. To further improve performance, we applied additional techniques, including imputation of missing values and oversampling. The experimental results demonstrate that our stacking ensemble models significantly outperform existing single classifiers and AutoML approaches, with improvements of 6.42% and 8.86% in F1 and AUC scores for Type 1, 9.59% and 6.68% for Type 2, and 11.94% and 9.24% for Type 3, respectively. Consequently, our approach improves the prediction of COVID-19 severity levels and can potentially assist frontline healthcare providers in making informed decisions.
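The candidate selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the exact selection criterion, so the correlation threshold `max_corr`, the function names `pearson` and `select_base_classifiers`, and the classifier names, predictions, and scores below are all hypothetical. The sketch ranks classifiers by a validation score and keeps one only if its predictions are not too strongly correlated with those of the classifiers already kept.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant predictions give no correlation signal
    return cov / sqrt(var_a * var_b)


def select_base_classifiers(predictions, scores, max_corr=0.9):
    """Keep high-scoring classifiers whose predictions are not too
    correlated with those of classifiers already kept.

    max_corr is an assumed threshold, not a value from the paper.
    """
    # Visit classifiers from best to worst validation score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    selected = []
    for name in ranked:
        redundant = any(
            abs(pearson(predictions[name], predictions[kept])) > max_corr
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


# Hypothetical validation-set predictions (1 = high severity) and F1 scores.
predictions = {
    "rf":  [1, 0, 1, 1, 0, 0, 1, 0],
    "xgb": [1, 0, 1, 1, 0, 0, 1, 0],  # identical to "rf", so redundant
    "svm": [1, 1, 0, 1, 0, 0, 1, 0],
    "knn": [0, 0, 1, 1, 0, 1, 1, 0],
}
scores = {"rf": 0.86, "xgb": 0.84, "svm": 0.80, "knn": 0.75}

selected = select_base_classifiers(predictions, scores)
print(selected)
```

In this toy input, `xgb` is dropped because its predictions duplicate those of the higher-scoring `rf`, while `svm` and `knn` are kept because they disagree with `rf` often enough to add diversity. The surviving candidates would then be passed to the greedy search over base and meta-classifier combinations, which this sketch does not cover.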
Funder
National Research Foundation of Korea