Affiliation:
1. Department of Computer Science, College of Computer and Information Sciences, King Saud University, P.O. Box 266, Riyadh 11362, Saudi Arabia
2. Healthcare Information Technology Affairs (HITA), King Faisal Specialist Hospital & Research Center, P.O. Box 3354, Riyadh 11211, Saudi Arabia
3. Heart Center, King Faisal Specialist Hospital & Research Center, P.O. Box 3354, Riyadh 11211, Saudi Arabia
Abstract
Efficient management of hospital resources is essential for providing high-quality healthcare while ensuring sustainability. Length of stay (LOS), measuring the duration from admission to discharge, directly impacts patient outcomes and resource utilization. Accurate LOS prediction offers numerous benefits, including reducing re-admissions, ensuring appropriate staffing, and facilitating informed discharge planning. While conventional methods rely on statistical models and clinical expertise, recent advances in machine learning (ML) present promising avenues for enhancing LOS prediction. This research focuses on developing an ML-based LOS prediction model trained on a comprehensive real-world dataset and discussing the important factors towards practical deployment of trained ML models in clinical settings. This research involves the development of a comprehensive adult cardiac patient dataset (SaudiCardioStay (SCS)) from the King Faisal Specialist Hospital & Research Centre (KFSH&RC) hospital in Saudi Arabia, comprising 4930 patient encounters for 3611 unique patients collected from 2019 to 2022 (excluding 2020). A diverse range of classical ML models (i.e., Random Forest (RF), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LGBM), artificial neural networks (ANNs), Average Voting Regression (AvgVotReg)) are implemented for the SCS dataset to explore the potential of existing ML models in LOS prediction. In addition, this study introduces a novel approach for LOS prediction by incorporating a dedicated LOS classifier within a sophisticated ensemble methodology (i.e., Two-Level Sequential Cascade Generalization (2LSCG), Three-Level Sequential Cascade Generalization (3LSCG), Parallel Cascade Generalization (PCG)), aiming to enhance prediction accuracy and capture nuanced patterns in healthcare data. The experimental results indicate the best mean absolute error (MAE) of 0.1700 for the 3LSCG model. Relatively comparable performance was observed for the AvgVotReg model, with a MAE of 0.1703. In the end, a detailed analysis of the practical implications, limitations, and recommendations concerning the deployment of ML approaches in actual clinical settings is presented.