BACKGROUND
Rehospitalizations are a major cost driver, both for patients with multiple chronic conditions and for healthcare in general. Hospital readmission prediction models based on healthcare data for patients with multiple chronic conditions are very limited.
OBJECTIVE
The aim of this study is to improve and validate a hospital readmission prediction model using electronic health records data for a complex medical condition involving multiple chronic conditions.
METHODS
We employed a retrospective study design using electronic health records data on patients with multiple chronic conditions obtained from a person-centered multidisciplinary clinic at a tertiary academic medical hospital. The study trained multiple machine learning models, namely Multivariable Logistic Regression, Gaussian Naïve Bayes, Support Vector Machine, Multilayer Perceptron (MLP) Neural Network, K-Nearest Neighbour, Ensemble Gradient Boosting Classifier, and Random Forest, and compared their predictive performance. The outcomes of interest for this study were all-cause 7-day and 30-day hospital readmissions.
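As an illustration of the kind of model comparison described above, the following is a minimal sketch assuming scikit-learn and a synthetic dataset in place of the electronic health records data. The feature matrix, class balance, hyperparameters, and train/test split are all hypothetical and are not taken from the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for EHR-derived features (assumption, NOT the study data).
# y = 1 plays the role of "readmitted within the outcome window".
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# One estimator per model family named in the text; default settings are
# illustrative. Scale-sensitive models are wrapped with a StandardScaler.
models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "gaussian_nb": GaussianNB(),
    "svm": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    "mlp": make_pipeline(StandardScaler(), MLPClassifier(max_iter=500, random_state=0)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]  # predicted probability of class 1
    results[name] = {
        "auroc": roc_auc_score(y_test, scores),
        "auprc": average_precision_score(y_test, scores),
    }

# Print models ordered by AUROC, highest first.
for name, m in sorted(results.items(), key=lambda kv: -kv[1]["auroc"]):
    print(f"{name:20s} AUROC={m['auroc']:.2f} AUPRC={m['auprc']:.2f}")
```

In practice the same loop would be run once per outcome (7-day and 30-day readmission), ideally with cross-validation rather than a single split.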
RESULTS
Accuracy improved markedly once the models were trained on HND subgroups, specifically the heart failure subgroup, which had the highest number of patients and readmissions of all subgroups. The best-performing models were the Ensemble Gradient Boosting Classifier and the logistic regression model. The 30-day readmission prediction performance of the Ensemble Gradient Boosting Classifier and logistic regression improved from an AUROC of 0.65 (AUPRC: 0.34) to an AUROC of 0.79 (AUPRC: 0.63) and 0.81 (AUPRC: 0.64) for the entire group and the heart failure subgroup, respectively.
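The two metrics reported above measure different things: AUROC is the probability that a randomly chosen readmitted patient is ranked above a randomly chosen non-readmitted one, while AUPRC summarizes precision across recall levels and is sensitive to how rare readmissions are. A minimal sketch in plain Python, using hypothetical labels and scores (not study data) and average precision as a step-wise approximation of AUPRC:

```python
def auroc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of the precision observed at the rank of each positive case.

    Assumes no tied scores; a common step-wise estimate of AUPRC.
    """
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical example: 10 discharges, 3 readmissions (label 1).
y_true = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.35, 0.3, 0.2, 0.1]

prevalence = sum(y_true) / len(y_true)  # AUPRC expected from random ranking
print(f"AUROC={auroc(y_true, scores):.3f}")
print(f"AUPRC~{average_precision(y_true, scores):.3f} (chance level ~ {prevalence:.2f})")
```

Because the chance level of AUPRC equals the readmission rate, AUPRC values are best compared against the prevalence in the same group rather than against 0.5.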
CONCLUSIONS
Predictive technologies using machine learning have the potential to identify hospital readmissions among high-risk subgroups of patients with multiple chronic conditions. The findings demonstrated that predictive modeling in complex patients with multiple chronic conditions performs better at the level of patient subgroups with similar characteristics than in the overall patient population. Key variables for 7-day and 30-day hospital readmissions were previous outpatient and inpatient visit counts, previous hospitalizations, the acute or non-acute nature of the previous readmission, and previous laboratory encounter counts. The Ensemble Gradient Boosting Classifier, Random Forest, MLP Neural Network, and logistic regression models performed better in predicting hospital readmissions. Thus, before predictive models are developed, target patients should be divided into self-similar clusters with machine learning models. Predictive technologies are effective in reducing healthcare costs by predicting hospital readmissions in distinctly similar subgroups of complex chronic patients.
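The recommendation to cluster patients before modeling could be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, k-means with two clusters is a stand-in chosen here for the clustering step (the text does not name a clustering method, and the subgroups it reports, such as heart failure, are clinical), and logistic regression is used as the per-cluster model.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for patient features and readmission labels (assumption).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    weights=[0.7, 0.3], random_state=0,
)
# Scaling on the full dataset is a simplification for this sketch.
X = StandardScaler().fit_transform(X)

# Step 1: divide patients into clusters (hypothetical choice: k-means, k=2).
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Step 2: fit and evaluate one model per cluster.
cluster_auroc = {}
for c in np.unique(clusters):
    Xc, yc = X[clusters == c], y[clusters == c]
    # Skip clusters with too few cases of either class to split and score.
    if np.bincount(yc, minlength=2).min() < 10:
        continue
    X_tr, X_te, y_tr, y_te = train_test_split(
        Xc, yc, test_size=0.3, stratify=yc, random_state=0
    )
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    cluster_auroc[int(c)] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

for c, score in cluster_auroc.items():
    print(f"cluster {c}: n={int((clusters == c).sum())}, AUROC={score:.2f}")
```

Whether per-cluster models outperform a single pooled model depends on the data; the per-cluster scores should be compared against a pooled baseline evaluated on the same held-out patients.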