Abstract
Background
Serious bacterial infections (SBIs) are linked to unplanned hospital admissions and a high mortality rate. The early identification of SBIs is crucial in clinical practice.
Objective
This study aims to establish and validate clinically applicable models designed to identify SBIs in patients with infective fever.
Methods
Clinical data from 945 patients with infective fever, encompassing demographic and laboratory indicators, were retrospectively collected from a 2200-bed teaching hospital between January 2013 and December 2020. The data were randomly divided into training and test sets at a ratio of 7:3. Various machine learning (ML) algorithms, including Boruta, Lasso (least absolute shrinkage and selection operator), and recursive feature elimination, were utilized for feature filtering. The selected features were subsequently used to construct models predicting SBIs using logistic regression (LR), random forest (RF), and extreme gradient boosting (XGBoost) with 5-fold cross-validation. Performance metrics, including the receiver operating characteristic (ROC) curve and area under the ROC curve (AUC), accuracy, sensitivity, and other relevant parameters, were used to assess model performance. Considering both model performance and clinical needs, 2 clinical timing-sequence warning models were ultimately confirmed using LR analysis. The corresponding predictive nomograms were then plotted for clinical use. Moreover, a physician, blinded to the study, collected additional data from the same center involving 164 patients during 2021. The nomograms developed in the study were then applied in clinical practice to further validate their clinical utility.
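As a rough illustration of two steps named above, the 7:3 random split and the AUC metric, the following minimal sketch uses only the Python standard library. The function names, the fixed seed, and the toy inputs are assumptions made for illustration. They are not the study's code, and the study's actual software and splitting procedure are not stated here.

```python
import random


def split_70_30(records, seed=0):
    """Shuffle a copy of the records and split them 7:3 (training : test).

    The seed is a hypothetical choice for reproducibility.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 7 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Placeholder patient indices that match only the cohort size (945)
train_ids, test_ids = split_70_30(range(945))
print(len(train_ids), len(test_ids))

# Toy labels (1 = SBI) and predicted risks, invented for illustration
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The pairwise form of the AUC is quadratic in the number of cases, which is acceptable for a test set of a few hundred patients. A rank-based computation would be preferable for larger cohorts.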
Results
In total, 69.9% (661/945) of the patients developed SBIs. Age, hemoglobin, neutrophil-to-lymphocyte ratio, fibrinogen, and C-reactive protein levels were identified as important features by at least 2 ML algorithms. Considering the collection sequence of these indicators and clinical demands, 2 timing-sequence models predicting the SBI risk were constructed accordingly: the early admission model (model 1) and the model within 24 hours of admission (model 2). LR demonstrated better stability than RF and XGBoost in both models and performed the best in model 2, with an AUC, accuracy, and sensitivity of 0.780 (95% CI 0.720-0.841), 0.754 (95% CI 0.698-0.804), and 0.776 (95% CI 0.711-0.832), respectively. In model 1, XGBoost had an advantage over LR in AUC (0.708, 95% CI 0.641-0.775 vs 0.686, 95% CI 0.617-0.754), while RF achieved better accuracy (0.729, 95% CI 0.673-0.780) and sensitivity (0.790, 95% CI 0.728-0.844) than the other 2 approaches. Two SBI-risk prediction nomograms were developed for clinical use based on LR, and they exhibited good performance in clinical application, with an accuracy of 0.707 and 0.750 and a sensitivity of 0.729 and 0.927, respectively.
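For readers less familiar with the metrics reported above, the sketch below shows how accuracy and sensitivity follow from confusion-matrix counts and how a 95% CI for a proportion can be obtained. The counts are hypothetical and are not taken from the study. The Wilson score interval is an assumption chosen for illustration, since the method used to compute the reported intervals is not stated here.

```python
import math


def accuracy(tp, fp, tn, fn):
    """Share of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp, fn):
    """Share of true SBI cases that the model flagged as SBI."""
    return tp / (tp + fn)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives about 95%).

    Assumption: the study may have used a different interval method.
    """
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Hypothetical counts, invented for illustration only
tp, fn, tn, fp = 80, 20, 60, 40
print(accuracy(tp, fp, tn, fn))   # (80 + 60) / 200
print(sensitivity(tp, fn))        # 80 / 100
print(wilson_interval(tp, tp + fn))
```

Sensitivity is computed over true SBI cases only, so its interval depends on the number of SBI cases rather than on the full test-set size.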
Conclusions
The clinical timing-sequence warning models demonstrated efficacy in predicting SBIs in patients suspected of having infective fever and in clinical application, suggesting good potential in clinical decision-making. Nevertheless, additional prospective and multicenter studies are necessary to further confirm their clinical utility.