BACKGROUND
Poststroke immobility makes patients more vulnerable to stroke-related complications. Urinary tract infection (UTI) is one of the major nosocomial infections and significantly affects the outcomes of immobile stroke patients. Previous studies have identified several risk factors, but accurately estimating an individual patient's UTI risk remains challenging because the interactions among these factors are unclear and individual characteristics vary widely. More precise and trustworthy predictive models are therefore needed to help identify patients at risk of UTI.
OBJECTIVE
The aim of this study was to develop predictive models to identify UTI risk in immobile stroke patients. A prospective analysis was conducted to evaluate the effectiveness and clinical interpretability of the models.
METHODS
The data used in this study were collected from the Common Complications of Bedridden Patients and the Construction of Standardized Nursing Intervention Model (CCBPC). The derivation cohort included data from 3982 immobile stroke patients collected during CCBPC-I (November 1, 2015, to June 30, 2016); the external validation cohort included data from 3837 immobile stroke patients collected during CCBPC-II (November 1, 2016, to July 30, 2017). Six machine learning models and an ensemble learning model were derived from 80% of the derivation cohort, and their effectiveness was evaluated on the remaining 20% (internal validation). We further compared the effectiveness of the predictive models in the external validation cohort. The performance of logistic regression without regularization served as a reference. We used Shapley additive explanation (SHAP) values to determine feature importance and to examine the clinical significance of the prediction models. The Shapley values of the factors were calculated to represent the magnitude, prevalence, and direction of their effects and were visualized in a summary plot.
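The Methods paragraph above involves three computational steps: an 80/20 split of the derivation cohort, an ensemble that combines several models, and Shapley values for feature attribution. A minimal standard-library Python sketch of each follows, under assumptions the abstract does not state: the split is stratified on the UTI label, the ensemble averages its members' predicted probabilities (soft voting), and Shapley values are computed exactly against a single reference patient rather than with the SHAP library's estimators. The model `toy_logistic`, its weights, and the patient values are hypothetical illustrations, not the study's fitted model.

```python
import math
import random
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]  # maps a feature vector to P(UTI)


def stratified_split(labels: Sequence[int], test_frac: float = 0.2,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split row indices into (train, test), sampling each class separately.

    Stratifying on the UTI label is an assumption of this sketch: it keeps the
    rare positive class at a similar rate in both parts.
    """
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def soft_vote(models: Sequence[Model], x: Sequence[float]) -> float:
    """Hypothetical ensemble rule: average the members' predicted probabilities."""
    return sum(m(x) for m in models) / len(models)


def exact_shapley(f: Model, x: Sequence[float],
                  baseline: Sequence[float]) -> List[float]:
    """Exact Shapley value of each feature of x for model f.

    A coalition S is scored as f evaluated with the features in S taken from x
    and every other feature taken from a single baseline vector. Each feature
    requires 2**(d-1) coalitions, so this only suits a small number of features.
    """
    d = len(x)

    def value(coalition: frozenset) -> float:
        return f([x[j] if j in coalition else baseline[j] for j in range(d)])

    phi: List[float] = []
    for i in range(d):
        others = [j for j in range(d) if j != i]
        total = 0.0
        for k in range(d):  # size of the coalition that feature i joins
            weight = math.factorial(k) * math.factorial(d - k - 1) / math.factorial(d)
            for s in combinations(others, k):
                coalition = frozenset(s)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


def toy_logistic(z: Sequence[float]) -> float:
    """Hypothetical model on (age, length of stay, catheter days); NOT fitted to CCBPC data."""
    w, intercept = (0.03, 0.05, 0.10), -6.0
    return 1.0 / (1.0 + math.exp(-(intercept + sum(wi * zi for wi, zi in zip(w, z)))))


if __name__ == "__main__":
    labels = [1] * 103 + [0] * (3982 - 103)  # derivation-cohort counts from the abstract
    train_idx, test_idx = stratified_split(labels)
    print(len(train_idx), len(test_idx), sum(labels[i] for i in test_idx))

    patient, reference = [80.0, 30.0, 14.0], [65.0, 15.0, 0.0]  # hypothetical values
    for name, v in zip(("age", "length of stay", "catheter days"),
                       exact_shapley(toy_logistic, patient, reference)):
        print(f"{name}: {v:+.4f}")
```

With the derivation cohort's counts, this split places 797 patients (21 with UTI) in the test set. For a linear model, the exact Shapley value of feature i reduces to w_i(x_i - b_i), and for any model the values sum to f(x) - f(baseline). Because exhaustive enumeration grows exponentially with the number of features, libraries such as shap use model-specific or sampling-based estimators for larger models.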
RESULTS
A total of 103 (2.59%) patients in the derivation cohort (N=3982) were diagnosed with UTI; the internal validation cohort (N=797) had the same incidence. In the external validation cohort, 53 patients (1.38%) were diagnosed with UTI. The ensemble learning model achieved the highest area under the receiver operating characteristic (ROC) curve (AUC) in internal validation (82.2%) and the second highest in external validation (80.8%). It also achieved the highest sensitivity in both the internal and external validation sets (80.9% and 81.1%, respectively). We also identified seven UTI risk factors (pneumonia, glucocorticoid use, female sex, mixed cerebrovascular disease, increased age, prolonged length of stay, and duration of catheterization) that contributed most to the predictive model, demonstrating the model's clinical interpretability.
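The incidence, AUC, and sensitivity figures reported above can be computed as in the following minimal standard-library sketch. It assumes binary labels (1 = UTI) and one predicted probability per patient; the AUC uses the pairwise Mann-Whitney formulation, and the `threshold` argument of `sensitivity` is a hypothetical parameter, since the abstract does not report the cut-off used.

```python
from typing import Sequence


def incidence(cases: int, n: int) -> float:
    """Proportion of patients diagnosed with UTI."""
    return cases / n


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney formulation.

    Equals the probability that a randomly chosen UTI case receives a higher
    score than a randomly chosen non-case; tied scores count as half.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:  # O(P * N) comparisons; adequate for a few thousand patients
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity(y_true: Sequence[int], scores: Sequence[float],
                threshold: float) -> float:
    """Share of true UTI cases flagged as high risk (score >= threshold)."""
    flagged = [s >= threshold for s, y in zip(scores, y_true) if y == 1]
    if not flagged:
        raise ValueError("sensitivity needs at least one positive case")
    return sum(flagged) / len(flagged)


if __name__ == "__main__":
    print(f"derivation incidence: {incidence(103, 3982):.2%}")
    print(f"external incidence:   {incidence(53, 3837):.2%}")
```

Running the script prints 2.59% and 1.38%, matching the incidences reported for the derivation and external validation cohorts.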
CONCLUSIONS
Our ensemble learning model demonstrated promising performance. Identifying UTI risk and high-risk factors among immobile stroke patients would allow more selective and effective use of preventive interventions, thereby improving clinical outcomes. Future work should focus on developing a more concise scoring tool and prospectively evaluating the model in clinical practice.