Affiliation:
1. Vanderbilt University
2. Vanderbilt University Medical Center
Abstract
Objective
To assess the accuracy of machine learning models in predicting kidney stone recurrence using variables extracted from the electronic health record (EHR).
Methods
We trained three machine learning (ML) models (least absolute shrinkage and selection operator [LASSO] regression, random forest [RF], and gradient boosted decision trees [XGBoost]) to predict 2-year and 5-year symptomatic kidney stone recurrence from EHR-derived features and 24-hour urine data (n = 1231). The ML models were compared with logistic regression (LR). Symptomatic stone events were identified by manual retrospective review and were defined as pain, acute kidney injury, or recurrent infections attributed to a kidney stone identified in the clinic or the emergency department, or any stone requiring surgical treatment. We evaluated performance using the area under the receiver operating characteristic curve (AUC-ROC) and identified important features for each model.
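As an illustration of the evaluation metric only, the AUC-ROC can be read as the probability that a randomly chosen patient with recurrence receives a higher predicted risk than a randomly chosen patient without recurrence. The following is a minimal sketch under that definition, not the study's code; the function name `roc_auc` and the toy labels and scores are hypothetical and are not data from this cohort.

```python
def roc_auc(labels, scores):
    """AUC-ROC via pairwise comparison of positives and negatives.

    labels: 1 = recurrence, 0 = no recurrence (hypothetical encoding).
    scores: predicted risk from any model; higher means more likely to recur.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values, for illustration only).
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.7]
auc = roc_auc(labels, scores)
print(f"AUC-ROC: {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported model AUCs are interpreted.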
Results
The 2- and 5-year symptomatic stone recurrence rates were 25% and 31%, respectively. The LASSO model performed best for symptomatic stone recurrence prediction (2-year AUC: 0.62; 5-year AUC: 0.63). The other models demonstrated modest performance at 2 and 5 years: LR (0.585, 0.618), RF (0.570, 0.608), and XGBoost (0.580, 0.621). Patient age was the only feature among the top five features of every model. Additionally, the LASSO model prioritized body mass index (BMI) and history of gout for prediction.
Conclusions
Across our cohorts, ML models performed comparably to LR, with the LASSO model achieving the highest AUC of all models. Further model testing should evaluate the utility of 24-hour urine features in model structure.
Publisher
Research Square Platform LLC