Abstract
Background: Early identification and timely preventive intervention are essential to improving the prognosis of newborns with necrotizing enterocolitis (NEC). A novel, simple prediction model for NEC would therefore be of considerable clinical value.
Methods: Clinical data of neonates with NEC treated at Zhujiang Hospital of Southern Medical University from October 2010 to October 2022 were collected, and 429 non-NEC patients from the same period were selected by random sampling as the control group. All participants were then randomly divided into a training group (70%) and a testing group (30%). Using relevant clinical features and laboratory results, five machine learning (ML) models and a classical logistic regression model were developed. Ten-fold cross-validation was used to find the best hyperparameters for each model. Model performance was compared using the area under the receiver operating characteristic (ROC) curve (AUC), accuracy, sensitivity, and specificity. Decision curve analysis (DCA) was used to evaluate the clinical utility of the models, a nomogram was constructed, and feature importance was ranked with SHapley Additive exPlanations (SHAP). The calibration of the nomogram was assessed with calibration curves. In addition, the model was validated temporally and externally at another medical center.
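The evaluation metrics named in the Methods can be made concrete with a small sketch. The following minimal example, written in plain Python with no third-party libraries, computes the AUC (as the Mann-Whitney probability that a positive case outscores a negative one), accuracy, sensitivity, specificity, F1-score, Brier score, and the decision-curve net benefit from predicted probabilities. The toy labels and scores, the 0.5 classification threshold, and all function names are hypothetical illustrations; they are not the study's data or code.

```python
from typing import Dict, List


def auc_mann_whitney(y_true: List[int], y_score: List[float]) -> float:
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as 1/2)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(y_true: List[int], y_score: List[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and F1 after dichotomising the
    predicted probabilities at `threshold` (0.5 is an assumed cutoff)."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def brier_score(y_true: List[int], y_score: List[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((s - y) ** 2 for y, s in zip(y_true, y_score)) / len(y_true)


def net_benefit(y_true: List[int], y_score: List[float], pt: float) -> float:
    """Decision-curve net benefit at threshold probability pt:
    TP/n - FP/n * pt / (1 - pt)."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= pt and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= pt and y == 0)
    return tp / n - (fp / n) * pt / (1 - pt)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = NEC, 0 = non-NEC; scores are predicted risks.
    y = [1, 1, 1, 0, 0, 0, 0, 0]
    p = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
    print(round(auc_mann_whitney(y, p), 4))  # 0.9333
    print(threshold_metrics(y, p))
    print(round(brier_score(y, p), 4))  # 0.1141
    print(net_benefit(y, p, pt=0.2))
```

In a study like this one, such quantities would be computed on held-out predictions (for example, the 30% testing group), and a decision curve would sweep pt over a range of threshold probabilities rather than a single value.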
Results: Six features were ultimately retained for modeling: day of age (Day; OR=1.15; 95% CI: 1.07-1.23; P=0.001), gestational age (OR=0.77; 95% CI: 0.62-0.95; P=0.016), eosinophils (EOS; OR=3.76; 95% CI: 1.76-8.02; P=0.001), hemoglobin (HB; OR=0.98; 95% CI: 0.97-1.00; P=0.011), platelet distribution width (PDW; OR=1.21; 95% CI: 1.08-1.35; P=0.001), and high-sensitivity C-reactive protein (HSCRP; OR=1.03; 95% CI: 1.01-1.06; P=0.007). The logistic regression model achieved an AUC of 0.919, accuracy of 0.897, sensitivity of 0.832, F1-score of 0.778, and Brier score of 0.0878 in the training group, whereas the AUCs of the five machine learning models ranged from 0.774 to 0.972. Among these models, the LightGBM model performed best, with an AUC of 0.960, accuracy of 0.894, sensitivity of 0.901, F1-score of 0.813, and Brier score of 0.072.
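The odds ratios above can be read through the standard logistic model. Assuming (the abstract does not say so explicitly) that each OR comes from a multivariable logistic regression and refers to a one-unit increase in its predictor, with the other predictors held fixed, each OR corresponds to a coefficient, and ORs compound multiplicatively over larger differences. Day is assumed here to be measured in days. The intercept is not reported, so absolute risks cannot be reconstructed from the abstract alone. The Brier score is the mean squared difference between predicted probability and observed outcome.

```latex
\operatorname{logit}(p) = \ln\frac{p}{1-p} = \beta_0 + \sum_{i=1}^{6} \beta_i x_i,
\qquad \beta_i = \ln(\mathrm{OR}_i)

\frac{\operatorname{odds}(x_i + \Delta)}{\operatorname{odds}(x_i)} = \mathrm{OR}_i^{\,\Delta},
\qquad \text{e.g. } \mathrm{OR}_{\mathrm{Day}}^{7} = 1.15^{7} \approx 2.66

\mathrm{BS} = \frac{1}{N}\sum_{j=1}^{N} \left(\hat{p}_j - y_j\right)^2,
\qquad y_j \in \{0, 1\}
```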
Conclusion: Based on day of age, gestational age, EOS, HB, PDW, and HSCRP, the LightGBM machine learning model can effectively identify neonates at higher risk of NEC and may help support clinical decision-making.