Abstract
Background
Pneumonia is the leading cause of death in children aged 1-59 months. Prediction models for child pneumonia mortality have been developed using regression methods, but their performance is insufficient for clinical use.

Methods
We used a variety of machine learning methods to develop a predictive model for mortality in children with clinical pneumonia enrolled in population-based surveillance in the Basse Health and Demographic Surveillance System in rural Gambia (n=11,012). Four machine learning algorithms (support vector machine, random forest, artificial neural network, and regularized logistic regression) were implemented, fitting all possible combinations of two or more of 16 selected features. Models were shortlisted based on their training set performance, the number of included features, and the reliability of feature measurement. The final model was selected based on its clinical interpretability.

Results
When we applied the final model to the test set (55 deaths), the area under the receiver operating characteristic curve was 0.88 (95% confidence interval: 0.84, 0.91), sensitivity was 0.78, and specificity was 0.77.

Conclusions
Our evaluation of multiple machine learning methods combined with minimal and pragmatic feature selection led to a predictive model with very good performance. We plan further validation of our model in different populations.
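To give a sense of the size of the search described in the Methods, the sketch below counts the feature subsets implied by "all possible combinations of two or more of 16 selected features". It assumes, as a simplification not stated in the abstract, that each of the four algorithms is fitted once per subset; any hyperparameter tuning or cross-validation would multiply the number of fits further. The constant names are illustrative only.

```python
from itertools import combinations
from math import comb

N_FEATURES = 16   # candidate features, as stated in the abstract
N_ALGORITHMS = 4  # SVM, random forest, ANN, regularized logistic regression

# Closed form: all 2^16 subsets, minus the empty set and the 16 single-feature subsets.
n_subsets = 2**N_FEATURES - 1 - N_FEATURES

# Cross-check by explicitly enumerating every subset of size 2..16.
n_enumerated = sum(
    1
    for k in range(2, N_FEATURES + 1)
    for _ in combinations(range(N_FEATURES), k)
)
assert n_subsets == n_enumerated == sum(comb(N_FEATURES, k) for k in range(2, N_FEATURES + 1))

# Assumption: one fit per (algorithm, subset) pair.
n_fits = n_subsets * N_ALGORITHMS

print(n_subsets)  # 65519
print(n_fits)     # 262076
```

Under these assumptions the exhaustive search covers 65,519 feature subsets per algorithm, which helps explain why the shortlisting criteria (training set performance, number of features, and measurement reliability) are needed before a final model is chosen.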
Publisher
Cold Spring Harbor Laboratory