Abstract
Background
There are no early, accurate, scalable methods for identifying infants at high risk of poor cognitive outcomes in childhood. We aim to develop an explainable predictive model, using machine learning and population-based cohort data, for this purpose.
Methods
Data were from 8858 participants in the Growing Up in Ireland cohort, a nationally representative study of infants and their primary caregivers (PCGs). Maternal, infant, and socioeconomic characteristics were collected at 9 months, and cognitive ability was measured at age 5 years. Data preprocessing, synthetic minority oversampling, and feature selection were performed prior to training a variety of machine learning models, using ten-fold cross-validated grid search to tune hyperparameters. Final models were evaluated on an unseen test set.
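To illustrate the synthetic minority oversampling step, the following is a minimal sketch of the general SMOTE idea: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours. The function name `smote`, its parameters, and the toy data are assumptions made for illustration; this is not the study's implementation and does not use cohort data.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority samples.

    Hypothetical helper for illustration only: `minority` is a list of
    numeric feature vectors that all belong to the minority class.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample
        # and keep the k closest.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (invented values, not study data).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point is an interpolation between two existing minority samples, it always lies within the range spanned by the original minority data. A common precaution, assumed here rather than taken from the study, is to oversample only the training portion of each split so that synthetic samples do not leak into validation or test data.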
Results
A random forest (RF) model using 15 participant-reported features collected in the first year of infant life achieved an area under the receiver operating characteristic curve (AUROC) of 0.77 for predicting low cognitive ability at age 5. This model could detect 72% of infants with low cognitive ability (sensitivity), with a specificity of 66%.
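For readers less familiar with these metrics, the sketch below shows how sensitivity, specificity, and AUROC are computed from true labels and predicted scores. The labels, scores, the 0.5 threshold, and the function names are invented for illustration and are not drawn from the study; AUROC is computed here as the proportion of positive-negative pairs in which the positive case receives the higher score, with ties counted as one half.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true, scores):
    """AUROC as the probability that a positive case outscores a negative case."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: 1 = low cognitive ability, 0 = otherwise (invented values).
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
threshold = 0.5  # assumed cut-off; lowering it trades specificity for sensitivity
y_pred = [1 if s >= threshold else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = auroc(y_true, scores)
```

AUROC summarises ranking performance across all thresholds, whereas sensitivity and specificity describe a single operating point, which is why one model can be reported with all three.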
Conclusions
Model performance would need to be improved before consideration as a population-level screening tool. However, this is a first step towards early, individual, risk stratification to allow targeted childhood screening.
Impact
This study is among the first to investigate whether machine learning methods can be used at a population level to predict which infants are at high risk of low cognitive ability in childhood.
A random forest model using 15 features which could be easily collected in the perinatal period achieved an AUROC of 0.77 for predicting low cognitive ability.
Improved predictive performance would be required to implement this model at a population level, but this may be a first step towards early, individual risk stratification.
Publisher
Springer Science and Business Media LLC