Under-five mortality is a major public health issue that directly affects a country’s population health, social development, and economic status. Early identification of its determinants is therefore essential for designing effective preventive measures. This study examines how machine-learning techniques can help predict the important determinants of under-five mortality in India.
This study used data from the National Family Health Survey-V of India. We performed tenfold cross-validation to assess each model’s performance on the dataset. Decision tree, random forest, logistic regression, neural network, ridge regression, k-nearest neighbor, and naive Bayes models were fitted to the under-five mortality data, and the confusion matrix, accuracy, recall, precision, F1-score, Cohen’s kappa, and the area under the receiver operating characteristic curve (AUROC) were used to assess their predictive power. Chi-square scores, recursive feature elimination, an extra-trees classifier, random forest importance, a sequential feature selector, and traditional logistic regression were used to identify the important features (factors) of under-five mortality. All computations were carried out in SPSS 27 and Jupyter Notebook (with Python 3.3).
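As a rough illustration of the evaluation step described above, the following minimal sketch shows how row indices can be split into ten folds and how accuracy, precision, recall, F1-score, and Cohen’s kappa follow from the four counts of a binary confusion matrix. It uses only the Python standard library and is not the study’s actual code; the function names `tenfold_indices` and `binary_metrics` are hypothetical, and the counts passed in are made-up numbers, not results from the survey data.

```python
import random


def tenfold_indices(n_samples, k=10, seed=0):
    """Shuffle row indices and split them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Every k-th index goes to the same fold, so fold sizes differ by at most 1.
    return [idx[i::k] for i in range(k)]


def binary_metrics(tp, fp, fn, tn):
    """Derive summary metrics from the four confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((fn + tn) / n) * ((fp + tn) / n)
    p_chance = p_yes + p_no
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
    }


# Illustrative values only (assumed, not taken from the study).
folds = tenfold_indices(95)
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
print(len(folds), metrics)
```

In each of the ten rounds, one fold would serve as the test set and the remaining nine as the training set, with the metrics averaged across rounds. In practice a library such as scikit-learn provides equivalent, more complete tooling, including AUROC, which this sketch omits.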
The results show that the random forest model was the best predictive model for under-five mortality among the ML models compared. Its precision was estimated at 98.88% with all factors and 96.25% with the selected important variables. The neural network followed, with an accuracy of 96.52% with all factors and 94.83% with the important variables. Traditional logistic regression achieved accuracies of 93.99% and 93.51%, respectively. After applying the feature selection methods, the important factors of under-five mortality were the number of living children, breastfeeding status, birth in the last five years, children ever born, time, antenatal care, region, size of children, number of household members, and birth order.
This is the first study in India to use machine-learning approaches to identify the best-performing ML predictive model and determine the causative factors for under-five mortality. The random forest model predicted under-five mortality with the highest accuracy and identified its most important factors. This machine-learning approach can serve as a reference for students, non-computing professionals, healthcare professionals, and decision-makers in various real-world situations and application areas, particularly from a technical point of view.