Abstract
We compared the accuracy of four machine learning classifier algorithms (Random Forest, Naïve Bayes, Decision Tree, and Artificial Neural Network) in predicting zoonoses, using features extracted by Random Forest as inputs and serology data for seven zoonotic diseases as the targets. We identified Random Forest and Naïve Bayes as having the best performance overall. The Random Forest models did well on the Positive Predictive Value (PPV), Area Under the Curve (AUC), and Receiver Operating Characteristic (ROC) performance measures in identifying the positive cases for each disease. This is imperative for identifying the disease and then using that information to direct prevention and medical aid to the areas and people where they are most needed. The Random Forest models also did well in predicting negative cases, which is important to ensure that predicted negatives are not false negatives. Naïve Bayes was found to be the best choice for accuracy and performance. It works well because it treats each feature as independent, so a change in one feature does not affect the others in the model. Decision Tree could not capture the structure of the data and thus underfit, both in the initial modeling and after hyperparameter tuning. Artificial Neural Network overfit in the initial model by capturing all the data, including noise, but underfit after hyperparameter tuning. Neither Decision Tree nor Artificial Neural Network is recommended as a classifier for this dataset.

Statements
There are no conflicts of interest in this work.
All methods were carried out in accordance with relevant guidelines and regulations.
All experimental protocols were approved by the Forestry Administration of Cambodia.
Informed consent was obtained from all subjects and/or their legal guardian(s) at the beginning of the survey.
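As an illustration of the performance measures named in the abstract, the following minimal Python sketch computes PPV, its negative-class counterpart (Negative Predictive Value, NPV), and ROC AUC for a single binary disease target. The labels, scores, 0.5 threshold, and function names are made up for illustration and are not taken from the study; AUC is computed here by the pairwise-ranking definition, which may differ from the tooling the authors used.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 1 (positive) and 0 (negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def ppv(tp, fp):
    """Positive Predictive Value: share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else float("nan")


def npv(tn, fn):
    """Negative Predictive Value: share of predicted negatives that are truly negative."""
    return tn / (tn + fn) if (tn + fn) else float("nan")


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical serology outcomes (1 = seropositive) and classifier scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print("PPV:", ppv(tp, fp))
print("NPV:", npv(tn, fn))
print("AUC:", roc_auc(y_true, scores))
```

A low NPV would signal that many predicted negatives are in fact false negatives, which is the failure mode the abstract singles out as important to avoid.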
Publisher
Cold Spring Harbor Laboratory