Comparing Classifier Performance to Predict Infectious Diseases

Author:

Gonzalez Roger Geertz

Abstract

We compared the accuracy of four machine learning classifier algorithms (Random Forest, Naïve Bayes, Decision Tree, and Artificial Neural Network) in predicting zoonoses, using features extracted with Random Forest and serology data for seven zoonotic diseases as the targets. Random Forest and Naïve Bayes had the best overall performance.

The Random Forest models performed well on the Positive Predictive Value (PPV), Area Under the Curve (AUC), and Receiver Operating Characteristic (ROC) measures when identifying the positive cases for each disease. Identifying positives reliably is imperative, because this information can be used to direct prevention and medical aid to the areas and people where it is most needed. The models also predicted the negative values well, which is important for ensuring that predicted negatives are not false negatives.

Naïve Bayes was found to be the best choice for accuracy and performance. Naïve Bayes works well because it treats each feature as independent, so a change in one feature does not affect the others in the model. Decision Tree could not capture the structure of the data and therefore underfit both in the initial model and after hyperparameter tuning. Artificial Neural Network overfit in the initial model by capturing all of the data, including noise, but underfit after hyperparameter tuning. Neither Decision Tree nor Artificial Neural Network is recommended as a classifier for this dataset.

Statements

There are no conflicts of interest in this work. All methods were carried out in accordance with relevant guidelines and regulations. All experimental protocols were approved by the Forestry Administration of Cambodia. Informed consent was obtained from all subjects and/or their legal guardian(s) at the beginning of the survey.
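
The performance measures named in the abstract can be illustrated with a minimal sketch. The function names `predictive_values` and `roc_auc`, the 0.5 decision threshold, and the toy labels and scores below are assumptions made for illustration only; they are not the study's code or data. PPV is the share of predicted positives that are truly positive, NPV is the share of predicted negatives that are truly negative, and AUC is computed here as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
def predictive_values(y_true, y_pred):
    """Return (PPV, NPV) for binary labels, where 1 means seropositive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (invented values, not from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

ppv, npv = predictive_values(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"PPV={ppv:.3f} NPV={npv:.3f} AUC={auc:.3f}")
```

A high NPV corresponds to the abstract's point about negatives: the fewer false negatives among the predicted negatives, the closer NPV is to 1.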

Publisher

Cold Spring Harbor Laboratory
