COVID-19 From Symptoms to Prediction: A Statistical and Machine Learning Approach

Author:

Fakieh Bahjat1, Saleem Farrukh1

Affiliation:

1. King Abdulaziz University

Abstract

During the COVID-19 pandemic, analysis of patients’ data played a vital role in developing precautions, medications, and vaccination strategies. Data reported by hospitals and medical institutes is considered one of the most reliable sources for such investigations. Machine learning can transform this data into meaningful insight that helps decision-makers prepare future strategies. This study focused on developing models that predict the age group of COVID-19 patients from different attributes, using statistical and Machine Learning (ML) approaches. The study was conducted in two phases. First, statistical tests such as ANOVA and the t-test were applied to investigate relationships between variables. Second, multiple ML models were trained to predict patients’ age groups from symptom data: Decision Tree, Naïve Bayes (NB), K-Nearest Neighbors (KNN), Gradient Boosted Trees (GBT), and Random Forest. In addition, bagging, boosting, and stacking ensemble approaches were used to improve prediction performance. The statistical results suggested a significant association among five common symptoms in the datasets. The ML results indicated that ensemble approaches such as boosting, bagging, and stacking can significantly enhance prediction accuracy. Overall, GBT with bagging marginally outperformed the other models (accuracy 0.6628), but in some cases boosting and stacking proved to be strong techniques that can generate better ensembles than bagging. For example, KNN and NB without an ensemble recorded accuracies of 0.529 and 0.554, respectively, but with stacking the performance of both improved significantly, with accuracies of 0.63 and 0.622, respectively. This study shows that ML ensemble approaches can improve the performance of prediction models. The results can help medical authorities prepare and implement strategies and precaution guidelines for different age groups and recorded symptoms.
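To make the bagging idea mentioned in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' pipeline: the abstract does not give their tooling, features, or settings. The one-feature "stump" base learner, the function names (`fit_stump`, `bagging_fit`, `bagging_predict`), the four binary symptom flags, and the two age-group labels are all assumptions invented for illustration.

```python
import itertools
import random
from collections import Counter


def fit_stump(rows, labels):
    """Fit a one-feature rule: per value of the best feature, predict the majority label."""
    fallback = Counter(labels).most_common(1)[0][0]
    best = None
    for j in range(len(rows[0])):
        counts = {0: Counter(), 1: Counter()}
        for x, y in zip(rows, labels):
            counts[x[j]][y] += 1
        # Majority label for each feature value; fall back if a value is unseen.
        rule = {v: (c.most_common(1)[0][0] if c else fallback) for v, c in counts.items()}
        hits = sum(rule[x[j]] == y for x, y in zip(rows, labels))
        if best is None or hits > best[0]:
            best = (hits, j, rule)
    _, feature, rule = best
    return lambda x: rule[x[feature]]


def bagging_fit(rows, labels, n_models=25, seed=0):
    """Train each base learner on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Combine the base learners by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Synthetic data (invented for illustration): 4 binary symptom flags per patient,
# with a hypothetical age-group label that depends only on the first flag.
rows = list(itertools.product([0, 1], repeat=4))
labels = ["older" if x[0] == 1 else "younger" for x in rows]

models = bagging_fit(rows, labels)
predictions = [bagging_predict(models, x) for x in rows]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(rows)
```

Because the toy label depends on a single flag, the vote should recover it easily; real symptom data is far noisier, which is where averaging over bootstrap samples is expected to help. Boosting and stacking differ in how the base learners are trained and combined (sequential reweighting and a learned meta-model, respectively) and are not shown here.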

Publisher

Research Square Platform LLC

