An Overview of Supervised Machine Learning Methods and Data Analysis for COVID-19 Detection

Author:

Tchagna Kouanou Aurelle (1, 2), Mih Attia Thomas (1), Feudjio Cyrille (3), Djeumo Anges Fleurio (2), Ngo Mouelas Adèle (2, 4), Nzogang Mendel Patrice (5), Tchito Tchapga Christian (1), Tchiotsop Daniel (6)

Affiliation:

1. Department of Computer Engineering, College of Technology, University of Buea, Buea, Cameroon

2. Department of Training, Research Development and Innovation, InchTech’s Solutions, Yaoundé, Cameroon

3. Department of Electrical and Electronic Engineering, College of Technology, University of Buea, Buea, Cameroon

4. Ecole Nationale Supérieur Polytechnique, University of Yaounde 1, Yaoundé, Cameroon

5. Faculté de Médecine et des Sciences Biomédicales, University of Yaounde 1, Yaoundé, Cameroon

6. Unité de Recherche d’Automatique et d’informatique Appliquée (UR-AIA), IUT-FV de Bandjoun, Université de Dschang-Cameroun, BP 134, Bandjoun, Cameroon

Abstract

Background and Objective. Mitigating the spread of SARS-CoV-2, the virus responsible for COVID-19, urgently requires massive population testing. Because of a constant shortage of reagents for PCR (polymerase chain reaction) tests, the COVID-19 tests par excellence, several medical centers have opted for immunological tests that look for antibodies produced against the virus. However, these tests have a high rate of false positives (positive results for people who are actually negative) and false negatives (negative results for people who are actually positive) and are therefore not always reliable. In this paper, we propose a solution based on data analysis and machine learning to detect COVID-19 infections. Methods. Our analysis and machine learning algorithm are based on two of the most cited clinical datasets in the literature: one from San Raffaele Hospital in Milan, Italy, and the other from Hospital Israelita Albert Einstein in São Paulo, Brazil. The datasets were processed to select the features that most influence the target, and almost all of the selected features turned out to be blood parameters. Exploratory data analysis (EDA) methods were applied to the datasets, and a comparative study of supervised machine learning models was carried out, after which the support vector machine (SVM) was selected as the best-performing model. Results. As the best-performing model, SVM is used as our proposed supervised machine learning algorithm. After optimizing the SVM, we obtained an accuracy of 99.29%, a sensitivity of 92.79%, and a specificity of 100% on the dataset from Kaggle (https://www.kaggle.com/einsteindata4u/covid19). The same procedure was applied to the dataset from San Raffaele Hospital (https://zenodo.org/record/3886927#.YIluB5AzbMV). Once more, the SVM performed best among the machine learning algorithms compared, with an accuracy of 92.86%, a sensitivity of 93.55%, and a specificity of 90.91%. Conclusion. These results are superior to others reported in the literature on the same datasets, leading us to conclude that our proposed solution is reliable for COVID-19 diagnosis.
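The three figures reported in the abstract (accuracy, sensitivity, specificity) all derive from the counts of true and false positives and negatives. The following is a minimal sketch of how such metrics are conventionally computed for a binary classifier. It is not the authors' code: the names `ConfusionCounts` and `count_outcomes` are hypothetical, and the toy labels are invented for illustration and are not drawn from either dataset.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # infected, predicted infected (true positive)
    fp: int  # not infected, predicted infected (false positive)
    tn: int  # not infected, predicted not infected (true negative)
    fn: int  # infected, predicted not infected (false negative)


def count_outcomes(y_true, y_pred):
    """Tally the four outcomes for binary labels (1 = infected, 0 = not infected)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def accuracy(c):
    """Fraction of all predictions that are correct."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def sensitivity(c):
    """Fraction of infected cases that are detected (true positive rate)."""
    return c.tp / (c.tp + c.fn)


def specificity(c):
    """Fraction of non-infected cases that are correctly cleared (true negative rate)."""
    return c.tn / (c.tn + c.fp)


# Toy labels, invented for illustration only (assumes both classes are present).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
counts = count_outcomes(y_true, y_pred)

print(f"accuracy:    {accuracy(counts):.4f}")
print(f"sensitivity: {sensitivity(counts):.4f}")
print(f"specificity: {specificity(counts):.4f}")
```

Under these definitions, a specificity of 100% corresponds to zero false positives on the evaluated samples, while a sensitivity below 100% indicates that some infected cases were missed.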

Publisher

Hindawi Limited

Subject

Health Informatics, Biomedical Engineering, Surgery, Biotechnology

Cited by 14 articles.