Abstract
Finding dependencies in the data requires the analysis of relations between dozens of parameters of the studied process and hundreds of possible sources of influence on this process. Dependencies are nondeterministic and therefore modeling requires the use of statistical methods for analyzing random processes. Part of the information is often hidden from observation or not monitored. That is why many difficulties have arisen in the process of analyzing the collected information. The paper aims to find frequent patterns and parameters affected by COVID-19. The novelty of the paper is hierarchical architecture comprises supervised and unsupervised methods. It allows the development of an ensemble of the methods based on k-means clustering and classification. The best classifiers from the ensemble are random forest with 500 trees and XGBoost. Classification for separated clusters gives us higher accuracy on 4% in comparison with dataset analysis. The proposed approach can be used also for personalized medicine decision support in other domains. The features selection allows us to analyze the following features with the highest impact on COVID-19: age, sex, blood group, had influenza.
Subject
Information Systems and Management,Computer Science Applications,Information Systems
Reference27 articles.
1. Coronavirus Pandemic (COVID-19). Our World in Data;Roser,2020
2. Тести На Коронавірус—в Україні Зробили Понад Мільйон Тестів ПЛР » Слово і Ділоhttps://www.slovoidilo.ua/2020/09/04/infografika/suspilstvo/pandemiya-koronavirusu-skilky-testiv-zrobyly-ukrayini-ta-inshyx-krayinax-svitu
3. Modeling and analysis of different scenarios for the spread of COVID-19 by using the modified multi-agent systems – Evidence from the selected countries
4. An approach towards missing data management using improved GRNN-SGTM ensemble method
Cited by
10 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献