Affiliation:
1. Department of Computer Engineering, College of IT Convergence, Gachon University, Seongnam 13120, South Korea
2. Department of Computer Engineering, Chungbuk National University, Cheongju 28644, South Korea
3. Division of Computer Engineering, College of IT Engineering, Hansung University, Seoul 02876, South Korea
Abstract
<abstract>
<p>The incidence of hypertension has increased dramatically in both elderly and young populations, and it rose further with the outbreak of the COVID-19 pandemic. To enhance hypertension detection accuracy, we proposed a multivariate outlier removal method based on the deep autoencoder (DAE) technique and applied it to the Korean National Health and Nutrition Examination Survey (KNHANES) database. Several studies have identified various risk factors for chronic hypertension. Chronic diseases are often multifactorial rather than isolated and have been associated with COVID-19. Therefore, disease detection needs to account for complex, interacting factors. This study was divided into two main modules. The first module, data preprocessing, integrated external COVID-19 patient features by merging the KNHANES-2020 and Kaggle datasets on region, age, and gender. We then performed multicollinearity (MC)-based feature selection on the KNHANES and integrated datasets. Notably, our MC analysis revealed that the "COVID-19 statement" feature, with a variance inflation factor (VIF) of 1.023 and a p-value < 0.01, is significant in predicting hypertension, underscoring the interrelation between COVID-19 and hypertension risk. The second module, predictive analysis, detected and predicted hypertension using an ordinal encoder (OE) transformation and DAE-based multivariate outlier removal on the KNHANES data. We compared the accuracy, F1 score, and area under the curve (AUC) of each classification model. The experimental results showed that the XGBoost model combined with the proposed DAE_OE method achieved the best results, with an accuracy of 87.78% (95% CI: 86.49%–88.1%), an F1 score of 89.95%, and an AUC of 92.28% for the COVID-19 cases, and an accuracy of 87.72% (95% CI: 85.86%–89.69%), an F1 score of 89.94%, and an AUC of 92.23% for the non-COVID-19 cases.
By building a high-quality training dataset with the DAE and OE steps, we improved the prediction performance of the classifiers in all experiments. Moreover, we experimentally demonstrated how each step of the proposed method contributed to this improvement. Our approach has potential applications beyond hypertension detection, including other diseases such as stroke and cardiovascular disease.</p>
</abstract>
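The two preprocessing ideas named in the abstract, VIF-based multicollinearity screening and outlier removal by reconstruction error, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `vif` and `reconstruction_error_filter` are hypothetical, the data are synthetic, and a linear PCA encoder/decoder is used as a simple stand-in for the deep autoencoder so that the example needs only NumPy.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (rows are samples).

    VIF_j = 1 / (1 - R_j^2), which equals the j-th diagonal entry of the
    inverse of the feature correlation matrix.
    """
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

def reconstruction_error_filter(X, n_components, quantile=0.99):
    """Flag rows to keep based on reconstruction error.

    ASSUMPTION: a linear PCA projection replaces the deep autoencoder
    described in the abstract; the thresholding idea is the same.
    Columns are assumed to have non-zero variance.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)          # standardize features
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)  # principal directions
    W = Vt[:n_components].T                           # "encoder" weights
    recon = Z @ W @ W.T                               # encode, then decode
    err = np.mean((Z - recon) ** 2, axis=1)           # per-row error
    keep = err <= np.quantile(err, quantile)
    return keep, err

# Synthetic data: c is nearly collinear with a, b is independent.
rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
b = rng.normal(size=n)
c = a + 0.05 * rng.normal(size=n)
X = np.column_stack([a, b, c])

vifs = vif(X)  # large for columns a and c, close to 1 for b

# Plant five multivariate outliers that break the a ~ c relationship.
X_out = X.copy()
X_out[:5, 2] += 6.0
keep, err = reconstruction_error_filter(X_out, n_components=2)
X_clean = X_out[keep]
```

A VIF close to 1, such as the 1.023 reported for the "COVID-19 statement" feature, indicates that a feature is barely explained by the others, whereas the collinear columns in this toy data produce very large values. In the second step, rows that the low-dimensional model cannot reconstruct well are treated as multivariate outliers and dropped before training a classifier.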
Publisher
American Institute of Mathematical Sciences (AIMS)