Performance discrepancy mitigation in heart disease prediction for multisensory inter-datasets

Published: 2024-03-18 Volume: 10 Page: e1917
ISSN:2376-5992
Container-title:PeerJ Computer Science
language:en

Author:

Hasan Mahmudul¹², Sahid Md Abdus¹, Uddin Md Palash¹², Marjan Md Abu¹, Kadry Seifedine³⁴⁵⁶, Kim Jungeun⁷

Affiliation:

1. Department of Computer Science and Engineering, Hajee Mohammad Danesh Science and Technology University, Dinajpur, Bangladesh

2. School of Information Technology, Deakin University, Geelong, VIC, Australia

3. Department of Electrical and Computer Engineering, Lebanese American University, Byblos, Lebanon

4. Department of Applied Data Science, Noroff University College, Kristiansand, Norway

5. Artificial Intelligence Research Center (AIRC), Ajman University, Ajman, United Arab Emirates

6. MEU Research Unit, Middle East University, Amman, Jordan

7. Department of Software, Kongju National University, Cheonan, Republic of Korea

Abstract

Heart disease is one of the primary causes of morbidity and death worldwide. Millions of people have heart attacks every year, and only early-stage prediction can help reduce that number. Researchers are designing and developing early-stage prediction systems using a range of advanced technologies, machine learning (ML) among them. Almost all existing ML-based works use the same dataset (intra-dataset) for both training and validation of their method. In particular, they do not consider inter-dataset performance checks, where different datasets are used in the training and testing phases. In an inter-dataset setup, existing ML models perform poorly, an issue referred to as the inter-dataset discrepancy problem. This work focuses on mitigating that problem by considering five available heart disease datasets and their combined form. All potential training and testing mode combinations are systematically executed to assess discrepancies before and after applying the proposed methods. Imbalanced data handling using SMOTE-Tomek, feature selection using random forest (RF), and feature extraction using principal component analysis (PCA), together with a long preprocessing pipeline, are used to mitigate the inter-dataset discrepancy problem. The preprocessing pipeline comprises missing value handling using RF regression, log transformation, outlier removal, normalization, and data balancing, which together convert the datasets into a more ML-centric form. Support vector machine, K-nearest neighbors, decision tree, RF, eXtreme Gradient Boosting, Gaussian naive Bayes, logistic regression, and multilayer perceptron are used as classifiers. Experimental results show that feature selection and classification using RF produce better results than other combination strategies in both single- and inter-dataset setups.
RF reaches 100% accuracy in certain configurations of individual datasets and 96% accuracy in an inter-dataset setup during the feature selection phase, with commendable precision, recall, F1 score, specificity, and AUC score. The results indicate that effective preprocessing can improve the performance of an ML model without requiring the development of intricate prediction models. Addressing inter-dataset discrepancies opens a novel research avenue, enabling identical features from various datasets to be merged into a comprehensive global dataset within a specific domain.
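
The difference between an intra-dataset and an inter-dataset check described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two "datasets" are synthetic stand-ins generated with scikit-learn, the site shift applied to the second one is invented for illustration, min-max scaling is an assumed choice of normalization, and the SMOTE-Tomek balancing step is left out to avoid an extra dependency (the imbalanced-learn package provides it). The names `build_model` and `evaluate` are hypothetical helpers.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-ins for two heart disease datasets that share the same
# feature columns (assumption: the real datasets are not reproduced here).
X, y = make_classification(n_samples=600, n_features=13, n_informative=6,
                           random_state=0)
rng = np.random.default_rng(0)
X_a, y_a = X[:300], y[:300]
# Simulate a different collection site: rescale and shift every feature.
X_b = X[300:] * rng.uniform(0.8, 1.2, size=13) + rng.normal(0.0, 0.3, size=13)
y_b = y[300:]


def build_model():
    """Normalization -> RF-based feature selection -> RF classifier."""
    return Pipeline([
        ("scale", MinMaxScaler()),
        ("select", SelectFromModel(
            RandomForestClassifier(n_estimators=100, random_state=0),
            threshold="median")),  # keep features at or above median importance
        ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ])


def evaluate(X_train, y_train, X_test, y_test):
    """Fit every pipeline step on the training data only, then score."""
    model = build_model().fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


# Intra-dataset: train and test on disjoint parts of dataset A.
intra_acc = evaluate(X_a[:200], y_a[:200], X_a[200:], y_a[200:])
# Inter-dataset: train on all of dataset A, test on the unseen dataset B.
inter_acc = evaluate(X_a, y_a, X_b, y_b)

print(f"intra-dataset accuracy: {intra_acc:.3f}")
print(f"inter-dataset accuracy: {inter_acc:.3f}")
```

Wrapping the scaler and the feature selector in the same pipeline as the classifier means their parameters are learned from the training dataset alone, so the test dataset stays unseen, which is the point of an inter-dataset check. The printed numbers describe the synthetic data only and say nothing about the results reported in the paper.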

Funder

Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education

Technology Development Program of MSS

Publisher

PeerJ

Link

https://peerj.com/articles/cs-1917.pdf

Cited by 2 articles.

1. Leveraging textual information for social media news categorization and sentiment analysis;PLOS ONE;2024-07-15

2. Hybrid deep learning model for heart disease detection on 12-lead electrocardiograms;Procedia Computer Science;2024