Authors:
Sheta Alaa, El-Ashmawi Walaa, Baareh Abdelkarim
Abstract
Methods for processing medical data are advancing rapidly. An accurate data classification model can help determine a patient's disease and assess its severity, thus easing the treatment burden on doctors. Nonetheless, medical data analysis is challenging because of uncertainty, correlations between measurements, and the high dimensionality of the data. These challenges burden statistical classification models. In recent years, Machine Learning (ML) and data mining approaches have proven effective in clarifying the importance of these aspects. This research adopts a well-known supervised classification model, the Decision Tree (DT). A DT is a tree structure consisting of a root node, connected branches, and internal and terminal nodes. Each non-terminal node encodes a decision, much like a rule in a rule-based system. This type of model helps researchers and physicians better diagnose a disease. To reduce the complexity of the proposed DT, we explored Feature Selection (FS) to design a simpler diagnosis model with fewer factors, which in turn reduces the effort required at the data collection stage. To demonstrate the effectiveness of the developed model, we conducted a comparative analysis between the DT and other ML models: Logistic Regression (LR), Support Vector Machine (SVM), and Gaussian Naive Bayes (GNB). The DT model achieves an accuracy of 93.78% and an ROC value of 0.94, outperforming the other compared algorithms. The developed DT model provided promising results and can help diagnose heart disease.
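To make the rule-based reading of a DT concrete, the following is a minimal sketch of a tree stored as nested rules, not the model developed in this work. The feature names (`feature_a`, `feature_b`), thresholds, and class labels are hypothetical placeholders and do not come from the study's data.

```python
# Toy decision tree stored as nested rules.
# All feature names, thresholds, and labels are hypothetical placeholders.
TREE = {
    "feature": "feature_a",          # root node: first decision
    "threshold": 0.5,
    "left": "negative",              # terminal node (value <= threshold)
    "right": {                       # internal node (value > threshold)
        "feature": "feature_b",
        "threshold": 140.0,
        "left": "positive",
        "right": "negative",
    },
}


def predict(node, sample):
    """Follow one decision per internal node until a terminal label is reached."""
    while isinstance(node, dict):
        value = sample[node["feature"]]
        node = node["left"] if value <= node["threshold"] else node["right"]
    return node


def used_features(node):
    """Collect the features the tree actually tests."""
    if not isinstance(node, dict):
        return set()
    return (
        {node["feature"]}
        | used_features(node["left"])
        | used_features(node["right"])
    )


if __name__ == "__main__":
    sample = {"feature_a": 1.0, "feature_b": 120.0}
    print(predict(TREE, sample))
    print(sorted(used_features(TREE)))
```

Only the features returned by `used_features` need to be measured to classify a new case, which is the sense in which a smaller tree can shorten data collection.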