Author:
Paul Arunya, Kar Tejaswini, Pahadsingh Sasmita, Satpathy Priya Chandan, Behera Biswaranjan
Abstract
Identifying malignancy risks and genetic disorders has long been challenging because conventional procedures lack precision and predictability, which complicates the accurate identification of diseases and their root causes. Machine learning classifiers have emerged as more suitable and effective tools. Various classifiers have been applied to different genetic disorders, and their results have been compared to determine which performs best. In this study, five classifiers are examined: support vector machine (SVM), k-nearest neighbors (KNN), decision tree, random forest, and logistic regression. Each classifier uses a small set of training variables to learn how input values correspond to their respective classes. After implementing each classifier, we applied Stacking, an ensemble machine learning technique that aggregates the predictions of the individual classifiers on the same dataset. The methods were applied to four datasets: breast cancer, diabetes, Parkinson’s, and genomic. Of the five base classifiers, SVM was the most effective, achieving the highest accuracy in most cases: 97.43%, 97.46%, 97.45%, and 97.44% for the genome cancer, diabetes, Parkinson’s, and breast cancer datasets, respectively. The KNN and random forest models were also effective, with accuracies of around 95% and 91%, respectively, across the disease datasets. The logistic regression and decision tree models also performed well. However, the Stacking ensemble outperformed all of the base models, achieving accuracies above 97.5% on all four datasets.
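The stacking setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes scikit-learn, uses the library's bundled breast cancer dataset as a stand-in for the study's data, and the hyperparameters, feature scaling, train/test split, and the logistic regression meta-learner are all illustrative assumptions. The accuracies it prints should not be expected to match the figures reported above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: scikit-learn's bundled breast cancer dataset (assumption,
# not necessarily the dataset used in the study).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The five base classifiers named in the abstract. Scaling is added for the
# distance- and margin-based models; all settings are illustrative defaults.
base_models = [
    ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
]

# Stacking: a meta-learner is trained on cross-validated predictions of the
# base models. The choice of meta-learner here is an assumption.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
stacked_accuracy = stack.score(X_test, y_test)

# Fit each base model on its own for a side-by-side comparison.
base_accuracies = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in base_models
}

for name, acc in base_accuracies.items():
    print(f"{name:>8}: {acc:.4f}")
print(f"stacking: {stacked_accuracy:.4f}")
```

Because `StackingClassifier` trains the meta-learner on out-of-fold predictions (`cv=5`), the base models' training-set fit does not leak into the meta-learner, which is the usual reason to prefer it over averaging predictions by hand.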
Publisher
Inventive Research Organization