Abstract
Machine learning (ML) is now applied across nearly every sector of human life, including healthcare, finance, bioinformatics, data science, mechanical engineering, agriculture, and smart cities. ML comprises supervised and unsupervised techniques. Owing to the abundance of available data, supervised ML has been the most preferred method in the field of data mining. In this paper, a number of supervised ML classification algorithms are evaluated on a publicly available diabetes detection dataset to identify the most accurate model. The dataset contains records of 768 persons, of whom 500 were controls and 268 were patients. We found that the Random Forest algorithm outperformed the other six classification algorithms. In the first iteration, the Random Forest algorithm reached 78.44% accuracy. The tweaks made in this paper improved on the original Random Forest algorithm by 1.08 percentage points, reaching a score of 79.52%. Further, iteration I gave 171 and iteration II gave 173 correct predictions out of the 218 test samples.
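The accuracy figures above can be related to the reported prediction counts with a short calculation. The following is a minimal sketch, not the authors' evaluation code: it assumes accuracy is simply correct predictions divided by test-set size, and the helper name `accuracy_percent` is hypothetical. The majority-class baseline is computed on the full 768-record dataset, since the class split of the 218-sample test set is not stated.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Return accuracy as a percentage (assumed definition: correct / total)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


TEST_SIZE = 218  # test samples reported in the abstract

# Correct-prediction counts reported for the two Random Forest iterations.
iteration_1 = accuracy_percent(171, TEST_SIZE)
iteration_2 = accuracy_percent(173, TEST_SIZE)

# Assumption: a baseline that always predicts "control", evaluated on the
# full dataset (500 controls out of 768 records).
majority_baseline = accuracy_percent(500, 768)

print(f"Iteration I:       {iteration_1:.2f}%")
print(f"Iteration II:      {iteration_2:.2f}%")
print(f"Majority baseline: {majority_baseline:.2f}%")
```

Under this definition, iteration I works out to 78.44%, matching the reported figure, while 173 of 218 works out to about 79.36% rather than 79.52%, so the second reported score may rest on a computation the abstract does not describe.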
Publisher
Universidade Estadual de Maringa