Authors:
Suryawanshi, Amol; Kadam, Ranjeetsingh
Abstract
Imagine you are trying to classify software defects in a large dataset. How would you choose the best algorithm for the task? Many algorithms can be applied to this problem, including Random Forest, Support Vector Machines, Neural Networks, Naive Bayes, K-Nearest Neighbours, Decision Trees and Logistic Regression. One of the most widely used is the Random Forest algorithm, which combines the predictions of multiple Decision Trees. However, this algorithm can rely on Entropy, a measure of the uncertainty in the data, to decide how each tree is split. Entropy is computed with the natural logarithm, which can be a time-consuming calculation. Is there a better way to calculate entropy? In this research, we explore an alternative way to calculate the natural logarithm using its Taylor series expansion: an infinite sum of terms that approximates a sufficiently smooth function near a chosen point using the function's derivatives at that point. We then modified the Random Forest algorithm by replacing the natural logarithm in the Entropy formula with this Taylor series expression. We tested the modified algorithm on a software defect dataset and compared its performance with that of the original Entropy formula. We found that our modification improved the accuracy of the algorithm on software defect prediction.
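To make the idea concrete, here is a minimal sketch, assuming the Taylor series of ln(x) expanded around x = 1, that is ln(x) = sum over n >= 1 of (-1)^(n+1) (x - 1)^n / n, which converges for 0 < x <= 2 and therefore covers every class probability p in (0, 1]. The names `ln_taylor` and `entropy` and the truncation parameter `terms` are hypothetical and chosen for illustration; this is not presented as the authors' exact implementation. Entropy is computed in nats as H = -sum p * ln(p), with the logarithm passed in so the exact and approximate versions can be compared.

```python
import math
from collections import Counter
from typing import Callable, Hashable, Sequence


def ln_taylor(x: float, terms: int = 50) -> float:
    """Approximate ln(x) by truncating its Taylor series around x = 1.

    ln(x) = (x-1) - (x-1)^2/2 + (x-1)^3/3 - ...  (valid for 0 < x <= 2).
    `terms` is a hypothetical truncation length, not a value from the paper.
    """
    if not 0 < x <= 2:
        raise ValueError("the series around 1 converges only for 0 < x <= 2")
    y = x - 1.0
    total = 0.0
    power = 1.0
    for n in range(1, terms + 1):
        power *= y  # power now equals y**n
        # Odd-numbered terms are added, even-numbered terms are subtracted.
        total += power / n if n % 2 == 1 else -power / n
    return total


def entropy(labels: Sequence[Hashable],
            log: Callable[[float], float] = math.log) -> float:
    """Entropy H = -sum(p * log(p)) of the class labels at a tree node, in nats."""
    n = len(labels)
    h = 0.0
    for count in Counter(labels).values():
        p = count / n  # class probability, always in (0, 1]
        h -= p * log(p)
    return h


labels = ["defect", "clean", "clean", "clean"]
print(entropy(labels))             # exact natural logarithm
print(entropy(labels, ln_taylor))  # truncated Taylor series
```

The design choice to pass the logarithm as a parameter mirrors the substitution described above: only the log inside the Entropy formula changes. Note that the series converges slowly when p is close to 0, so the number of terms needed for a given accuracy grows for rare classes; whether the truncated series is actually faster than a library logarithm depends on that term count and on the implementation.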