Affiliation:
1. Delhi Technological University
Abstract
Software defect prediction is one of the major challenges faced by software engineers worldwide as software grows in size and functionality. It is the process of identifying error-prone modules in software before the testing phase, which reduces cost and saves time. The primary goal of this research is to compare different data balancing techniques in combination with popular classification models used for software fault prediction, and to optimize the best-performing results. In this study, we use the AEEEM dataset and pre-process the data with mean value treatment and min-max scaling. Dataset balancing is then performed using class-weight-based, over-sampling, under-sampling, and hybridization techniques. The balanced datasets are analyzed using five classification techniques: Random Forest Classifier, XGBoost, Support Vector Classifier, LightGBM, and Logistic Regression. In total, 25 combinations are assessed using 10-fold cross-validation, with F1-score and AUC as the performance metrics. The best methods are then further improved using feature selection. Finally, the best case is optimized using Optuna.
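The pre-processing and balancing steps named in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation. It assumes that "mean value treatment" means replacing missing values with the column mean, and it uses plain random over-sampling as a stand-in, since the abstract does not say which over-sampling method was used. The function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def impute_mean(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def min_max_scale(rows):
    """Rescale every column to the range [0, 1]; constant columns map to 0.0."""
    n_cols = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_cols)]
    highs = [max(r[j] for r in rows) for j in range(n_cols)]
    return [
        [
            (r[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
            for j in range(n_cols)
        ]
        for r in rows
    ]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive (defective) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: two metrics per module, label 1 = defective module.
X = [[10.0, 1.0], [20.0, None], [30.0, 3.0], [40.0, 5.0]]
y = [0, 0, 0, 1]

X_scaled = min_max_scale(impute_mean(X))
X_bal, y_bal = random_oversample(X_scaled, y)
```

In a cross-validated evaluation such as the 10-fold setup described above, balancing is normally applied only to the training folds, so that duplicated samples do not leak into the held-out fold.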
Publisher
Research Square Platform LLC