Explainable Software Defects Classification Using SMOTE and Machine Learning

Published: 2024-01-01, Volume: 8, Issue: 1, Pages: 36-49
ISSN: 2516-029X
Container title: Annals of Emerging Technologies in Computing
Language: en
Short container title: AETiC

Author:

Jude Agboeze, Uddin Jia

Abstract

Software defect prediction is a critical task in software engineering that aims to identify and mitigate potential defects in software systems. In recent years, numerous techniques have been developed to improve the accuracy and efficiency of defect prediction models. In this paper, we propose a comprehensive approach that combines stratified splitting to address class imbalance, explainable AI techniques, and a hybrid machine learning algorithm. To mitigate the impact of class imbalance, we employ stratified splitting during the training and evaluation phases. This method preserves the class distribution in both the training and testing sets, enabling the model to learn from and generalize to minority-class examples effectively. Furthermore, we use the explainable AI methods LIME and SHAP to enhance the interpretability of the machine learning models. To improve prediction accuracy, we propose a hybrid machine learning algorithm that combines multiple models and exploits the strengths of each, resulting in improved overall performance. The approach is evaluated on the NASA-MD datasets. The results show that handling class-imbalanced data with stratified splitting achieves better overall performance than the SMOTE approach in Software Defect Detection (SDD).
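The two imbalance-handling strategies compared in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, which this record does not include. The function names `stratified_split` and `smote_like`, the toy data, and all parameter values are hypothetical, and only the Python standard library is used. The second function is a simplified SMOTE-style interpolation that assumes at least two minority samples.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with every class split at the same ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (simplified SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Toy data (an assumption): 90 clean modules (0) and 10 defective modules (1).
labels = [0] * 90 + [1] * 10
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
train_defects = sum(labels[i] for i in train_idx)
test_defects = sum(labels[i] for i in test_idx)

# Toy minority-class feature vectors (an assumption) for oversampling.
minority = [(1.0, 2.0), (2.0, 3.0), (3.0, 1.0), (2.5, 2.5)]
synthetic = smote_like(minority, n_new=6)
```

In this sketch the stratified split keeps the 10% defect rate in both partitions without altering the data, whereas the SMOTE-style function changes the training distribution by adding interpolated minority samples. The abstract reports that the former gave better overall results on the datasets studied.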

Publisher

International Association for Educators and Researchers (IAER)
