Explainable Software Defects Classification Using SMOTE and Machine Learning

Author:

Jude Agboeze, Uddin Jia

Abstract

Software defect prediction is a critical task in software engineering that aims to identify and mitigate potential defects in software systems. In recent years, numerous techniques have been developed to improve the accuracy and efficiency of defect prediction models. In this paper, we propose a comprehensive approach that combines stratified splitting to address class imbalance, explainable AI techniques, and a hybrid machine learning algorithm. To mitigate the impact of class imbalance, we employ stratified splitting during the training and evaluation phases. This method preserves the class distribution in both the training and testing sets, enabling the model to learn from and generalize to minority-class examples effectively. Furthermore, we use two explainable AI methods, LIME and SHAP, to make the machine learning models more interpretable. To improve prediction accuracy, we propose a hybrid machine learning algorithm that combines multiple models and exploits the strengths of each, resulting in improved overall performance. The approach is evaluated on the NASA-MD datasets. The results show that handling class-imbalanced data with stratified splitting achieves better overall performance than the SMOTE approach in Software Defect Detection (SDD).
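The stratified splitting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `stratified_split`, the 25% test fraction, and the synthetic labels are assumptions made for illustration, and it uses only the Python standard library. In practice, the same effect is commonly obtained with scikit-learn's `train_test_split(..., stratify=y)`.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_indices, test_indices) that preserve per-class proportions."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_indices, test_indices = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take the same fraction of every class for the test set.
        n_test = round(len(indices) * test_fraction)
        test_indices.extend(indices[:n_test])
        train_indices.extend(indices[n_test:])
    return sorted(train_indices), sorted(test_indices)


# Hypothetical imbalanced labels: 1 = defective module, 0 = non-defective.
labels = [1] * 20 + [0] * 180
train_idx, test_idx = stratified_split(labels, test_fraction=0.25, seed=42)

train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print("train:", dict(train_counts))
print("test:", dict(test_counts))
```

With these synthetic labels, the defective class makes up 10% of both the training and the testing partition, matching its share of the full set. A purely random split gives no such guarantee and can leave the test set with very few minority-class examples.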

Publisher

International Association for Educators and Researchers (IAER)

