Enhancing software defect prediction: a framework with improved feature selection and ensemble machine learning

Published: 2024-02-28 Volume: 10 Page: e1860
ISSN: 2376-5992
Container-title: PeerJ Computer Science
Language: en

Author:

Ali Misbah¹, Mazhar Tehseen¹, Al-Rasheed Amal², Shahzad Tariq³, Yasin Ghadi Yazeed⁴, Amir Khan Muhammad⁵

Affiliation:

1. Department of Computer Science & Information Technology, Virtual University of Pakistan, Lahore, Pakistan

2. Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia

3. Department of Computer Sciences, COMSATS University Islamabad, Sahiwal Campus, Sahiwal, Pakistan

4. Department of Computer Science and Software Engineering, Al Ain University, Abu Dhabi, UAE

5. School of Computing Sciences, College of Computing, Informatics and Mathematics, Universiti Teknologi MARA, Shah Alam, Selangor, Malaysia

Abstract

Effective software defect prediction is a crucial aspect of software quality assurance, enabling the identification of defective modules before the testing phase. This study proposes a comprehensive five-stage framework for software defect prediction that addresses current challenges in the field. The first stage selects cleaned versions of NASA’s defect datasets (CM1, JM1, MC2, MW1, PC1, PC3, and PC4) to ensure the data’s integrity. In the second stage, a feature selection technique based on the genetic algorithm is applied to identify the optimal subset of features. In the third stage, three heterogeneous binary classifiers (random forest, support vector machine, and naïve Bayes) are implemented as base classifiers, and each is tuned iteratively to achieve its highest individual accuracy. In the fourth stage, an ensemble machine-learning technique known as voting is applied as a master classifier that combines the decisions of the base classifiers. The final stage evaluates the performance of the proposed framework using five widely recognized measures: precision, recall, accuracy, F-measure, and area under the curve. Experimental results show that the proposed framework outperforms state-of-the-art ensemble and base classifiers employed in software defect prediction, achieving a maximum accuracy of 95.1%. The framework’s efficiency is also evaluated by measuring execution times: it reduces execution time during the training and testing phases by an average of 51.52% and 52.31%, respectively, which makes accurate software defect prediction more computationally economical.
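To make the fourth and fifth stages concrete, the following is a minimal sketch of hard (majority) voting over three base classifiers, followed by four of the five evaluation measures named in the abstract. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say whether voting is hard or soft, the function names `majority_vote` and `evaluate` are hypothetical, and the prediction lists are made-up toy values rather than results on the NASA datasets. Area under the curve is omitted because it requires classifier scores rather than hard labels.

```python
from collections import Counter


def majority_vote(predictions_per_classifier):
    """Combine label lists from several classifiers by hard (majority) voting."""
    combined = []
    # zip(*...) groups the votes that all classifiers cast for the same module
    for votes in zip(*predictions_per_classifier):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def evaluate(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-measure for the positive (defective) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Toy, made-up predictions for six modules (1 = defective, 0 = clean).
# In the framework these would come from the tuned random forest,
# support vector machine and naive Bayes base classifiers.
rf_pred = [1, 0, 1, 0, 0, 1]
svm_pred = [1, 0, 0, 0, 1, 1]
nb_pred = [0, 0, 1, 1, 0, 1]
y_true = [1, 0, 1, 0, 0, 0]

ensemble_pred = majority_vote([rf_pred, svm_pred, nb_pred])
metrics = evaluate(y_true, ensemble_pred)
print(ensemble_pred)
print(metrics)
```

With three binary classifiers a tie cannot occur, so the majority label is always well defined; an even number of voters would need an explicit tie-breaking rule.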

Funder

Princess Nourah bint Abdulrahman University Researchers Supporting Project number

Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia

Publisher

PeerJ

Link

https://peerj.com/articles/cs-1860.pdf

References: 76 articles.

1. Deep learning-based software defect prediction via semantic key features of source code—systematic survey; Abdu; Mathematics, 2022

2. Software defect prediction using stacking generalization of optimized tree-based ensembles; Alazba; Applied Sciences, 2022

3. Software defect prediction using variant based ensemble learning and feature selection techniques; Ali; International Journal of Modern Education and Computer Science, 2020

4. Analysis of feature selection methods in software defect prediction models; Ali; IEEE Access, 2023

5. Software defect prediction using tree-based ensembles; Aljamaan, 2020

Cited by 2 articles.

1. Depth linear discrimination-oriented feature selection method based on adaptive sine cosine algorithm for software defect prediction;Expert Systems with Applications;2024-11

2. Multi Self-Organizing Map (SOM) Pipeline Architecture for Multi-View Clustering;IEEE Access;2024