The impact of feature selection techniques on effort‐aware defect prediction: An empirical study

Authors:

Li Fuyang (1), Lu Wanpeng (1,2), Keung Jacky Wai (3), Yu Xiao (1,4,5), Gong Lina (6), Li Juan (7)

Affiliations:

1. School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China

2. School of Information Science and Engineering, East China University of Science and Technology, Shanghai, China

3. Department of Computer Science, City University of Hong Kong, Hong Kong, China

4. Sanya Science and Education Innovation Park of Wuhan University of Technology, Sanya, China

5. Wuhan University of Technology Chongqing Research Institute, Chongqing, China

6. School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China

7. School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan, China

Abstract

Effort-Aware Defect Prediction (EADP) methods rank software modules by defect density and guide the testing team to inspect the modules with the highest defect density first. Previous studies indicated that some feature selection methods can improve the performance of Classification-Based Defect Prediction (CBDP) models, and that the Correlation-based feature subset selection method with the Best First strategy (CorBF) performed the best. However, the practical benefit of feature selection for EADP performance is still unknown, and blindly employing CorBF, the best-performing method in CBDP, to pre-process defect datasets may not improve EADP models and may even degrade their performance. To assess the impact of feature selection techniques on EADP, we examined 24 feature selection methods with 10 classifiers embedded in a state-of-the-art EADP model (CBS+) on 41 PROMISE defect datasets, using six evaluation metrics to assess the performance of EADP models comprehensively. The results show that: (1) the impact of feature selection methods varies across classifiers and datasets; (2) the four wrapper-based feature subset selection methods with forwards search, that is, AdaBoost with Forwards Search, Deep Forest with Forwards Search, Random Forest with Forwards Search, and XGBoost with Forwards Search (XGBF), outperform the other methods across the studied classifiers and datasets, and XGBF with XGBoost as the embedded classifier in CBS+ performs the best; (3) CorBF, the best-performing method in CBDP, does not perform well on the EADP task; (4) the selected features vary with the feature selection method and the dataset, and the features noc (number of children), ic (inheritance coupling), cbo (coupling between object classes), and cbm (coupling between methods) are frequently selected by the four wrapper-based feature subset selection methods with forwards search; (5) using AdaBoost, deep forest, random forest, and XGBoost as the base classifiers embedded in CBS+ achieves the best performance. In summary, we recommend that software testing teams employ XGBF with XGBoost as the embedded classifier in CBS+ to enhance EADP performance.
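For readers unfamiliar with wrapper-based feature subset selection, the following minimal sketch shows a greedy forwards search that uses XGBoost as the evaluation classifier, loosely mirroring the XGBF method described above. The synthetic dataset, cross-validation setup, scoring metric, and stopping rule are illustrative assumptions only; they are not the authors' actual experimental protocol, which embeds the selected features in CBS+ and evaluates with effort-aware metrics.

```python
# Sketch of wrapper-based forwards-search feature selection with XGBoost
# as the evaluation classifier (assumed setup, not the paper's pipeline).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Hypothetical defect dataset: rows are modules, columns are static metrics.
X, y = make_classification(n_samples=500, n_features=20, n_informative=6,
                           random_state=0)

def forwards_search(X, y, max_features=None):
    """Greedily add the feature whose inclusion yields the largest
    cross-validated score improvement; stop when no candidate helps."""
    n_features = X.shape[1]
    max_features = max_features or n_features
    selected, best_score = [], -np.inf
    while len(selected) < max_features:
        candidate_scores = {}
        for f in range(n_features):
            if f in selected:
                continue
            cols = selected + [f]
            clf = XGBClassifier(n_estimators=100, eval_metric="logloss")
            candidate_scores[f] = cross_val_score(clf, X[:, cols], y, cv=5).mean()
        best_f = max(candidate_scores, key=candidate_scores.get)
        if candidate_scores[best_f] <= best_score:
            break  # no remaining feature improves the score
        selected.append(best_f)
        best_score = candidate_scores[best_f]
    return selected, best_score

features, score = forwards_search(X, y, max_features=8)
print("Selected feature indices:", features, "CV accuracy:", round(score, 3))
```

The wrapper design evaluates each candidate subset with the same classifier that will ultimately be used for prediction, which is what distinguishes methods such as XGBF from filter-based approaches like CorBF that score features independently of the downstream model.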

Funder

NSFC

Publisher

Institution of Engineering and Technology (IET)

Subject

Computer Graphics and Computer-Aided Design

Cited by 10 articles.
