Affiliation:
1. School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, China
Abstract
Cross‐project (CP) just‐in‐time software defect prediction (JIT‐SDP) uses CP data to overcome the scarcity of training data in the early stages of a software project, when too little local history exists to train a high‐performing JIT‐SDP classifier. The primary challenge for JIT‐SDP in a cross‐project context is that the training and test data follow different distributions. To tackle this issue, we select the source data instances that most closely resemble the target data and build classifiers from them. Software datasets also commonly exhibit class imbalance: the ratio of defective to clean instances is notably low, which typically degrades classifier performance. In this study, we propose an instance selection method utilizing kernel mean matching (ISKMM) that addresses both knowledge transfer and class imbalance in cross‐project defect prediction (CPDP). The method employs the kernel mean matching (KMM) technique to assess the similarity between training and target data. It retains the training instances with high similarity and resamples the retained data based on similarity weighting to mitigate the class imbalance problem. Our experiments, conducted on 10 open‐source projects, show that the ISKMM algorithm outperforms existing CP single‐source software defect prediction (SDP) algorithms. Moreover, with the proposed algorithm, defect predictors built from cross‐project data achieve overall performance comparable to that of predictors learned from within‐project data.
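The pipeline described in the abstract (KMM weighting, selection of similar instances, weighted resampling) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `kmm_weights` and `select_and_resample`, the RBF kernel, the parameters `gamma`, `bound`, and `keep_ratio`, and the choice to oversample the minority class are all assumptions made for illustration. Standard KMM solves a quadratic program with an additional constraint on the sum of the weights; the sketch below simplifies this to projected gradient descent under the box constraint alone.

```python
import math
import random


def rbf(a, b, gamma=1.0):
    """RBF kernel between two feature vectors (kernel choice is an assumption)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def kmm_weights(source, target, gamma=1.0, bound=10.0, steps=500):
    """Approximate KMM weights for the source instances.

    Minimizes 0.5 * b^T K b - kappa^T b subject to 0 <= b <= bound by
    projected gradient descent. Simplification: the usual constraint on
    the sum of the weights is omitted.
    """
    n_s, n_t = len(source), len(target)
    K = [[rbf(a, b, gamma) for b in source] for a in source]
    kappa = [n_s / n_t * sum(rbf(a, t, gamma) for t in target) for a in source]
    # K has unit diagonal, so its largest eigenvalue is at most n_s;
    # a step of 1 / n_s is therefore safe.
    lr = 1.0 / n_s
    beta = [1.0] * n_s
    for _ in range(steps):
        grad = [sum(K[i][j] * beta[j] for j in range(n_s)) - kappa[i]
                for i in range(n_s)]
        beta = [min(bound, max(0.0, b - lr * g)) for b, g in zip(beta, grad)]
    return beta


def select_and_resample(source, labels, weights, keep_ratio=0.5, seed=0):
    """Keep the highest-weight instances, then oversample the minority class.

    Minority instances are drawn with probability proportional to their
    KMM weight until both classes are the same size (hypothetical scheme).
    """
    rng = random.Random(seed)
    order = sorted(range(len(source)), key=lambda i: weights[i], reverse=True)
    kept = order[: max(1, int(len(order) * keep_ratio))]
    defective = [i for i in kept if labels[i] == 1]
    clean = [i for i in kept if labels[i] == 0]
    minority, majority = sorted([defective, clean], key=len)
    extra = []
    if minority and len(majority) > len(minority):
        w = [weights[i] + 1e-12 for i in minority]  # avoid all-zero weights
        extra = rng.choices(minority, weights=w, k=len(majority) - len(minority))
    chosen = kept + extra
    return [source[i] for i in chosen], [labels[i] for i in chosen]
```

In this sketch, source instances that lie far from every target instance receive weights near zero and are dropped by the selection step, while the remaining defective instances are replicated in proportion to their similarity to the target project. A production version would more likely solve the full quadratic program with a dedicated solver.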
Funder
National Natural Science Foundation of China