A novel instance‐based method for cross‐project just‐in‐time defect prediction

Published: 2024-01-24
ISSN:0038-0644
Container-title:Software: Practice and Experience
language:en
Short-container-title:Softw Pract Exp

Author:

Zhu Xiaoyan¹, Qiu Tian¹, Wang Jiayin¹, Lai Xin¹

Affiliation:

1. School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, China

Abstract

Cross‐project (CP) just‐in‐time software defect prediction (JIT‐SDP) uses CP data to overcome initial data scarcity for training high‐performing JIT‐SDP classifiers in the early stages of software projects. The primary challenge faced by JIT‐SDP in a cross‐project context lies in the distinct distributions between training and test data. To tackle this issue, we select source data instances that closely resemble target data for building classifiers. Software datasets commonly exhibit a class imbalance problem, where the ratio of the defective class to the clean class is notably low. This imbalance typically diminishes classifier performance. In this study, we propose an instance selection method utilizing kernel mean matching (ISKMM) that addresses both knowledge transfer and class imbalance in cross‐project defect prediction (CPDP). The method employs the kernel mean matching (KMM) technique to assess the similarity between training and target data. It selects instances with high similarity, retains them, and resamples the data based on similarity weighting to mitigate the class imbalance problem. Our experiments, conducted on 10 open‐source projects, reveal that the ISKMM algorithm outperforms existing CP single‐source software defect prediction (SDP) algorithms. Moreover, when employing the proposed algorithm, defect predictors constructed from cross‐project data demonstrate an overall performance comparable to predictors learned from within‐project data.
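The pipeline the abstract describes (weight source instances by KMM, keep the most similar ones, then resample by weight to balance the classes) can be illustrated with a minimal sketch. This is not the authors' ISKMM implementation: the RBF kernel, the projected-gradient solver, the omission of KMM's usual sum constraint on the weights, and the names and defaults `gamma`, `bound`, `steps`, and `keep_ratio` are all assumptions made for illustration.

```python
import math
import random

def rbf(x, y, gamma):
    """RBF kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def kmm_weights(source, target, gamma=1.0, bound=10.0, steps=500):
    """Approximate KMM weights: minimize 0.5*b'Kb - kappa'b with 0 <= b <= bound.

    Simplification (assumption): only the box constraint is enforced, and the
    problem is solved by projected gradient descent rather than a QP solver.
    """
    n_s, n_t = len(source), len(target)
    K = [[rbf(a, b, gamma) for b in source] for a in source]
    kappa = [n_s / n_t * sum(rbf(a, t, gamma) for t in target) for a in source]
    # The largest row sum bounds the top eigenvalue of a nonnegative symmetric K,
    # so this step size keeps the descent stable.
    lr = 1.0 / max(sum(row) for row in K)
    beta = [1.0] * n_s
    for _ in range(steps):
        grad = [sum(K[i][j] * beta[j] for j in range(n_s)) - kappa[i]
                for i in range(n_s)]
        beta = [min(bound, max(0.0, b - lr * g)) for b, g in zip(beta, grad)]
    return beta

def select_and_resample(labels, weights, keep_ratio=0.5, seed=0):
    """Keep the highest-weight instances, then oversample the minority class
    in proportion to the weights (a hypothetical resampling rule)."""
    rng = random.Random(seed)
    order = sorted(range(len(labels)), key=lambda i: weights[i], reverse=True)
    kept = order[: max(1, int(len(order) * keep_ratio))]
    defective = [i for i in kept if labels[i] == 1]
    clean = [i for i in kept if labels[i] == 0]
    minority, majority = sorted([defective, clean], key=len)
    if minority and len(minority) < len(majority):
        w = [weights[i] + 1e-12 for i in minority]  # avoid an all-zero weight sum
        kept = kept + rng.choices(minority, weights=w,
                                  k=len(majority) - len(minority))
    return kept

# Toy data: the first three source commits lie near the target project,
# the last three lie far away. Labels: 1 = defective, 0 = clean.
source = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels = [1, 0, 0, 0, 1, 0]
target = [(0.0, 0.05), (0.05, 0.0), (0.1, 0.1)]

weights = kmm_weights(source, target)
training_indices = select_and_resample(labels, weights)
```

The indices in `training_indices` would then be used to build the training set for any classifier. A production version would typically use a QP solver and tune the kernel bandwidth rather than rely on the fixed values assumed here.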

Funder

National Natural Science Foundation of China

Publisher

Wiley

Subject

Software

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1002/spe.3316

References: 47 articles.

1. The Impact of Automated Parameter Optimization on Defect Prediction Models

2. Local versus global lessons for defect prediction and effort estimation;Menzies T;IEEE Trans Softw Eng,2012

3. HYDRA: Massively Compositional Model for Cross-Project Defect Prediction

4. Nam J, Kim S. Heterogeneous defect prediction. Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering. 2015:508‐519.

5. The use of summation to aggregate software metrics hinders the performance of defect prediction models;Zhang F;IEEE Trans Softw Eng,2016