Estimating Uncertainty in Labeled Changes by SZZ Tools on Just-In-Time Defect Prediction

Published: 2023-12-11
ISSN: 1049-331X
Container-title: ACM Transactions on Software Engineering and Methodology
Language: en
Short-container-title: ACM Trans. Softw. Eng. Methodol.

Author:

Guo Shikai¹, Li Dongmin¹, Huang Lin¹, Lv Sijia¹, Chen Rong¹, Li Hui¹, Li Xiaochen², Jiang He²

Affiliation:

1. Dalian Maritime University, China

2. Dalian University of Technology, China

Abstract

Just-In-Time (JIT) defect prediction aims to identify defect-prone software changes in a project in a timely manner, thereby improving development efficiency and helping to ensure software quality. Identifying the changes that introduce bugs is a critical task in JIT defect prediction, and researchers have introduced the SZZ approach and its variants to label these changes. However, different SZZ algorithms have been shown to introduce some noise into the dataset, which may reduce the predictive performance of the model. To address this limitation, we propose the Confident Learning Imbalance (CLI) model. By estimating the joint distribution of noisy labels and true labels, the model identifies and excludes samples whose labels may be corrupted, mitigating the impact of noisy data on the prediction model. CLI consists of two components: the Confident Learning (CL) component, which identifies noisy data, and the Imbalanced Data Probabilistic Prediction (IDPP) component, which generates a predicted probability matrix for imbalanced data. The IDPP component generates precise predicted probabilities for each instance in the training set, while the CL component uses this predicted probability matrix together with the noisy labels to remove the noise and build a classification model. We evaluate our model through extensive experiments on a total of 126,526 changes from ten Apache open source projects, and the results show that it outperforms the baseline methods.
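
The idea of estimating the joint distribution of noisy and true labels can be illustrated with a minimal sketch of the general confident learning technique. This is not the authors' CLI implementation: the function names, the toy data, and the simple rule of flagging every off-diagonal sample are assumptions made for illustration. The sketch also assumes that a predicted probability matrix is already available (the role the abstract assigns to the IDPP component) and that every class appears at least once among the noisy labels.

```python
import numpy as np


def class_thresholds(noisy_labels, probs):
    """Per-class threshold: mean predicted probability of class k
    over the samples whose (possibly noisy) label is k."""
    n_classes = probs.shape[1]
    return np.array(
        [probs[noisy_labels == k, k].mean() for k in range(n_classes)]
    )


def confident_joint(noisy_labels, probs):
    """Count matrix C[i, j]: samples labeled i that confidently look like j.

    A sample "confidently looks like" class j when its probability for j
    reaches the class-j threshold; ties between several such classes are
    resolved by taking the most probable one.
    """
    thresholds = class_thresholds(noisy_labels, probs)
    n_classes = probs.shape[1]
    above = probs >= thresholds                      # shape (n_samples, n_classes)
    masked = np.where(above, probs, -np.inf)
    # -1 marks samples that are not confidently assigned to any class.
    confident = np.where(above.any(axis=1), masked.argmax(axis=1), -1)
    joint = np.zeros((n_classes, n_classes), dtype=int)
    for given, guess in zip(noisy_labels, confident):
        if guess >= 0:
            joint[given, guess] += 1
    return joint, confident


def find_suspect_indices(noisy_labels, probs):
    """Indices whose confident class disagrees with the given label."""
    _, confident = confident_joint(noisy_labels, probs)
    return np.flatnonzero((confident >= 0) & (confident != noisy_labels))


# Toy data (hypothetical): 0 = clean change, 1 = bug-introducing change.
labels = np.array([0, 0, 0, 1, 1, 1])
probs = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],   # labeled clean, but predicted bug-introducing
    [0.2, 0.8],
    [0.1, 0.9],
    [0.7, 0.3],   # labeled bug-introducing, but predicted clean
])

joint, _ = confident_joint(labels, probs)
print(joint.tolist())                                # [[2, 1], [1, 2]]
print(find_suspect_indices(labels, probs).tolist())  # [2, 5]
```

In this sketch the off-diagonal entries of the count matrix estimate how many labels were flipped in each direction, and the flagged samples would be dropped before the final classifier is trained. Full confident learning implementations typically calibrate the count matrix and rank candidates before pruning, which is omitted here.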

Publisher

Association for Computing Machinery (ACM)

Subject

Software

Link

https://dl.acm.org/doi/pdf/10.1145/3637226

References (44 articles)

1. 2021. CLI details. https://github.com/Andyldm/CLI/

2. Learning from noisy examples

3. Yan M Xia X Cai L, Fan YR. 2019. Just-in-time software defect prediction: Literature review. Ruan Jian Xue Bao/Journal of Software 30, 5 (2019), 1288–1307. http://www.jos.org.cn/1000-9825/5713.htm

4. A Framework for Evaluating the Results of the SZZ Approach for Identifying Bug-Introducing Changes

5. Evaluating defect prediction approaches: a benchmark and an extensive comparison

Cited by 1 article.

1. An Empirical Study on Just-in-time Conformal Defect Prediction. Proceedings of the 21st International Conference on Mining Software Repositories, 2024-04-15.