Mutation‐based data augmentation for software defect prediction

Published: 2023-11-06
ISSN: 2047-7473
Container title: Journal of Software: Evolution and Process
Language: en
Short container title: J Software Evolu Process

Author:

Mao Rui¹, Zhang Li¹, Zhang Xiaofang¹

Affiliation:

1. School of Computer Science and Technology, Soochow University, Suzhou, China

Abstract

Software defect prediction (SDP) aims to distinguish between defective and nondefective instances, but the imbalance between these two classes often reduces prediction performance. Conventional SDP approaches tackle imbalanced data with oversampling techniques, such as synthetic oversampling. However, these methods merely synthesize new instances from traditional code features without considering actual defects at the code level. To address data imbalance while preserving the semantic features of code samples, a mutation‐based data augmentation approach for SDP is proposed. The method applies mutation operators to nondefective instances, and the resulting mutants serve as new defective instances. The approach is evaluated on six projects from the PROMISE dataset using four traditional and two deep classifiers. The experimental results indicate that, compared with other data augmentation methods, the approach improves defect prediction performance for both traditional and deep classifiers.
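
The core idea of the abstract, mutating nondefective code to obtain additional defective instances, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not say which mutation operators or tools the authors use, so the sketch picks one well-known operator (relational operator replacement) and applies it at the text level. The names `mutate` and `augment` are hypothetical, and a realistic implementation would mutate the syntax tree rather than raw text, since a plain regex also matches generics or shift operators.

```python
import random
import re

# Hypothetical operator: relational operator replacement.
# Assumption: the paper's actual operators are not given in the abstract.
_REPLACEMENTS = {"<=": "<", ">=": ">", "<": "<=", ">": ">=", "==": "!=", "!=": "=="}
# Two-character operators come first so "<=" is not matched as "<".
_PATTERN = re.compile(r"<=|>=|==|!=|<|>")


def mutate(source, rng):
    """Replace one randomly chosen relational operator in `source`.

    Returns the mutated text, or None if `source` has no such operator.
    """
    matches = list(_PATTERN.finditer(source))
    if not matches:
        return None
    m = rng.choice(matches)
    return source[:m.start()] + _REPLACEMENTS[m.group()] + source[m.end():]


def augment(instances, seed=0):
    """Add mutants of nondefective instances as new defective instances.

    `instances` is a list of (source, label) pairs, where label 1 means
    defective and 0 means nondefective. Mutants are added until both
    classes have the same size or the attempt budget is exhausted.
    """
    rng = random.Random(seed)
    clean = [src for src, label in instances if label == 0]
    n_defective = sum(label for _, label in instances)
    needed = max(len(clean) - n_defective, 0)

    augmented = list(instances)
    seen = {src for src, _ in instances}
    for _ in range(20 * needed):  # bounded number of attempts
        if needed == 0:
            break
        mutant = mutate(rng.choice(clean), rng)
        if mutant is None or mutant in seen:
            continue  # nothing to mutate, or a duplicate instance
        seen.add(mutant)
        augmented.append((mutant, 1))  # the mutant is labeled defective
        needed -= 1
    return augmented
```

Unlike feature-space oversampling, each synthetic instance here is a concrete piece of code with a code-level change, which is the property the abstract emphasizes. Whether a given mutant really behaves as a defect is not checked in this sketch.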

Funder

Priority Academic Program Development of Jiangsu Higher Education Institutions

National Natural Science Foundation of China

Publisher

Wiley

Subject

Software

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1002/smr.2634

References: 65 articles.

1. Handbook of software reliability engineering;Lyu MR;Softw IEEE,1996

2. Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings

3. Nam J, Kim S. CLAMI: defect prediction on unlabeled datasets (t). In: 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE); 2015:452‐463.

4. Learning from imbalanced data: open challenges and future directions

5. The impact of class rebalancing techniques on the performance and interpretation of defect prediction models;Tantithamthavorn C;IEEE Trans Softw Eng,2020

Cited by 1 article.

1. Handling class overlap and imbalance using overlap driven under-sampling with balanced random forest in software defect prediction;Innovations in Systems and Software Engineering;2024-06-18