Affiliation:
1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510000, P. R. China
2. School of Software Engineering, South China University of Technology, Guangzhou 510000, P. R. China
Abstract
Machine-learning-based software defect prediction (SDP) methods are receiving great attention from the researchers of intelligent software engineering. Most existing SDP methods are performed under a within-project setting. However, there usually is little to no within-project training data to learn an available supervised prediction model for a new SDP task. Therefore, cross-project defect prediction (CPDP), which uses labeled data of source projects to learn a defect predictor for a target project, was proposed as a practical SDP solution. In real CPDP tasks, the class imbalance problem is ubiquitous and has a great impact on performance of the CPDP models. Unlike previous studies that focus on subsampling and individual methods, this study investigated 15 imbalanced learning methods for CPDP tasks, especially for assessing the effectiveness of imbalanced ensemble learning (IEL) methods. We evaluated the 15 methods by extensive experiments on 31 open-source projects derived from five datasets. Through analyzing a total of 37504 results, we found that in most cases, the IEL method that combined under-sampling and bagging approaches will be more effective than the other investigated methods.
Funder
National Nature Science Foundation of China
Guangdong Province Application Major Fund
Guangzhou Produce & Research Fund
Zhongshan Produce & Research Fund
Publisher
World Scientific Pub Co Pte Lt
Subject
Artificial Intelligence,Computer Vision and Pattern Recognition,Software
Cited by
25 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献