Affiliation:
1. School of Reliability and Systems Engineering, Beihang University, Beijing 100191, China
2. China Electronic Product Reliability and Environmental Testing Research Institute, Guangzhou 510610, China
Abstract
Software defect prediction is a popular method for optimizing software testing and improving software quality and reliability. However, software defect datasets usually have quality problems, such as class imbalance and data noise. Oversampling, which generates additional minority class samples, is one of the best-known methods for improving the quality of datasets; however, it often introduces overfitting noise into the datasets. To better improve the quality of these datasets, this paper proposes a method called US-PONR, which uses undersampling to remove samples duplicated across version iterations and then uses oversampling through propensity score matching to reduce class imbalance and noise samples in the datasets. The effectiveness of this method was validated in a software defect prediction experiment that involved 24 versions of software data from 11 PROMISE projects, with noise levels varying from 0% to 30%. The experiments showed that, compared with 12 other advanced dataset processing methods, US-PONR significantly improved the quality of noisy imbalanced datasets, especially the noisiest ones. The experiments also demonstrated that the US-PONR method can effectively identify label noise samples and remove them.
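The undersampling step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a "duplicate" means a row whose features and label are identical to a row already seen in an earlier version; the abstract does not define this, so the rule, the function names `drop_cross_version_duplicates` and `imbalance_ratio`, and the toy data are all hypothetical and not taken from the paper. The propensity score matching step is not sketched, since the abstract gives no detail on it.

```python
from collections import Counter


def drop_cross_version_duplicates(versions):
    """Keep the first occurrence of each (features, label) row across versions.

    `versions` is a list of versions; each version is a list of
    (features, label) pairs. Assumption: exact equality defines a duplicate.
    """
    seen = set()
    kept = []
    for rows in versions:
        for features, label in rows:
            key = (tuple(features), label)
            if key not in seen:
                seen.add(key)
                kept.append(key)
    return kept


def imbalance_ratio(rows):
    """Return majority-class count divided by minority-class count."""
    if not rows:
        raise ValueError("rows must not be empty")
    counts = Counter(label for _, label in rows)
    return max(counts.values()) / min(counts.values())


# Toy data: two versions of one project, label 1 = defective, 0 = clean.
v1 = [((10, 2), 0), ((35, 7), 1), ((12, 3), 0)]
v2 = [((10, 2), 0), ((12, 3), 0), ((50, 9), 1), ((8, 1), 0)]

merged = v1 + v2
deduped = drop_cross_version_duplicates([v1, v2])

print(len(merged), imbalance_ratio(merged))
print(len(deduped), imbalance_ratio(deduped))
```

In this toy case the unchanged clean modules carried over from the first version are the duplicated rows, so removing them also lowers the imbalance ratio. Whether that holds on real PROMISE data depends on which modules repeat between versions.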
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Cited by
3 articles.