Abstract
Software defect prediction refers to the automatic identification of defective parts of software through machine learning techniques. Ensemble learning has shown better prediction performance than individual classifiers. However, most previous work applied ensemble models to software defect prediction with their default hyperparameter values, which are generally suboptimal. In this paper, we investigate the applicability of a stacking ensemble built from fine-tuned tree-based ensembles for defect prediction. We used grid search to optimize the hyperparameters of seven tree-based ensembles: random forest, extra trees, AdaBoost, gradient boosting, histogram-based gradient boosting, XGBoost, and CatBoost. A stacking ensemble was then built from the fine-tuned tree-based ensembles. The ensembles were evaluated on 21 publicly available defect datasets. Empirical results showed that hyperparameter optimization had a large impact on the extra trees and random forest ensembles. Moreover, the stacking ensemble outperformed all of the fine-tuned tree-based ensembles.
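
The tuning-then-stacking pipeline described above can be sketched with scikit-learn-compatible estimators. This is a minimal illustration rather than the authors' implementation: the hyperparameter grids, the ROC-AUC scoring, the 5-fold cross-validation, the synthetic data, and the logistic-regression meta-learner are assumptions made for the sketch.

# Minimal sketch of the grid-search-then-stacking pipeline from the abstract.
# Grids, scoring, CV folds, data, and meta-learner are illustrative assumptions;
# the paper's exact settings may differ.
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              AdaBoostClassifier, GradientBoostingClassifier,
                              HistGradientBoostingClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier
from catboost import CatBoostClassifier

# Placeholder defect-style binary classification data.
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# One small grid per base ensemble (placeholders, not the paper's grids).
search_spaces = {
    "rf":  (RandomForestClassifier(random_state=42),          {"n_estimators": [100, 300]}),
    "et":  (ExtraTreesClassifier(random_state=42),             {"n_estimators": [100, 300]}),
    "ada": (AdaBoostClassifier(random_state=42),               {"n_estimators": [50, 200]}),
    "gb":  (GradientBoostingClassifier(random_state=42),       {"learning_rate": [0.05, 0.1]}),
    "hgb": (HistGradientBoostingClassifier(random_state=42),   {"max_iter": [100, 300]}),
    "xgb": (XGBClassifier(eval_metric="logloss", random_state=42), {"max_depth": [3, 6]}),
    "cat": (CatBoostClassifier(verbose=0, random_state=42),    {"depth": [4, 6]}),
}

# Step 1: grid-search each tree-based ensemble independently.
tuned = []
for name, (estimator, grid) in search_spaces.items():
    search = GridSearchCV(estimator, grid, scoring="roc_auc", cv=5)
    search.fit(X, y)
    tuned.append((name, search.best_estimator_))

# Step 2: stack the fine-tuned ensembles; a logistic-regression meta-learner
# combines their out-of-fold predictions.
stack = StackingClassifier(estimators=tuned, final_estimator=LogisticRegression(), cv=5)
stack.fit(X, y)
print("Stacking ensemble trained on base learners:", [n for n, _ in tuned])

In practice, the defect datasets would replace the synthetic data and a proper evaluation protocol (e.g., repeated cross-validation per dataset) would be applied; the sketch only shows how tuned base learners feed the stacking ensemble.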
Subject
Fluid Flow and Transfer Processes,Computer Science Applications,Process Chemistry and Technology,General Engineering,Instrumentation,General Materials Science
Cited by
9 articles.