Impact Evaluation of Significant Feature Set in Cross Project for Defect Prediction through Hybrid Feature Selection in Multiclass

Author:

Gul Sana, Faiz Rizwan Bin, Aljaidi Mohammad, Samara Ghassan, Alsarhan Ayoub, al-Qerem Ahmad

Abstract

Cross-project defect prediction (CPDP) is an important approach to identifying defects in a software project. In CPDP, knowledge is extracted from a source project and applied to predict labels for a target project. However, model performance can be degraded by insignificant and irrelevant features. Hybrid feature selection (HFS) can help achieve high prediction accuracy by retaining only significant, relevant features. Our aim is to explore the effect of significant feature selection through a hybrid approach on cross-project (CP) defect prediction for multi-class datasets. We leveraged the strengths of Random Forest (RF) and Recursive Feature Elimination with Cross-Validation (RFECV), which together can select a small set of significant features. Our controlled experiment follows a 1 Factor 2 Treatments (1F2T) design. Exploratory Data Analysis (EDA) showed that all versions of the PROMISE repository are multi-class and contain duplicated rows, gaps in the distribution of values, and imbalanced classes. Hence, after removing duplicated rows, reducing the distribution gap, and balancing the classes, we selected a significant feature set through the hybrid approach, i.e., RF and RFECV. We used a Convolutional Neural Network (CNN) with SoftMax as the last layer as the classifier to predict cross-project defects. Our experimental setup yielded an average AUC of 78% across all 14 versions. Our experimental results showed that HFS has a significant impact on defect prediction accuracy for different datasets in the CP setting.
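
The abstract does not spell out how RF and RFECV are combined, so the following is only a minimal sketch of one plausible reading: a random forest supplies the feature importances that RFECV uses to eliminate features, the selector is fitted on the source project, and the same feature mask is then applied to the target project. It uses scikit-learn, and everything specific in it is an assumption rather than the authors' setup: the synthetic three-class data (standing in for PROMISE metrics), the source/target split, the helper name `fit_hybrid_selector`, the tree count, the fold count, and accuracy as the selection score (the abstract reports AUC for the final model).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold


def fit_hybrid_selector(X_source, y_source, seed=0):
    """Fit RFECV using a random forest as the feature-ranking estimator.

    Hypothetical helper: hyperparameters are illustrative, not from the paper.
    """
    forest = RandomForestClassifier(n_estimators=25, random_state=seed)
    folds = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    selector = RFECV(estimator=forest, step=1, cv=folds, scoring="accuracy")
    selector.fit(X_source, y_source)
    return selector


# Synthetic three-class data standing in for defect metrics (assumption).
X, y = make_classification(
    n_samples=300,
    n_features=10,
    n_informative=4,
    n_redundant=2,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)

# Treat the first 200 rows as the "source project" and the rest as the
# "target project" (an artificial split, for illustration only).
X_source, y_source = X[:200], y[:200]
X_target = X[200:]

# Learn the feature mask on the source project only, then reuse it on the target.
selector = fit_hybrid_selector(X_source, y_source)
X_source_sel = selector.transform(X_source)
X_target_sel = selector.transform(X_target)

print("features kept:", selector.n_features_, "of", X.shape[1])
```

Fitting the selector on the source project alone keeps target labels out of feature selection, which matches the cross-project setting described above; the reduced matrices would then be passed to whatever classifier is used downstream.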

Publisher

Cold Spring Harbor Laboratory

