Detecting Concept Drift in Just-In-Time Software Defect Prediction Using Model Interpretation

Author:

Chitsazian Zeynab¹, Kashi Saeed Sedighian¹

Affiliation:

1. K.N. Toosi University of Technology

Abstract

Context: Previous studies have indicated that Just-In-Time Software Defect Prediction (JIT-SDP) models can become unstable over time owing to changes in code, environment, and other factors. This phenomenon is commonly referred to as Concept Drift (CD), and it can lead to a decline in model performance. It is therefore essential to monitor both model performance and data distribution over time to identify any fluctuations.

Objective: We aim to identify CD points on unlabeled input data in order to address performance instability in evolving software, and to investigate how well the proposed methods agree with methods based on labeled input data. To this end, we considered the chronological order of the commits produced by developers. We propose several methods that monitor, over time, the distance between model interpretation vectors and between the values of their individual features, and treat significant distances as CD points. We compared these methods with various baseline methods.

Method: We used a publicly available dataset, developed over the long term, comprising 20 open-source projects. To reflect real-world scenarios, we also accounted for verification latency. Our initial idea was to identify CD points within a project by finding significant distances between consecutive interpretation vectors of incremental and non-incremental models.

Results: We compared the proposed CD Detection (CDD) methods with various baseline methods that use incremental Naïve Bayes classification. These baselines monitor the error rate of various performance measures. We evaluated the proposed approaches using well-known measures for CDD methods: accuracy, missed detection rate, mean time to detection, mean time between false alarms, and mean time ratio. Our evaluation was conducted using the Friedman statistical test.

Conclusions: According to the results, the method based on the average interpretation vector does not appear to recognize CD accurately. Methods that rely on incremental classifiers have the lowest accuracy. In contrast, methods based on non-incremental learning that use interpretation with positive effect size achieve the highest accuracy. By employing strategies that use the interpretation values of each feature, we were able to derive the features that have the most positive effect on identifying CD.
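
The idea of flagging CD points from significant distances between consecutive interpretation vectors can be illustrated with a minimal sketch. The abstract does not state which distance measure or significance rule the authors use, so everything specific here is an assumption for illustration: Euclidean distance, a threshold of mean plus k standard deviations over previously seen distances, and the hypothetical names `detect_drift_points`, `warmup`, and `k`. The input vectors stand in for per-window model interpretation values (one value per feature), ordered chronologically.

```python
import math
from statistics import mean, stdev


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def detect_drift_points(vectors, warmup=3, k=3.0):
    """Return indices i where the distance between vectors[i-1] and
    vectors[i] exceeds mean + k * stdev of earlier, unflagged distances.

    Assumption: this threshold rule is illustrative only; it is not
    taken from the paper.
    """
    drift_points = []
    history = []  # distances that were not flagged as drift
    for i in range(1, len(vectors)):
        d = euclidean(vectors[i - 1], vectors[i])
        if len(history) >= warmup:
            threshold = mean(history) + k * stdev(history)
            if d > threshold:
                drift_points.append(i)
                continue  # keep flagged distances out of the baseline
        history.append(d)
    return drift_points


# Hypothetical interpretation vectors for seven consecutive windows,
# two features each; the jump occurs at index 5.
windows = [
    [0.10, 0.20],
    [0.11, 0.20],
    [0.10, 0.21],
    [0.11, 0.21],
    [0.12, 0.20],
    [0.60, 0.70],
    [0.61, 0.70],
]
print(detect_drift_points(windows))
```

Because the detector only needs interpretation vectors and not defect labels, a sketch like this operates on unlabeled input, which matches the stated objective. A per-feature variant would apply the same rule to each coordinate separately.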

Publisher

Research Square Platform LLC

