Affiliation:
1. K.N.Toosi University of Technology
2. Polytechnique Montréal
Abstract
Model instability refers to the situation in which a machine learning model trained on historical data becomes less reliable over time due to Concept Drift (CD), the phenomenon in which the underlying data distribution changes over time. In this paper, we propose, for the first time, a method for predicting CD in evolving software by identifying inconsistencies in instance interpretations over time. To this end, we obtain the instance interpretation vector for each new commit sample created by developers. Wherever the statistical distribution of the interpreted sample differs significantly from that of previous samples, a CD point is identified. To evaluate the proposed method, we compare its results with those of a baseline method that locates CD points by monitoring the Error Rate (ER) over time and identifies CD whenever the ER rises significantly. To extend the evaluation, we also obtain CD points from the baseline method by monitoring additional efficiency measures besides the ER. Furthermore, this paper presents, for the first time, an experimental study of CD discovery with the proposed method on resampled datasets. The results of our study, conducted on 20 known datasets, indicate that the model's instability over time can be predicted with a high degree of accuracy without requiring the labeling of newly arriving data.
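The core idea of comparing the interpretation vectors of new samples against earlier ones can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the interpretation technique or the statistical test, so the sketch assumes that interpretation vectors are already available as lists of per-feature importance scores, and it swaps in a per-feature two-sample Kolmogorov-Smirnov test with a Bonferroni correction. The names `ks_statistic`, `ks_critical`, and `drifting_features` are hypothetical.

```python
import math

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past the current value so ties are handled together.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def ks_critical(n, m, alpha):
    """Asymptotic critical value of the two-sample KS statistic at level alpha."""
    return math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))

def drifting_features(reference, window, alpha=0.05):
    """Return indices of features whose interpretation scores changed distribution.

    reference, window: lists of interpretation vectors (assumed input format).
    The significance level is split across features (Bonferroni correction).
    """
    n_features = len(reference[0])
    threshold = ks_critical(len(reference), len(window), alpha / n_features)
    drifted = []
    for k in range(n_features):
        ref_scores = [vec[k] for vec in reference]
        new_scores = [vec[k] for vec in window]
        if ks_statistic(ref_scores, new_scores) > threshold:
            drifted.append(k)
    return drifted

# Synthetic interpretation vectors for illustration only (not data from the paper).
reference = [[i / 100, (i % 10) / 10, 0.5] for i in range(100)]
shifted = [[v[0], v[1] + 5.0, v[2]] for v in reference]

print(drifting_features(reference, reference))  # → []
print(drifting_features(reference, shifted))    # → [1]
```

Because the check runs on interpretation vectors alone, it needs no labels for the new samples, which matches the property the abstract highlights. A non-empty result would mark the current window as a candidate CD point.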
Publisher
Research Square Platform LLC