Predicting unstable software benchmarks using static source code features

Authors:

Christoph Laaber, Mikael Basmaci, Pasquale Salza

Abstract

Software benchmarks are only as good as the performance measurements they yield. Unstable benchmarks show high variability among repeated measurements, which causes uncertainty about the actual performance and complicates reliable change assessment. However, whether a benchmark is stable or unstable only becomes evident after it has been executed and its results are available. In this paper, we introduce a machine-learning-based approach to predict a benchmark’s stability without having to execute it. Our approach relies on 58 statically computed source code features, extracted for benchmark code and code called by a benchmark, related to (1) meta-information, e.g., lines of code (LOC), (2) programming language elements, e.g., conditionals or loops, and (3) potentially performance-impacting standard library calls, e.g., file and network input/output (I/O). To assess our approach’s effectiveness, we perform a large-scale experiment on 4,461 Go benchmarks from 230 open-source software (OSS) projects. First, we assess the prediction performance of our machine learning models using 11 binary classification algorithms. We find that Random Forest performs best, with prediction performance ranging from 0.79 to 0.90 in terms of AUC and from 0.43 to 0.68 in terms of MCC. Second, we perform feature importance analyses for individual features and feature categories. We find that 7 features related to meta-information, slice usage, nested loops, and synchronization application programming interfaces (APIs) are individually important for good predictions; and that the combination of all features of the called source code is paramount for our model, while the combination of features of the benchmark itself is less important. Our results show that although benchmark stability is affected by more than just the source code, we can effectively utilize machine learning models to predict whether a benchmark will be stable or not ahead of execution. This enables spending precious testing time on reliable benchmarks, supporting developers in identifying unstable benchmarks during development, allowing unstable benchmarks to be repeated more often, estimating stability in scenarios where repeated benchmark execution is infeasible or impossible, and warning developers if new benchmarks, or existing benchmarks executed in new environments, will be unstable.
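To make the described pipeline concrete, the following is a minimal sketch, not the authors' implementation or data, of training a Random Forest to classify benchmarks as stable or unstable from statically computed features and evaluating it with AUC and MCC, using scikit-learn. The feature matrix, labels, and thresholds below are synthetic placeholders.

```python
# Minimal sketch (assumed setup, not the paper's actual tooling):
# binary classification of benchmark stability from static source code features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import matthews_corrcoef, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Placeholder feature matrix: rows = benchmarks, columns = static features
# such as LOC, loop/conditional counts, slice usage, sync and I/O API calls.
n_benchmarks, n_features = 4461, 58
X = rng.poisson(lam=3.0, size=(n_benchmarks, n_features)).astype(float)

# Placeholder labels: 1 = unstable benchmark, 0 = stable benchmark.
y = (X[:, 0] + X[:, 1] + rng.normal(size=n_benchmarks) > 7).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

proba = clf.predict_proba(X_test)[:, 1]  # predicted probability of "unstable"
pred = clf.predict(X_test)

print(f"AUC: {roc_auc_score(y_test, proba):.2f}")
print(f"MCC: {matthews_corrcoef(y_test, pred):.2f}")

# Per-feature importances, analogous to the paper's feature importance analysis.
top = np.argsort(clf.feature_importances_)[::-1][:7]
print("Most important feature indices:", top)
```

In the paper's setting, X would hold the 58 static features extracted from the benchmark and its called code, and y would label each benchmark as stable or unstable based on the variability of its measured results.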

Funder

Universität Zürich

Publisher

Springer Science and Business Media LLC

Subject

Software


Cited by 17 articles (first 5 listed):

1. Evaluating Search-Based Software Microbenchmark Prioritization. IEEE Transactions on Software Engineering, July 2024.

2. Enhancing Performance Bug Prediction Using Performance Code Metrics. Proceedings of the 21st International Conference on Mining Software Repositories, April 15, 2024.

3. A longitudinal study on the temporal validity of software samples. Information and Software Technology, April 2024.

4. What makes a real change in software performance? An empirical study on analyzing the factors that affect the triagement of performance change points. Science of Computer Programming, March 2024.

5. Studying the association between Gitcoin’s issues and resolving outcomes. Journal of Systems and Software, December 2023.
