Characterizing and Detecting Methods to be Benchmarked under Performance Unit Test

Published: 2022-08-20
Issue: 09
Volume: 32
Page: 1279-1305
ISSN: 0218-1940
Container-title: International Journal of Software Engineering and Knowledge Engineering
Language: en
Short-container-title: Int. J. Soft. Eng. Knowl. Eng.

Author:

Chen Jie¹, Hu Haiyang¹, Yu Dongjin¹

Affiliation:

1. School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, P. R. China

Abstract

Continuous integration is a growing trend in the software engineering community and industry, and performance testing is becoming more important in this context. To support precise and fine-grained monitoring, performance unit tests are applied to small software components. However, benchmarks for performance unit testing remain insufficient: benchmark coverage is low and there is room for improvement. Focusing on the most important parts of the software, such as methods, and monitoring their performance closely with performance unit tests can therefore greatly reduce the effort needed for testing and for preparing benchmarks. This paper aims to provide an assisting approach for detecting methods that need to be benchmarked in performance unit tests. We start by defining 30 features to characterize the methods in the projects and show that they can be used to distinguish benchmarked methods (BDMs for short) from methods that are not benchmarked. Then, using the proposed features, we build machine learning-based models to detect BDMs. We perform an experiment with 10 open source projects from GitHub to see how well our approach works. First, we use seven binary classification techniques to evaluate the prediction performance of our machine learning models. We find that Random Forest makes the best predictions, with AUC between 0.77 and 0.89 and MCC between 0.5 and 0.75. In terms of cost effectiveness, the experiment reveals that by inspecting only 5% of the candidate methods detected by our model, 43% of the total real BDMs can be retrieved. Second, we conduct feature importance evaluations for individual features and feature categories. We find that eight features related to Scope, History, and Complexity are individually important for good predictions and that the combination of all features in the Scope category is paramount for our model, while the combination of features in the Control Flow category is less important. Third, we investigate the performance of our detection approach with different feature selection strategies and data sources. Our results show that we can make good predictions about whether a method needs to be benchmarked by using machine learning models. Practitioners can use our method and the results of the study to deal with BDM detection effectively.
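
The abstract reports MCC and a cost-effectiveness figure (the share of real BDMs retrieved when only the top 5% of candidates are inspected). The sketch below shows how such quantities are commonly computed. It is not the authors' implementation: the function names `mcc` and `recall_at_top_fraction` are hypothetical, and reading cost effectiveness as recall over the highest-scored fraction of methods is an assumption.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


def recall_at_top_fraction(scores, labels, fraction):
    """Share of all positives found in the top `fraction` of items ranked by score.

    Assumption: this mirrors "inspect only X% of candidates, retrieve Y% of BDMs".
    `labels` holds 1 for a benchmarked method and 0 otherwise.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    k = max(1, math.ceil(len(scores) * fraction))  # number of items inspected
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    found = sum(label for _, label in ranked[:k])
    return found / total_pos
```

For instance, with classifier scores for every method in a project, `recall_at_top_fraction(scores, labels, 0.05)` gives the fraction of benchmarked methods recovered by inspecting the top 5%.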

Funder

Medical Science and Technology Project of Zhejiang Province

Young Scientists Fund

Publisher

World Scientific Pub Co Pte Ltd

Subject

Artificial Intelligence, Computer Graphics and Computer-Aided Design, Computer Networks and Communications, Software

Link

https://www.worldscientific.com/doi/pdf/10.1142/S0218194022500486
