Evaluating classifiers in SE research: the ECSER pipeline and two replication studies

Published:2022-11-08 Issue:1 Volume:28 Page:
ISSN:1382-3256
Container-title:Empirical Software Engineering
language:en
Short-container-title:Empir Software Eng

Author:

Dell’Anna Davide, Aydemir Fatma Başak, Dalpiaz Fabiano

Abstract

Context: Automated classifiers, often based on machine learning (ML), are increasingly used in software engineering (SE) for labelling previously unseen SE data. Researchers have proposed automated classifiers that predict if a code chunk is a clone, if a requirement is functional or non-functional, if the outcome of a test case is non-deterministic, etc.

Objective: The lack of guidelines for applying and reporting classification techniques for SE research leads to studies in which important research steps may be skipped, key findings might not be identified and shared, and the readers may find reported results (e.g., precision or recall above 90%) that are not a credible representation of the performance in operational contexts. The goal of this paper is to advance ML4SE research by proposing rigorous ways of conducting and reporting research.

Results: We introduce the ECSER (Evaluating Classifiers in Software Engineering Research) pipeline, which includes a series of steps for conducting and evaluating automated classification research in SE. Then, we conduct two replication studies where we apply ECSER to recent research in requirements engineering and in software testing.

Conclusions: In addition to demonstrating the applicability of the pipeline, the replication studies demonstrate ECSER’s usefulness: not only do we confirm and strengthen some findings identified by the original authors, but we also discover additional ones. Some of these findings contradict the original ones.

Funder

Türkiye Bilimsel ve Teknolojik Araştirma Kurumu

Publisher

Springer Science and Business Media LLC

Subject

Software

Link

https://link.springer.com/content/pdf/10.1007/s10664-022-10243-1.pdf

Reference: 85 articles.

1. Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado G S, Davis A, Dean J, Devin M, Ghemawat S, Goodfellow I, Harp A, Irving G, Isard M, Jia Y, Jozefowicz R, Kaiser L, Kudlur M, Levenberg J, Mané D, Monga R, Moore S, Murray D, Olah C, Schuster M, Shlens J, Steiner B, Sutskever I, Talwar K, Tucker P, Vanhoucke V, Vasudevan V, Viégas F, Vinyals O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng X (2015) Tensorflow: large-scale machine learning on heterogeneous systems. https://www.tensorflow.org/. Software available from tensorflow.org

2. Adams N M, Hand D J (2000) Improving the practice of classifier performance assessment. Neural Comput 12(2):305–311

3. Agrawal A, Menzies T (2018) Is “better data” better than “better data miners”? In: IEEE/ACM international conference on software engineering, pp 1050–1061

4. Agrawal A, Yang X, Agrawal R, Yedida R, Shen X, Menzies T (2021) Simpler hyperparameter optimization for software analytics: why, how, when. IEEE Trans Softw Eng 48:2939–2954

5. Alonso-Betanzos A, Bolón-Canedo V, Heyndrickx G R, Kerkhof P L (2015) Exploring guidelines for classification of major heart failure subtypes by using machine learning. Clin Med Insights: Cardiol 9:CMC–s18746

Cited by 9 articles.

1. From Specifications to Prompts: On the Future of Generative Large Language Models in Requirements Engineering;IEEE Software;2024-09

2. Requirements Classification for Traceability Link Recovery;2024 IEEE 32nd International Requirements Engineering Conference (RE);2024-06-24

3. Deriving Domain Models From User Stories: Human vs. Machines;2024 IEEE 32nd International Requirements Engineering Conference (RE);2024-06-24

4. 230,439 Test Failures Later: An Empirical Evaluation of Flaky Failure Classifiers;2024 IEEE Conference on Software Testing, Verification and Validation (ICST);2024-05-27

5. Enhancing Software Requirements Classification with Semisupervised GAN‐BERT Technique;Journal of Electrical and Computer Engineering;2024-01