Test smells 20 years later: detectability, validity, and reliability

Authors:

Annibale Panichella, Sebastiano Panichella, Gordon Fraser, Anand Ashok Sawant, Vincent J. Hellendoorn

Abstract

Test smells aim to capture design issues in test code that reduce its maintainability. These have been extensively studied and generally found quite prevalent in both human-written and automatically generated test cases. However, most evidence of their prevalence is based on specific static detection rules. Although these rules are based on the original, conceptual definitions of the various test smells, recent empirical studies indicate that developers perceive warnings raised by detection tools as overly strict and not representative of the maintainability and quality of test suites. This leads us to re-assess the detection accuracy of test smell detection tools and to investigate the prevalence and detectability of test smells more broadly. Specifically, we construct a hand-annotated dataset spanning hundreds of test suites, both written by developers and generated by two test generation tools (EvoSuite and JTExpert), and perform a multi-stage, cross-validated manual analysis to identify the presence of six types of test smells in these suites. We then use this manual labeling to benchmark the performance and external validity of two test smell detection tools: one widely used in prior work and one recently introduced with the express goal of matching developer perceptions of test smells. Our results primarily show that the current vocabulary of test smells is highly mismatched to real concerns: multiple smells were ubiquitous in developer-written tests but virtually never correlated with semantic or maintainability flaws; machine-generated tests often scored better, but in reality suffered from a host of problems not well captured by current test smells. Current test smell detection strategies poorly characterized the issues in these automatically generated test suites; in particular, the older tool's detection strategies misclassified over 70% of test smells, both missing real instances (false negatives) and marking many smell-free tests as smelly (false positives). We identify common patterns in these tests that can be used to improve the tools, refine and update the definitions of certain test smells, and highlight as-yet uncharacterized issues. Our findings suggest the need for (i) more appropriate metrics that match development practice, and (ii) more accurate detection strategies, evaluated primarily in industrial contexts.
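To ground the terminology for readers outside this line of work, below is a minimal, hypothetical JUnit 4 sketch of two classic test smells from the literature, Eager Test (one test method exercising several distinct behaviors) and Assertion Roulette (multiple assertions without failure messages). The ShoppingCart class and all names are invented for illustration; they are not drawn from the paper's dataset.

```java
import static org.junit.Assert.assertEquals;

import java.util.HashMap;
import java.util.Map;
import org.junit.Test;

public class ShoppingCartTest {

    // Minimal class under test, included only so the example is self-contained.
    static class ShoppingCart {
        private final Map<String, Integer> items = new HashMap<>();
        void add(String name, int qty) { items.merge(name, qty, Integer::sum); }
        void remove(String name)       { items.remove(name); }
        void clear()                   { items.clear(); }
        int itemCount()                { return items.values().stream().mapToInt(Integer::intValue).sum(); }
        int quantityOf(String name)    { return items.getOrDefault(name, 0); }
    }

    // Eager Test: one method checks adding, removing, AND clearing,
    // rather than one behavior per test.
    // Assertion Roulette: none of the assertions carries a message,
    // so a failure does not say which expectation was violated.
    @Test
    public void testCart() {
        ShoppingCart cart = new ShoppingCart();
        cart.add("apple", 2);
        cart.add("pear", 1);
        assertEquals(3, cart.itemCount());
        assertEquals(2, cart.quantityOf("apple"));
        cart.remove("pear");
        assertEquals(2, cart.itemCount());
        cart.clear();
        assertEquals(0, cart.itemCount());
    }
}
```

Static detectors of the kind benchmarked in the paper typically flag such patterns by counting behaviors exercised and unexplained assertions per test method; the paper's finding is that rules like these, applied mechanically, diverge substantially from what developers actually consider a maintainability problem.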

Funder

H2020 European Research Council

Engineering and Physical Sciences Research Council

Publisher

Springer Science and Business Media LLC

Subject

Software

Cited by 12 articles.

1. The Lost World: Characterizing and Detecting Undiscovered Test Smells. ACM Transactions on Software Engineering and Methodology, 2023-11-20.

2. Manual Tests Do Smell! Cataloging and Identifying Natural Language Test Smells. 2023 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 2023-10-26.

3. A Manual Categorization of New Quality Issues on Automatically-Generated Tests. 2023 IEEE International Conference on Software Maintenance and Evolution (ICSME), 2023-10-01.

4. Investigating Developers' Contributions to Test Smell Survivability: A Study of Open-Source Projects. 8th Brazilian Symposium on Systematic and Automated Software Testing, 2023-09-25.

5. ROME: Testing Image Captioning Systems via Recursive Object Melting. Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2023-07-12.
