Guidelines for Coverage-Based Comparisons of Non-Adequate Test Suites

Author:

Milos Gligoric1, Alex Groce2, Chaoqiang Zhang2, Rohan Sharma1, Mohammad Amin Alipour2, Darko Marinov1

Affiliation:

1. University of Illinois at Urbana-Champaign

2. Oregon State University

Abstract

A fundamental question in software testing research is how to compare test suites, often as a means for comparing the test-generation techniques that produce those suites. Researchers frequently compare test suites by measuring their coverage. A coverage criterion C provides a set of test requirements and measures how many requirements a given suite satisfies. A suite that satisfies 100% of the feasible requirements is called C-adequate. Previous rigorous evaluations of coverage criteria mostly focused on such adequate test suites: given two criteria C and C′, are C-adequate suites on average more effective than C′-adequate suites? However, in many realistic cases, producing adequate suites is impractical or even impossible. This article presents the first extensive study that evaluates coverage criteria for the common case of non-adequate test suites: given two criteria C and C′, which one is better to use to compare test suites? Namely, if suites T1, T2, …, Tn have coverage values c1, c2, …, cn for C and c1′, c2′, …, cn′ for C′, is it better to compare suites based on c1, c2, …, cn or based on c1′, c2′, …, cn′? We evaluate a large set of plausible criteria, including basic criteria such as statement and branch coverage, as well as stronger criteria used in recent studies, including criteria based on program paths, equivalence classes of covered statements, and predicate states. The criteria are evaluated on a set of Java and C programs with both manually written and automatically generated test suites. The evaluation uses three correlation measures. Based on these experiments, two criteria perform best: branch coverage and an intraprocedural acyclic path coverage. We provide guidelines for testing researchers aiming to evaluate test suites using coverage criteria as well as for other researchers evaluating coverage criteria for research use.
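The comparison the abstract describes can be made concrete with a small sketch. The idea: for each suite Ti we have a coverage value under a criterion and an independently measured effectiveness score (e.g., fraction of faults detected); the criterion whose coverage values rank suites more like their effectiveness scores is the better one to use for comparisons. The snippet below uses Kendall's tau as the rank correlation; the specific numbers and the choice of tau are illustrative assumptions, not data or measures from the article itself.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall's tau-a rank correlation between two equal-length sequences."""
    assert len(xs) == len(ys)
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        # A pair is concordant if both sequences order i and j the same way.
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / pairs

# Hypothetical measurements for five suites T1..T5: coverage under two
# criteria C and C', and the fraction of seeded faults each suite detects.
cov_C  = [0.42, 0.55, 0.61, 0.70, 0.88]   # e.g., statement coverage
cov_Cp = [0.30, 0.58, 0.52, 0.75, 0.90]   # e.g., branch coverage
faults = [0.20, 0.45, 0.40, 0.65, 0.85]   # fault-detection rate

# Higher correlation with fault detection -> better criterion for
# comparing non-adequate suites.
print(kendall_tau(cov_C, faults))   # 0.8
print(kendall_tau(cov_Cp, faults))  # 1.0
```

With these made-up numbers, C′ orders the suites exactly as their fault-detection rates do (tau = 1.0), while C misorders one pair (tau = 0.8), so C′ would be preferred as a comparison criterion.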

Funder

National Science Foundation

Publisher

Association for Computing Machinery (ACM)

Subject

Software

Cited by 38 articles.

1. Subsumption, correctness and relative correctness: Implications for software testing;Science of Computer Programming;2025-01

2. Coverage Goal Selector for Combining Multiple Criteria in Search-Based Unit Test Generation;IEEE Transactions on Software Engineering;2024-04

3. Assessing effectiveness of test suites: what do we know and what should we do?;ACM Transactions on Software Engineering and Methodology;2023-12-05

4. Heterogeneous Testing for Coverage Profilers Empowered with Debugging Support;Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering;2023-11-30

5. Mind the Gap: The Difference Between Coverage and Mutation Score Can Guide Testing Efforts;2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE);2023-10-09
