Guidelines for Coverage-Based Comparisons of Non-Adequate Test Suites

Author:

Milos Gligoric1, Alex Groce2, Chaoqiang Zhang2, Rohan Sharma1, Mohammad Amin Alipour2, Darko Marinov1

Affiliation:

1. University of Illinois at Urbana-Champaign

2. Oregon State University

Abstract

A fundamental question in software testing research is how to compare test suites, often as a means for comparing the test-generation techniques that produce those suites. Researchers frequently compare test suites by measuring their coverage. A coverage criterion C provides a set of test requirements and measures how many requirements a given suite satisfies. A suite that satisfies 100% of the feasible requirements is called C-adequate. Previous rigorous evaluations of coverage criteria mostly focused on such adequate test suites: given two criteria C and C′, are C-adequate suites on average more effective than C′-adequate suites? However, in many realistic cases, producing adequate suites is impractical or even impossible. This article presents the first extensive study that evaluates coverage criteria for the common case of non-adequate test suites: given two criteria C and C′, which one is better to use to compare test suites? Namely, if suites T1, T2, …, Tn have coverage values c1, c2, …, cn for C and c1′, c2′, …, cn′ for C′, is it better to compare suites based on c1, c2, …, cn or based on c1′, c2′, …, cn′? We evaluate a large set of plausible criteria, including basic criteria such as statement and branch coverage, as well as stronger criteria used in recent studies, including criteria based on program paths, equivalence classes of covered statements, and predicate states. The criteria are evaluated on a set of Java and C programs with both manually written and automatically generated test suites. The evaluation uses three correlation measures. Based on these experiments, two criteria perform best: branch coverage and an intraprocedural acyclic path coverage. We provide guidelines for testing researchers aiming to evaluate test suites using coverage criteria as well as for other researchers evaluating coverage criteria for research use.
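The comparison the abstract describes can be made concrete with a small sketch. The idea: for each suite Ti we have a coverage value under a criterion and an independently measured effectiveness score (e.g., fraction of faults detected); the criterion whose coverage values rank suites more like their effectiveness scores is the better one to use for comparisons. The snippet below uses Kendall's tau as the rank correlation; the specific numbers and the choice of tau are illustrative assumptions, not data or measures from the article itself.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall's tau-a rank correlation between two equal-length sequences."""
    assert len(xs) == len(ys)
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        # A pair is concordant if both sequences order i and j the same way.
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / pairs

# Hypothetical measurements for five suites T1..T5: coverage under two
# criteria C and C', and the fraction of seeded faults each suite detects.
cov_C  = [0.42, 0.55, 0.61, 0.70, 0.88]   # e.g., statement coverage
cov_Cp = [0.30, 0.58, 0.52, 0.75, 0.90]   # e.g., branch coverage
faults = [0.20, 0.45, 0.40, 0.65, 0.85]   # fault-detection rate

# Higher correlation with fault detection -> better criterion for
# comparing non-adequate suites.
print(kendall_tau(cov_C, faults))   # 0.8
print(kendall_tau(cov_Cp, faults))  # 1.0
```

With these made-up numbers, C′ orders the suites exactly as their fault-detection rates do (tau = 1.0), while C misorders one pair (tau = 0.8), so C′ would be preferred as a comparison criterion.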

Funder

National Science Foundation

Publisher

Association for Computing Machinery (ACM)

Subject

Software

Cited by 38 articles.

1. Subsumption, correctness and relative correctness: Implications for software testing;Science of Computer Programming;2025-01

2. Coverage Goal Selector for Combining Multiple Criteria in Search-Based Unit Test Generation;IEEE Transactions on Software Engineering;2024-04

3. Assessing effectiveness of test suites: what do we know and what should we do?;ACM Transactions on Software Engineering and Methodology;2023-12-05

4. Heterogeneous Testing for Coverage Profilers Empowered with Debugging Support;Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering;2023-11-30

5. Mind the Gap: The Difference Between Coverage and Mutation Score Can Guide Testing Efforts;2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE);2023-10-09
