Code-Aware Prompting: A Study of Coverage-Guided Test Generation in Regression Setting using LLM

Authors:

Gabriel Ryan (1), Siddhartha Jain (2), Mingyue Shang (3), Shiqi Wang (3), Xiaofei Ma (3), Murali Krishna Ramanathan (4), Baishakhi Ray (3)

Affiliations:

1. Columbia University, New York, USA

2. AWS AI Labs, Los Angeles, USA

3. AWS AI Labs, New York, USA

4. AWS AI Labs, San Jose, USA

Abstract

Testing plays a pivotal role in ensuring software quality, yet conventional Search-Based Software Testing (SBST) methods often struggle with complex software units, achieving suboptimal test coverage. Recent work using large language models (LLMs) for test generation has focused on improving generation quality by optimizing the test generation context and correcting errors in model outputs, but uses fixed prompting strategies that ask the model to generate tests without additional guidance. As a result, LLM-generated test suites still suffer from low coverage. In this paper, we present SymPrompt, a code-aware prompting strategy for LLMs in test generation. SymPrompt's approach is based on recent work demonstrating that LLMs can solve more complex logical problems when prompted to reason about the problem in a multi-step fashion. We apply this methodology to test generation by deconstructing the test suite generation process into a multi-stage sequence, each stage driven by a specific prompt aligned with the execution paths of the method under test and exposing relevant type and dependency focal context to the model. Our approach enables pretrained LLMs to generate more complete test cases without any additional training. We implement SymPrompt using the TreeSitter parsing framework and evaluate it on a benchmark of challenging methods from open-source Python projects. SymPrompt increases the number of correct test generations by a factor of 5 and improves relative coverage by 26% for CodeGen2. Notably, when applied to GPT-4, SymPrompt improves coverage by over 2x compared to baseline prompting strategies.
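
To make the path-aligned prompting idea concrete, the sketch below enumerates the branch conditions of a Python method and builds one test-generation prompt per branch outcome, attaching focal context to each prompt. It is a minimal illustration, not SymPrompt's implementation: it uses Python's built-in ast module as a stand-in for the TreeSitter parsing the paper describes, and the helper names and prompt wording (collect_branch_conditions, build_path_prompts) are assumptions made for this example.

```python
# Minimal sketch of path-constraint prompt construction.
# Assumption: Python's built-in `ast` module stands in for TreeSitter,
# and all names and prompt templates here are illustrative, not from the paper.
import ast
import textwrap


def collect_branch_conditions(source: str) -> list[str]:
    """Return the source text of every if/while condition in the method."""
    tree = ast.parse(textwrap.dedent(source))
    conditions = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.If, ast.While)):
            conditions.append(ast.unparse(node.test))
    return conditions


def build_path_prompts(source: str, focal_context: str) -> list[str]:
    """Build one test-generation prompt per branch outcome (true / false)."""
    prompts = []
    for cond in collect_branch_conditions(source):
        for outcome in ("holds", "does not hold"):
            prompts.append(
                f"# Relevant types and dependencies:\n{focal_context}\n\n"
                f"# Method under test:\n{source}\n\n"
                f"# Write a pytest test case that reaches the branch where "
                f"`{cond}` {outcome}.\n"
            )
    return prompts


if __name__ == "__main__":
    method = """
    def clamp(x, lo, hi):
        if x < lo:
            return lo
        if x > hi:
            return hi
        return x
    """
    for prompt in build_path_prompts(method, "clamp(x: int, lo: int, hi: int) -> int"):
        print(prompt, end="---\n")
```

In the paper's actual pipeline, prompts follow whole execution paths (sequences of branch decisions) rather than single branch outcomes, and the type and dependency focal context is extracted automatically from the project; the sketch above only illustrates the overall shape of the per-path prompts.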

Publisher

Association for Computing Machinery (ACM)

Cited by 1 article:

1. Clover: Closed-Loop Verifiable Code Generation. Lecture Notes in Computer Science, 2024.
