Affiliation:
1. Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education; School of Computer Science, Peking University, China
Abstract
Large Language Models (LLMs) have shown great success in code generation. LLMs take a prompt as input and output code. How to design prompts (i.e., prompting techniques) is a key question. Existing prompting techniques are designed for natural language generation and yield low accuracy in code generation.
In this paper, we propose a new prompting technique named AceCoder. Our motivation is that code generation faces two unique challenges (i.e., requirement understanding and code implementation). AceCoder contains two novel mechanisms (i.e., guided code generation and example retrieval) to address these challenges. ➊ Guided code generation first asks LLMs to analyze the requirements and output an intermediate preliminary (e.g., test cases). The preliminary clarifies the requirements and tells LLMs "what to write". ➋ Example retrieval selects similar programs as examples in prompts, which provide relevant content (e.g., algorithms, APIs) and teach LLMs "how to write". We apply AceCoder to four LLMs (e.g., GPT-3.5, CodeGeeX) and evaluate it on three public benchmarks using Pass@k. Results show that AceCoder significantly improves the performance of LLMs on code generation. In terms of Pass@1, AceCoder outperforms the state-of-the-art baseline by up to 56.4% on MBPP, 70.7% on MBJP, and 88.4% on MBJSP. AceCoder is effective for LLMs of different sizes (i.e., 6B to 13B) and different languages (i.e., Python, Java, and JavaScript). Human evaluation shows that human developers prefer programs generated by AceCoder.
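To make the two mechanisms concrete, the sketch below shows one way such a prompt could be assembled: retrieve a few similar programs from a corpus, then ask the model to produce test cases (the intermediate preliminary) before the solution. The word-overlap retriever, the corpus format, and all function names are illustrative assumptions for this sketch, not AceCoder's actual implementation, which is described in the paper.

```python
# Minimal sketch of the two mechanisms described in the abstract.
# All names and the toy retriever are assumptions, not AceCoder's actual API.

def retrieve_examples(requirement: str, corpus: list[dict], top_k: int = 2) -> list[dict]:
    """Example retrieval: rank corpus programs by word overlap with the new
    requirement (a stand-in for a real retriever such as BM25)."""
    query_tokens = set(requirement.lower().split())
    scored = sorted(
        corpus,
        key=lambda ex: len(query_tokens & set(ex["requirement"].lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(requirement: str, examples: list[dict]) -> str:
    """Guided code generation: each demonstration (and the new task) asks the
    model to write test cases first, then the solution."""
    parts = []
    for ex in examples:
        parts.append(
            f"### Requirement:\n{ex['requirement']}\n"
            f"### Test cases:\n{ex['tests']}\n"
            f"### Solution:\n{ex['solution']}\n"
        )
    # The model continues from the "Test cases" marker for the new requirement.
    parts.append(f"### Requirement:\n{requirement}\n### Test cases:\n")
    return "\n".join(parts)

if __name__ == "__main__":
    corpus = [
        {"requirement": "Return the maximum of a list of integers.",
         "tests": "assert solve([1, 3, 2]) == 3",
         "solution": "def solve(xs):\n    return max(xs)"},
        {"requirement": "Reverse a string.",
         "tests": "assert solve('ab') == 'ba'",
         "solution": "def solve(s):\n    return s[::-1]"},
    ]
    requirement = "Return the minimum of a list of integers."
    prompt = build_prompt(requirement, retrieve_examples(requirement, corpus))
    print(prompt)  # this prompt would then be sent to the LLM
```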
Publisher
Association for Computing Machinery (ACM)
Cited by
2 articles.
1. When to Stop? Towards Efficient Code Generation in LLMs with Excess Token Prevention;Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis;2024-09-11
2. Structured Chain-of-Thought Prompting for Code Generation;ACM Transactions on Software Engineering and Methodology;2024-08-29