ClarifyGPT: A Framework for Enhancing LLM-Based Code Generation via Requirements Clarification

Authors:

Fangwen Mu¹, Lin Shi², Song Wang³, Zhuohao Yu¹, Binquan Zhang², Chenxue Wang⁴, Shichao Liu⁵, Qing Wang¹

Affiliations:

1. Institute of Software, Chinese Academy of Sciences, Beijing, China / University of Chinese Academy of Sciences, Beijing, China

2. Beihang University, Beijing, China

3. York University, Toronto, Canada

4. Institute of Software, Chinese Academy of Sciences, Beijing, China / Harbin Institute of Technology, Harbin, China

5. Huawei Central Software Institute, Beijing, China

Abstract

Large Language Models (LLMs), such as ChatGPT, have demonstrated impressive capabilities in automatically generating code from natural language requirements. In real-world practice, however, the requirements that users write are often ambiguous or insufficient. Current LLMs generate programs directly from such unclear requirements, without any interactive clarification, and the resulting code is likely to deviate from the user's original intent. To bridge this gap, we introduce ClarifyGPT, a novel framework that enhances code generation by empowering LLMs to identify ambiguous requirements and ask targeted clarifying questions. Specifically, ClarifyGPT first detects whether a given requirement is ambiguous by performing a code consistency check. If it is, ClarifyGPT prompts an LLM to generate targeted clarifying questions. After receiving the user's answers, ClarifyGPT refines the requirement and feeds it into the same LLM to generate a final code solution. To evaluate ClarifyGPT, we invited ten participants to use it for code generation on two benchmarks, MBPP-sanitized and MBPP-ET. The results show that ClarifyGPT elevates the performance (Pass@1) of GPT-4 from 70.96% to 80.80% on MBPP-sanitized. Furthermore, to conduct large-scale automated evaluations of ClarifyGPT across different LLMs and benchmarks without requiring user participation, we introduce a high-fidelity method for simulating user responses. The results demonstrate that ClarifyGPT significantly enhances code generation performance compared to the baselines; in particular, it improves the average performance of GPT-4 and ChatGPT across five benchmarks from 62.43% to 69.60% and from 54.32% to 62.37%, respectively. A human evaluation further confirms the effectiveness of ClarifyGPT in detecting ambiguous requirements and generating high-quality clarifying questions. We believe that ClarifyGPT can effectively facilitate the practical application of LLMs in real-world development environments.
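The abstract describes the pipeline only at a high level. As a rough illustration, the following is a minimal Python sketch of one plausible reading of that pipeline; `llm_complete`, `ask_user`, the prompt wording, the `solution` entry point, and the unsandboxed runner are all hypothetical placeholders, not the authors' actual implementation.

```python
# Minimal sketch of the ClarifyGPT pipeline described in the abstract.
# All names below (llm_complete, ask_user, entry point "solution") are
# hypothetical placeholders, not the authors' actual code or prompts.

def run_on_inputs(code, test_inputs, entry_point="solution"):
    """Execute generated code and record its output (or error type) on each
    test input. A real system would run this in a proper sandbox."""
    ns = {}
    try:
        exec(code, ns)  # compile and load the candidate solution
    except Exception:
        return ["<compile-error>"] * len(test_inputs)
    fn = ns.get(entry_point)
    if fn is None:
        return ["<missing-entry-point>"] * len(test_inputs)
    results = []
    for x in test_inputs:
        try:
            results.append(repr(fn(x)))
        except Exception as e:
            results.append(type(e).__name__)
    return results

def is_ambiguous(requirement, llm_complete, test_inputs, n_samples=5):
    """Code consistency check: sample several candidate solutions and compare
    their behavior on the same inputs. Divergent behavior suggests the
    requirement admits multiple interpretations, i.e., it is ambiguous."""
    prompt = f"Write a Python function named `solution` that satisfies:\n{requirement}"
    candidates = [llm_complete(prompt) for _ in range(n_samples)]
    behaviors = {tuple(run_on_inputs(c, test_inputs)) for c in candidates}
    return len(behaviors) > 1

def clarify_and_generate(requirement, llm_complete, ask_user, test_inputs):
    """Generate code directly if the requirement passes the consistency check;
    otherwise ask clarifying questions, refine the requirement, and retry."""
    if not is_ambiguous(requirement, llm_complete, test_inputs):
        return llm_complete(f"Write a Python function that satisfies:\n{requirement}")
    questions = llm_complete(
        "The following requirement is ambiguous. "
        f"Ask targeted clarifying questions about it:\n{requirement}")
    answers = ask_user(questions)  # a real user, or a simulated one
    refined = f"{requirement}\n\nClarifications:\n{answers}"
    return llm_complete(f"Write a Python function that satisfies:\n{refined}")
```

In the paper's automated evaluation, `ask_user` would correspond to the high-fidelity user-response simulation method mentioned in the abstract, so the loop can run at scale without human participants.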

Funder

Youth Innovation Promotion Association, Chinese Academy of Sciences

Basic Research Program of ISCAS

Major Program of ISCAS

National Natural Science Foundation of China

Publisher

Association for Computing Machinery (ACM)

