CYCLE: Learning to Self-Refine the Code Generation

Published: 2024-04-29 Issue: OOPSLA1 Volume: 8 Page: 392-418
ISSN: 2475-1421
Container-title: Proceedings of the ACM on Programming Languages
Language: en
Short-container-title: Proc. ACM Program. Lang.

Author:

Ding Yangruibo¹, Min Marcus J.¹, Kaiser Gail¹, Ray Baishakhi¹

Affiliation:

1. Columbia University, New York, USA

Abstract

Pre-trained code language models (code LMs) have achieved promising performance in code generation and improved the programming efficiency of human developers. However, existing evaluations of code LMs typically overlook their self-refinement capability and focus only on the accuracy of one-time predictions. When a code LM fails to produce a correct program, developers find it hard to debug and fix the faulty prediction, since they did not write it themselves. Unfortunately, our study reveals that code LMs cannot efficiently self-refine their faulty generations either. In this paper, we propose the CYCLE framework, which learns to self-refine faulty generations according to the available feedback, such as the execution results reported by test suites. We evaluate CYCLE on three popular code generation benchmarks: HumanEval, MBPP, and APPS. The results reveal that CYCLE maintains, and sometimes improves, the quality of one-time code generation while significantly improving the self-refinement capability of code LMs. We implement four variants of CYCLE with 350M, 1B, 2B, and 3B parameters, and the experiments show that CYCLE consistently boosts code generation performance, by up to 63.5

Funder

NSF

Defense Advanced Research Projects Agency

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3649825

References (56 articles)

1. Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, and Leandro von Werra. 2023. SantaCoder: don’t reach for the stars! arXiv:2301.03988.

2. Amazon. 2023. Amazon CodeWhisperer: Build applications faster and more securely with your AI coding companion. https://aws.amazon.com/codewhisperer/

3. Anthropic. 2023. Introducing Claude. https://www.anthropic.com/index/introducing-claude

4. Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie J. Cai, Michael Terry, Quoc V. Le, and Charles Sutton. 2021. Program Synthesis with Large Language Models. CoRR, abs/2108.07732 (2021). arXiv:2108.07732.

5. Grounded Copilot: How Programmers Interact with Code-Generating Models