CodePlan: Repository-Level Coding using LLMs and Planning

Published: 2024-07-12
Issue: FSE
Volume: 1
Page: 675-698
ISSN: 2994-970X
Container-title: Proceedings of the ACM on Software Engineering
Language: en
Short-container-title: Proc. ACM Softw. Eng.

Author:

Bairi Ramakrishna¹, Sonwane Atharv¹, Kanade Aditya¹, C. Vageesh D.¹, Iyer Arun¹, Parthasarathy Suresh¹, Rajamani Sriram¹, Ashok B.¹, Shet Shashank¹

Affiliation:

1. Microsoft Research, Bangalore, India

Abstract

Software engineering activities such as package migration, fixing error reports from static analysis or testing, and adding type annotations or other specifications to a codebase, involve pervasively editing the entire repository of code. We formulate these activities as repository-level coding tasks. Recent tools like GitHub Copilot, which are powered by Large Language Models (LLMs), have succeeded in offering high-quality solutions to localized coding problems. Repository-level coding tasks are more involved and cannot be solved directly using LLMs, since code within a repository is inter-dependent and the entire repository may be too large to fit into the prompt. We frame repository-level coding as a planning problem and present a task-agnostic, neuro-symbolic framework called CodePlan to solve it. CodePlan synthesizes a multi-step chain-of-edits (plan), where each step results in a call to an LLM on a code location with context derived from the entire repository, previous code changes and task-specific instructions. CodePlan is based on a novel combination of an incremental dependency analysis, a change may-impact analysis and an adaptive planning algorithm (symbolic components) with the neural LLMs. We evaluate the effectiveness of CodePlan on two repository-level tasks: package migration (C#) and temporal code edits (Python). Each task is evaluated on multiple code repositories, each of which requires inter-dependent changes to many files (between 2–97 files). Coding tasks of this level of complexity have not been automated using LLMs before. Our results show that CodePlan has better match with the ground truth compared to baselines. CodePlan is able to get 5/7 repositories to pass the validity checks (i.e., to build without errors and make correct code edits) whereas the baselines (without planning but with the same type of contextual information as CodePlan) cannot get any of the repositories to pass them. 
We provide our (non-proprietary) data, evaluation scripts and supplementary material at https://github.com/microsoft/codeplan.
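
The planning idea in the abstract (edit a location, check what the change may impact, then schedule the affected locations) can be illustrated with a small worklist loop. This is only a sketch under stated assumptions, not CodePlan's implementation: the names `plan_edits`, `dependents`, `edit_fn` and `fake_llm_edit` are hypothetical, the LLM call is replaced by a stub, and the dependency graph is fixed up front, whereas the abstract describes an incremental dependency analysis.

```python
from collections import deque
from typing import Callable, Dict, List, Set


def plan_edits(
    seeds: List[str],
    dependents: Dict[str, Set[str]],
    edit_fn: Callable[[str, List[str]], bool],
) -> List[str]:
    """Return the order in which code locations were edited.

    seeds:      locations named by the task instructions (hypothetical input).
    dependents: maps a location to the locations that depend on it.
    edit_fn:    stand-in for the LLM call; it receives the location and the
                earlier edits as context, and returns True if the edit may
                impact dependent locations.
    """
    worklist = deque(seeds)
    done: Set[str] = set()
    history: List[str] = []
    while worklist:
        loc = worklist.popleft()
        if loc in done:
            continue
        done.add(loc)
        may_impact = edit_fn(loc, list(history))
        history.append(loc)
        if may_impact:
            # Adaptive step: the plan grows only when a change may propagate.
            for dep in sorted(dependents.get(loc, ())):
                if dep not in done:
                    worklist.append(dep)
    return history


def fake_llm_edit(loc: str, context: List[str]) -> bool:
    # Assumption for the demo: every edit except the one to "tests.py"
    # changes something that dependents rely on.
    return loc != "tests.py"


deps = {
    "api.py": {"service.py", "tests.py"},
    "service.py": {"main.py"},
    "tests.py": {"ci.py"},
}
order = plan_edits(["api.py"], deps, fake_llm_edit)
print(order)
```

In this toy run the edit to `tests.py` is judged to have no impact, so `ci.py` is never scheduled, while the change to `api.py` propagates through `service.py` to `main.py`.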

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3643757
