In-IDE Code Generation from Natural Language: Promise and Challenges

Published: 2022-03-04 | Volume: 31 | Issue: 2 | Pages: 1-47
ISSN: 1049-331X
Journal: ACM Transactions on Software Engineering and Methodology
Abbreviated journal title: ACM Trans. Softw. Eng. Methodol.
Language: English

Authors:

Frank F. Xu¹, Bogdan Vasilescu¹, Graham Neubig¹

Affiliation:

1. Carnegie Mellon University, Pittsburgh, PA

Abstract

A great part of software development involves conceptualizing or communicating the underlying procedures and logic that need to be expressed in programs. One major difficulty of programming is turning concept into code, especially when dealing with the APIs of unfamiliar libraries. Recently, there has been a proliferation of machine learning methods for code generation and retrieval from natural language queries. However, these have primarily been evaluated on retrieval accuracy or on the overlap of generated code with developer-written code, and their actual effect on the developer workflow is surprisingly unattested. In this article, we perform the first comprehensive investigation of the promise and challenges of using such technology inside the PyCharm IDE, asking: "At the current state of technology, does it improve developer productivity or accuracy, how does it affect the developer experience, and what are the remaining gaps and challenges?" To facilitate the study, we first develop a plugin for the PyCharm IDE that implements a hybrid of code generation and code retrieval functionality. We also orchestrate virtual environments that enable the collection of many user events (e.g., web browsing, keystrokes, fine-grained code edits). We ask developers with various backgrounds to complete 14 Python programming tasks covering 7 task types, with or without the help of the plugin; the tasks range from basic file manipulation to machine learning or data visualization. While qualitative surveys of developer experience are largely positive, quantitative results with regard to increased productivity, code quality, or program correctness are inconclusive. Further analysis identifies several pain points whose resolution could improve the effectiveness of future machine learning-based code generation/retrieval developer assistants. It also shows when developers prefer code generation over code retrieval and vice versa. We release all data and software to pave the road for future empirical studies on this topic, as well as for the development of better code generation models.

Funder

NSF

Publisher

Association for Computing Machinery (ACM)

Subject

Software

Link

https://dl.acm.org/doi/pdf/10.1145/3487569

References (130 articles)

1. JuICe: A Large Scale Distantly Supervised Dataset for Open Domain Context-based Code Generation

2. Miltiadis Allamanis, Earl T. Barr, Christian Bird, and Charles Sutton. 2014. Learning natural coding conventions. In International Symposium on Foundations of Software Engineering (ESEC/FSE). 281–293.

3. A Survey of Machine Learning for Big Code and Naturalness

4. Miltiadis Allamanis, Daniel Tarlow, A. Gordon, and Y. Wei. 2015. Bimodal modelling of source code and natural language. In 32nd International Conference on Machine Learning (ICML).

5. Amann S. 2016. FeedBaG: An interaction tracker for Visual Studio. I.

Cited by 52 articles.

1. Transformers in source code generation: A comprehensive survey. Journal of Systems Architecture, 2024-08.

2. Rethinking AI code generation: a one-shot correction approach based on user feedback. Automated Software Engineering, 2024-07-12.

3. Rocks Coding, Not Development: A Human-Centric, Experimental Evaluation of LLM-Supported SE Tasks. Proceedings of the ACM on Software Engineering, 2024-07-12.

4. Performance, Workload, Emotion, and Self-Efficacy of Novice Programmers Using AI Code Generation. Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1, 2024-07-03.

5. BatFix: Repairing language model-based transpilation. ACM Transactions on Software Engineering and Methodology, 2024-06-27.