Non-Autoregressive Line-Level Code Completion

Published:2024-06-03 Issue:5 Volume:33 Page:1-34
ISSN:1049-331X
Container-title:ACM Transactions on Software Engineering and Methodology
language:en
Short-container-title:ACM Trans. Softw. Eng. Methodol.

Author:

Liu Fang¹, Fu Zhiyi², Li Ge², Jin Zhi², Liu Hui³, Hao Yiyang⁴, Zhang Li¹

Affiliation:

1. School of Computer Science and Engineering, State Key Laboratory of Complex & Critical Software Environment, Beihang University, Beijing, China

2. School of Computer Science, Peking University, Beijing, China

3. Beijing Institute of Technology, Beijing, China

4. Silicon Heart Tech Co., Beijing, China

Abstract

Software developers frequently use code completion tools to accelerate software development by suggesting the following code elements. Researchers usually employ AutoRegressive (AR) decoders to complete code sequences in a left-to-right, token-by-token fashion. To improve the accuracy and efficiency of code completion, we argue that tokens within a code statement have the potential to be predicted concurrently. In this article, we first conduct an empirical study to analyze the dependency among the target tokens in line-level code completion. The results suggest that it is potentially practical to generate all statement tokens in parallel. To this end, we introduce SANAR, a simple and effective syntax-aware non-autoregressive model for line-level code completion. To further improve the quality of the generated code, we propose an adaptive and syntax-aware sampling strategy to boost the model’s performance. The experimental results obtained from two widely used datasets indicate that our model outperforms state-of-the-art code completion approaches of similar model size by a considerable margin, and is faster than these models with up to 9× speed-up. Moreover, the extensive results additionally demonstrate that the enhancements achieved by SANAR become even more pronounced with larger model sizes, highlighting their significance.
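The contrast the abstract draws between left-to-right decoding and predicting all tokens of a line concurrently can be illustrated with a toy sketch. This is not SANAR or any model from the article: the two "models" below are hypothetical lookup functions, and the names `decode_autoregressive`, `decode_parallel`, `TARGET`, and `CONTEXT` are assumptions made only for illustration. The sketch counts sequential decoding steps, which is where a non-autoregressive decoder can save time.

```python
from typing import Callable, List, Tuple

Token = str

# Hypothetical example: the code context and the line we want to complete.
CONTEXT: List[Token] = ["def", "add", "(", "a", ",", "b", ")", ":"]
TARGET: List[Token] = ["return", "a", "+", "b"]


def toy_next_token(prefix: List[Token]) -> Token:
    """Stand-in for an AR model: predicts the next token given everything so far."""
    return TARGET[len(prefix) - len(CONTEXT)]


def toy_token_at(context: List[Token], position: int) -> Token:
    """Stand-in for a NAR model: predicts the token at a position from the context only."""
    return TARGET[position]


def decode_autoregressive(
    next_token: Callable[[List[Token]], Token],
    context: List[Token],
    length: int,
) -> Tuple[List[Token], int]:
    """Generate tokens one at a time; each step must wait for the previous token."""
    output: List[Token] = []
    sequential_steps = 0
    for _ in range(length):
        output.append(next_token(context + output))
        sequential_steps += 1
    return output, sequential_steps


def decode_parallel(
    token_at: Callable[[List[Token], int], Token],
    context: List[Token],
    length: int,
) -> Tuple[List[Token], int]:
    """Predict every position independently of the other outputs.

    No position depends on another output token, so a real model could
    compute all of them in one batched pass; we count that as one step.
    """
    output = [token_at(context, i) for i in range(length)]
    return output, 1


ar_tokens, ar_steps = decode_autoregressive(toy_next_token, CONTEXT, len(TARGET))
nar_tokens, nar_steps = decode_parallel(toy_token_at, CONTEXT, len(TARGET))
print(ar_tokens, ar_steps)
print(nar_tokens, nar_steps)
```

In this toy setting both decoders recover the same line, but the autoregressive loop needs one sequential step per token while the parallel version needs one step in total. Whether a real model keeps its accuracy when tokens are predicted independently depends on how strongly the target tokens depend on one another, which is the question the article's empirical study examines.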

Funder

National Natural Science Foundation of China

Self-determined Research Funds of State Key Laboratory of Complex & Critical Software Environment

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3649594

Cited by 1 article.

1. Large language models for code completion: A systematic literature review;Computer Standards & Interfaces;2025-03