Two Birds with One Stone: Boosting Code Generation and Code Search via a Generative Adversarial Network

Published:2023-10-16 Issue:OOPSLA2 Volume:7 Page:486-515
ISSN:2475-1421
Container-title:Proceedings of the ACM on Programming Languages
language:en
Short-container-title:Proc. ACM Program. Lang.

Author:

Wang Shangwen¹, Lin Bo¹, Sun Zhensu², Wen Ming³, Liu Yepang⁴, Lei Yan⁵, Mao Xiaoguang¹

Affiliation:

1. National University of Defense Technology, Changsha, China

2. Singapore Management University, Singapore, Singapore

3. Huazhong University of Science and Technology, Wuhan, China

4. Southern University of Science and Technology, Shenzhen, China

5. Chongqing University, Chongqing, China

Abstract

Automatically transforming developers' natural language descriptions into source code has been a longstanding goal in software engineering research. Two types of approaches have been proposed in the literature to achieve this: code generation, which involves generating a new code snippet, and code search, which involves reusing existing code. However, despite existing efforts, the effectiveness of the state-of-the-art techniques remains limited. To advance further, our insight is that code generation and code search can help overcome each other's limitations: the code generator can benefit from feedback on the quality of its generated code, which can be provided by the code searcher, while the code searcher can benefit from the additional training data augmented by the code generator to better understand code semantics. Drawing on this insight, we propose a novel approach that combines code generation and code search techniques using a generative adversarial network (GAN), enabling mutual improvement through adversarial training. Specifically, we treat code generation and code search as the generator and discriminator in the GAN framework, respectively, and incorporate several customized designs for our tasks. We evaluate our approach in eight different settings, and consistently observe significant performance improvements for both code generation and code search. For instance, when using NatGen, a state-of-the-art code generator, as the generator and GraphCodeBERT, a state-of-the-art code searcher, as the discriminator, we achieve a 32% increase in CodeBLEU score for code generation, and a 12% increase in mean reciprocal rank for code search on a large-scale Python dataset, compared to their original performance.
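For readers unfamiliar with the code search metric named in the abstract, the following is a minimal sketch of how mean reciprocal rank (MRR) and a relative increase over a baseline are commonly computed. It uses the standard textbook definition of MRR; the function names and the rank lists are hypothetical illustrations, not the paper's evaluation code or data.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Standard MRR over a set of queries.

    ranks[i] is the 1-based position of the correct code snippet in the
    result list for query i, or 0 if it was not retrieved at all.
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    # A missed query contributes 0; a hit at position r contributes 1/r.
    return sum(1.0 / r if r > 0 else 0.0 for r in ranks) / len(ranks)


def relative_increase(baseline: float, improved: float) -> float:
    """Relative change of `improved` over `baseline` (0.12 means +12%)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline


# Hypothetical ranks for four queries, before and after some improvement.
baseline_ranks = [1, 2, 4, 0]
improved_ranks = [1, 1, 2, 4]

baseline_mrr = mean_reciprocal_rank(baseline_ranks)
improved_mrr = mean_reciprocal_rank(improved_ranks)
print(f"baseline MRR: {baseline_mrr:.4f}")
print(f"improved MRR: {improved_mrr:.4f}")
print(f"relative increase: {relative_increase(baseline_mrr, improved_mrr):.1%}")
```

A percentage such as the 12% reported above is read here as a relative increase over the original model's score, which matches the abstract's phrase "compared to their original performance"; the exact evaluation protocol is defined in the paper itself.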

Funder

National Key R&D Program of China

National Natural Science Foundation of China

Young Elite Scientists Sponsorship Program by CAST

Publisher

Association for Computing Machinery (ACM)

Subject

Safety, Risk, Reliability and Quality; Software

Link

https://dl.acm.org/doi/pdf/10.1145/3622815

Reference: 89 articles.

1. A Survey of Machine Learning for Big Code and Naturalness

2. ACTGAN: Automatic Configuration Tuning for Software Systems with Generative Adversarial Networks

3. Grounded Copilot: How Programmers Interact with Code-Generating Models

4. Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. 2015. Scheduled sampling for sequence prediction with recurrent neural networks. Advances in neural information processing systems, 28 (2015).

5. Andrew Brock, Jeff Donahue, and Karen Simonyan. 2019. Large Scale GAN Training for High Fidelity Natural Image Synthesis. In International Conference on Learning Representations.

Cited by 3 articles.

1. T-RAP: A Template-guided Retrieval-Augmented Vulnerability Patch Generation Approach;Proceedings of the 15th Asia-Pacific Symposium on Internetware;2024-07-24

2. Fusing Code Searchers;IEEE Transactions on Software Engineering;2024-07

3. Natural Language to Code: How Far Are We?;Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering;2023-11-30