Affiliation:
1. National University of Defense Technology, Changsha, China
2. Singapore Management University, Singapore, Singapore
3. Huazhong University of Science and Technology, Wuhan, China
4. Southern University of Science and Technology, Shenzhen, China
5. Chongqing University, Chongqing, China
Abstract
Automatically transforming developers' natural language descriptions into source code has been a longstanding goal in software engineering research.
Two types of approaches have been proposed in the literature to achieve this: code generation, which involves generating a new code snippet, and code search, which involves reusing existing code.
However, despite existing efforts, the effectiveness of the state-of-the-art techniques remains limited.
To advance further, our insight is that code generation and code search can each help overcome the other's limitations:
the code generator can benefit from feedback on the quality of its generated code, which can be provided by the code searcher, while the code searcher can benefit from the additional training data augmented by the code generator to better understand code semantics.
Drawing on this insight, we propose a novel approach that combines code generation and code search techniques using a generative adversarial network (GAN), enabling mutual improvement through adversarial training.
Specifically, we treat code generation and code search as the generator and discriminator in the GAN framework, respectively, and incorporate several customized designs for our tasks.
We evaluate our approach in eight different settings, and consistently observe significant performance improvements for both code generation and code search.
For instance, when using NatGen, a state-of-the-art code generator, as the generator and GraphCodeBERT, a state-of-the-art code searcher, as the discriminator, we achieve a 32% increase in CodeBLEU score for code generation and a 12% increase in mean reciprocal rank for code search on a large-scale Python dataset, relative to the original performance of each model.
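For readers unfamiliar with the code search metric named above, mean reciprocal rank (MRR) can be sketched as follows. This is a minimal illustration of the standard definition, not the paper's evaluation script; the function names `rank_of_target` and `mean_reciprocal_rank` and the toy similarity scores are assumptions introduced here.

```python
from typing import Sequence


def rank_of_target(scores: Sequence[float], target_index: int) -> int:
    """Return the 1-based rank of the correct snippet among all candidates.

    scores[i] is a (hypothetical) similarity between the query and candidate i;
    a higher score means a better match.
    """
    target = scores[target_index]
    # Every candidate that scores strictly higher pushes the target down one rank.
    return 1 + sum(1 for s in scores if s > target)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all queries, where rank is 1-based."""
    if not ranks:
        raise ValueError("at least one query is required")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Toy data (assumed): three queries, each with three candidate snippets.
all_scores = [
    [0.9, 0.1, 0.3],  # correct snippet is candidate 0, ranked 1st
    [0.2, 0.9, 0.5],  # correct snippet is candidate 2, ranked 2nd
    [0.8, 0.7, 0.6],  # correct snippet is candidate 2, ranked 3rd
]
targets = [0, 2, 2]

ranks = [rank_of_target(s, t) for s, t in zip(all_scores, targets)]
mrr = mean_reciprocal_rank(ranks)
```

An MRR of 1.0 means the correct snippet is always ranked first, so a relative increase in MRR indicates that correct snippets move closer to the top of the result list on average.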
Funder
National Key R&D Program of China
National Natural Science Foundation of China
Young Elite Scientists Sponsorship Program by CAST
Publisher
Association for Computing Machinery (ACM)
Subject
Safety, Risk, Reliability and Quality, Software
Cited by
3 articles.
1. T-RAP: A Template-guided Retrieval-Augmented Vulnerability Patch Generation Approach;Proceedings of the 15th Asia-Pacific Symposium on Internetware;2024-07-24
2. Fusing Code Searchers;IEEE Transactions on Software Engineering;2024-07
3. Natural Language to Code: How Far Are We?;Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering;2023-11-30