CodeGen-Search: A Code Generation Model Incorporating Similar Sample Information

Published:2023-10-30 Issue:11n12 Volume:33 Page:1899-1921
ISSN:0218-1940
Container-title:International Journal of Software Engineering and Knowledge Engineering
language:en
Short-container-title:Int. J. Soft. Eng. Knowl. Eng.

Author:

Li HongWei¹, Kuang JiangLing¹, Zhong MaoSheng¹, Wang ZhiXiang¹, Liu Gen¹, Liu GanLin¹, Xiao YingJian¹

Affiliation:

1. School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, Jiangxi, P. R. China

Abstract

Code generation supports software development by reducing labor intensity and improving development efficiency. Some scholars use similar code information to enhance the quality of code generation. In daily development tasks, moreover, developers often search for similar samples as references in order to program more efficiently and accurately, drawing on the syntactic structure and semantic information of the code in those samples. Inspired by this, we argue that similar samples are helpful for code generation. This paper proposes the CodeGen-Search model, which improves code generation quality by incorporating similar samples. To fully utilize the information in similar samples, the model adopts the “pre-training + fine-tuning” pattern. The model uses a minimum edit distance algorithm to find similar samples based on the natural language (NL) description, and uses different encoders to extract the features of the NL and of the code in the similar samples. Experimental results show that our model effectively improves the quality of the generated code. Compared to the state-of-the-art model, CodeGen-Search improves BLEU by 1.5% and ROUGE by 0.8% on the HS dataset, and StrAcc by 0.5% on the ATIS dataset.
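The retrieval step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that samples are (NL, code) pairs, that the NL is tokenized on whitespace, and that similarity is the word-level Levenshtein distance; the function names `edit_distance` and `retrieve_similar` and the toy corpus are hypothetical, and the paper's actual retrieval procedure may differ.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # delete item_a
                            curr[j - 1] + 1,      # insert item_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def retrieve_similar(query: str,
                     corpus: List[Tuple[str, str]],
                     k: int = 1) -> List[Tuple[str, str]]:
    """Return the k (NL, code) pairs whose NL is closest to the query.

    Assumption: lowercased whitespace tokenization, word-level distance.
    """
    q_tokens = query.lower().split()
    ranked = sorted(
        corpus,
        key=lambda pair: edit_distance(q_tokens, pair[0].lower().split()),
    )
    return ranked[:k]


# Hypothetical toy corpus of (NL description, code) pairs.
corpus = [
    ("sort a list of integers in ascending order", "sorted(xs)"),
    ("read all lines from a text file", "open(path).readlines()"),
    ("reverse a list of strings", "xs[::-1]"),
]
best = retrieve_similar("sort a list of integers in descending order", corpus)
print(best[0][1])
```

In a retrieval-augmented setup of this kind, the NL and the code of the retrieved pair would then be passed to the model together with the original query; how CodeGen-Search encodes and combines them is described in the paper itself, not in this sketch.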

Funder

National Natural Science Foundation of China

Jiangxi Normal University

Publisher

World Scientific Pub Co Pte Ltd

Subject

Artificial Intelligence, Computer Graphics and Computer-Aided Design, Computer Networks and Communications, Software

Link

https://www.worldscientific.com/doi/pdf/10.1142/S0218194023500584

References (31 articles)

1. Latent Predictor Networks for Code Generation