Authors:
Zhang Xu, Xiang Yanzheng, Liu Zejie, Hu Xiaoyu, Zhou Deyu
Abstract
Code search, which locates code snippets in large code repositories based on natural language queries entered by developers, has become increasingly popular in the software development process and has the potential to improve developers' efficiency. Recent studies have demonstrated the effectiveness of deep learning techniques for accurately representing queries and code for code search. Specifically, pre-trained models of programming languages have recently achieved significant progress in code search. However, we argue that aligning programming and natural languages is crucial, as they are two different modalities. Existing pre-trained-model-based approaches for code search do not effectively consider implicit alignments of representations across modalities (inter-modal representations). Moreover, existing methods do not take into account the consistency constraint on intra-modal representations, which limits their effectiveness. We therefore propose a novel code search method that optimizes both intra-modal and inter-modal representation learning: the representations of the two modalities are aligned by introducing contrastive learning, and the consistency of intra-modal feature representations is constrained by KL-divergence. Our experimental results confirm the model's effectiveness on seven different test datasets, showing significant improvements over existing methods. Our source code is publicly available on GitHub.
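The two objectives described in the abstract can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the in-batch InfoNCE-style contrastive loss, the temperature value, and the use of two perturbed encoder views for the KL consistency term are all illustrative assumptions.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Unit-normalize embeddings so the dot product is cosine similarity.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def info_nce_loss(query_emb, code_emb, tau=0.07):
    """Inter-modal alignment via in-batch contrastive learning.

    Row i of `query_emb` and row i of `code_emb` are a matching pair, so
    the positives lie on the diagonal of the similarity matrix.
    tau is a hypothetical temperature hyperparameter.
    """
    q = l2_normalize(query_emb)
    c = l2_normalize(code_emb)
    sim = q @ c.T / tau                              # (B, B) scaled similarities
    sim = sim - sim.max(axis=1, keepdims=True)       # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))               # cross-entropy on the diagonal

def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-8):
    """Mean row-wise KL(p || q) between two distributions over classes.
    Used here as an intra-modal consistency constraint between two
    views of the same modality's representation."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1).mean()

# Toy batch: 4 query/code embedding pairs (hypothetical dimensions).
rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 16))
codes = queries + 0.1 * rng.normal(size=(4, 16))     # roughly aligned pairs

loss_inter = info_nce_loss(queries, codes)

# Intra-modal consistency: two slightly perturbed views of the query
# representations (standing in for, e.g., two stochastic encoder passes).
proj = rng.normal(size=(16, 8))
view_a = softmax(queries @ proj)
view_b = softmax((queries + 0.05 * rng.normal(size=(4, 16))) @ proj)
loss_intra = kl_divergence(view_a, view_b)

total_loss = loss_inter + loss_intra                 # weights omitted for brevity
```

In practice both terms would be computed on encoder outputs and combined with weighting coefficients; the sketch only shows the shape of each objective.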
Subject
Artificial Intelligence, Computer Vision and Pattern Recognition, Theoretical Computer Science
References (31 articles)
1. C. McMillan, M. Grechanik, D. Poshyvanyk, Q. Xie and C. Fu, Portfolio: finding relevant functions and their usage, in: Proceedings of the 33rd International Conference on Software Engineering, 2011, pp. 111–120.
2. F. Lv, H. Zhang, J.-g. Lou, S. Wang, D. Zhang and J. Zhao, CodeHow: Effective code search based on API understanding and extended Boolean model (E), in: 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), IEEE, 2015, pp. 260–270.
3. M. Lu, X. Sun, S. Wang, D. Lo and Y. Duan, Query expansion via WordNet for effective code search, in: 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER), IEEE, 2015, pp. 545–549.
4. S. Yan, H. Yu, Y. Chen, B. Shen and L. Jiang, Are the code snippets what we are searching for? A benchmark and an empirical study on code search with natural-language queries, in: 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER), IEEE, 2020, pp. 344–354.
5. J. Shuai, L. Xu, C. Liu, M. Yan, X. Xia and Y. Lei, Improving code search with co-attentive representation learning, in: Proceedings of the 28th International Conference on Program Comprehension, 2020, pp. 196–207.