Survey of Code Search Based on Deep Learning

Published: 2023-12-23 | Volume: 33 | Issue: 2 | Pages: 1-42
ISSN: 1049-331X
Journal: ACM Transactions on Software Engineering and Methodology
Language: English
Abbreviated title: ACM Trans. Softw. Eng. Methodol.

Author:

Xie Yutao¹, Lin Jiayi², Dong Hande², Zhang Lei², Wu Zhonghai³

Affiliation:

1. Peking University & International Digital Economy Academy, China

2. International Digital Economy Academy, China

3. Key Lab of High Confidence Software Technologies (MOE), Peking University, China

Abstract

Code writing is repetitive and predictable, inspiring us to develop various code intelligence techniques. This survey focuses on code search, that is, retrieving code that matches a given natural language query by effectively capturing the semantic similarity between the query and the code. Deep learning, being able to extract complex semantic information, has achieved great success in this field. Recently, various deep learning methods, such as graph neural networks and pretraining models, have been applied to code search with significant progress. Deep learning is now the leading paradigm for code search. In this survey, we provide a comprehensive overview of deep learning-based code search. We review the existing deep learning-based code search framework, which maps queries and code to vectors and measures their similarity. Furthermore, we propose a new taxonomy to illustrate the state-of-the-art deep learning-based code search in a three-step process: query semantics modeling, code semantics modeling, and matching modeling, which involves the deep learning model training. Finally, we suggest potential avenues for future research in this promising field.
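The framework the abstract describes (encode the query and each code snippet into vectors, rank snippets by vector similarity, and train the matching step) can be sketched in a few lines. This is a minimal, illustrative sketch under stated assumptions, not the survey's method: the hashed bag-of-subwords `encode` function is a hypothetical stand-in for the learned query and code encoders the survey covers, `DIM`, `search`, and `in_batch_contrastive_loss` are hypothetical names, and the in-batch softmax (InfoNCE-style) loss with `temperature=0.05` is one common training objective chosen here for illustration.

```python
import math
import re
import zlib

DIM = 1024  # hypothetical embedding size for the toy encoder


def tokenize(text: str) -> list[str]:
    """Split text and identifiers (read_file, readFile) into lowercase subwords."""
    return [t.lower() for t in re.findall(r"[A-Za-z][a-z]*|[0-9]+", text)]


def encode(text: str) -> list[float]:
    """Toy stand-in for a learned encoder: hashed bag of subwords, L2-normalized.

    The same function embeds both queries and code here; real systems
    use trained neural encoders (shared or separate) instead.
    """
    vec = [0.0] * DIM
    for tok in tokenize(text):
        # crc32 gives a deterministic bucket index across runs.
        vec[zlib.crc32(tok.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def search(query: str, corpus: list[str], k: int = 3) -> list[int]:
    """Return indices of the top-k snippets by cosine similarity.

    Vectors are unit length, so the dot product equals cosine similarity.
    """
    q = encode(query)
    scores = [(dot(q, encode(code)), i) for i, code in enumerate(corpus)]
    scores.sort(key=lambda s: s[0], reverse=True)
    return [i for _, i in scores[:k]]


def in_batch_contrastive_loss(queries, codes, temperature=0.05):
    """InfoNCE-style matching loss over a batch of (query i, code i) pairs.

    Each query should score its own code above every other code in the
    batch; returns the mean negative log-softmax of the paired score.
    """
    losses = []
    for i, q in enumerate(queries):
        logits = [dot(q, c) / temperature for c in codes]
        m = max(logits)  # log-sum-exp trick for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)
```

For example, searching a small corpus such as `["def read_file(path): return open(path).read()", "def add(a, b): return a + b"]` with the query "read a file from path" should rank the `read_file` snippet first because it shares the subwords `read`, `file`, and `path`. In a deep learning system, the loss would be minimized by gradient descent over the encoder parameters; the toy encoder here has no parameters, so the loss only shows how the objective scores a batch.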

Publisher

Association for Computing Machinery (ACM)

Subject

Software

Link

https://dl.acm.org/doi/pdf/10.1145/3628161

References (103 articles)

1. code2vec: learning distributed representations of code

2. Shushan Arakelyan, Anna Hakhverdyan, Miltiadis Allamanis, Luis Garcia, Christophe Hauser, and Xiang Ren. 2022. NS3: Neuro-symbolic semantic code search. In Proceedings of the Conference on Neural Information Processing Systems. Retrieved from http://papers.nips.cc/paper_files/paper/2022/hash/43f5f6c5cb333115914c8448b8506411-Abstract-Conference.html

3. Antonio Valerio Miceli Barone and Rico Sennrich. 2017. A parallel corpus of Python functions and documentation strings for automated code documentation and code generation. In Proceedings of the 8th International Joint Conference on Natural Language Processing, Greg Kondrak and Taro Watanabe (Eds.). Asian Federation of Natural Language Processing, 314–319. Retrieved from https://aclanthology.org/I17-2053/

4. Self-Supervised Contrastive Learning for Code Retrieval and Summarization via Semantic-Preserving Transformations

5. CSSAM: Code Search via Attention Matching of Code Semantics and Structures

Cited by 2 articles.

1. Specialized model initialization and architecture optimization for few-shot code search. Information and Software Technology, 2025-01.

2. Mapping Source Code to Software Architecture by Leveraging Large Language Models. Lecture Notes in Computer Science, 2024.