Deep Graph Matching and Searching for Semantic Code Retrieval

Author:

Ling Xiang (1), Wu Lingfei (2), Wang Saizhuo (1), Pan Gaoning (1), Ma Tengfei (2), Xu Fangli (3), Liu Alex X. (4), Wu Chunming (5), Ji Shouling (1)

Affiliation:

1. Zhejiang University, Hangzhou, Zhejiang, China

2. IBM T. J. Watson Research Center, Yorktown Heights, NY

3. Squirrel AI Learning, Highland Park, NJ

4. Ant Group, Zhejiang, China

5. Zhejiang University and Zhejiang Lab, Hangzhou, Zhejiang, China

Abstract

Code retrieval aims to find, within a large corpus of source code repositories, the code snippet that best matches a natural language query. Recent work mainly applies natural language processing techniques to both query texts (i.e., human natural language) and code snippets (i.e., machine programming language), but neglects the deep structural features of query texts and source code, both of which carry rich semantic information. In this article, we propose an end-to-end deep graph matching and searching (DGMS) model based on graph neural networks for semantic code retrieval. We first represent both natural language query texts and programming language code snippets as unified graph-structured data, and then use the proposed graph matching and searching model to retrieve the best-matching code snippet. In particular, DGMS not only captures more structural information for individual query texts and code snippets, but also learns the fine-grained similarity between them through cross-attention-based semantic matching operations. We evaluate DGMS on two public code retrieval datasets covering two representative programming languages (Java and Python). Experimental results show that DGMS outperforms state-of-the-art baseline models by a large margin on both datasets. Moreover, extensive ablation studies systematically investigate and illustrate the impact of each part of DGMS.
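The cross-attention matching idea described in the abstract can be illustrated with a minimal sketch. This is not the DGMS implementation: the function names (`attention_weights`, `cross_attend`, `graph_similarity`, `rank_candidates`) are hypothetical, the node embeddings are assumed to come from some graph encoder that is not shown, and the element-wise comparison with max pooling is one simple choice among many.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero for all-zero vectors


def attention_weights(src_nodes, tgt_nodes):
    """Softmax over cosine similarities; each row sums to 1."""
    s = src_nodes / (np.linalg.norm(src_nodes, axis=1, keepdims=True) + EPS)
    t = tgt_nodes / (np.linalg.norm(tgt_nodes, axis=1, keepdims=True) + EPS)
    sim = s @ t.T  # shape: (n_src, n_tgt)
    w = np.exp(sim - sim.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)


def cross_attend(src_nodes, tgt_nodes):
    """Give every source node a weighted summary of the other graph."""
    return attention_weights(src_nodes, tgt_nodes) @ tgt_nodes


def graph_similarity(query_nodes, code_nodes):
    """Score a (query graph, code graph) pair in [-1, 1].

    Assumption: each input is an (n_nodes, dim) array of node embeddings
    already produced by a graph encoder.
    """
    q_ctx = cross_attend(query_nodes, code_nodes)
    c_ctx = cross_attend(code_nodes, query_nodes)
    # Compare each node with its cross-graph summary, then max-pool
    # over nodes to obtain one vector per graph.
    q_vec = (query_nodes * q_ctx).max(axis=0)
    c_vec = (code_nodes * c_ctx).max(axis=0)
    denom = np.linalg.norm(q_vec) * np.linalg.norm(c_vec) + EPS
    return float(q_vec @ c_vec / denom)


def rank_candidates(query_nodes, candidates):
    """Return candidate indices ordered from best to worst match."""
    scores = np.array([graph_similarity(query_nodes, c) for c in candidates])
    return np.argsort(-scores)
```

In a retrieval setting, `rank_candidates` would be called with the query graph's node embeddings and a list of code graph embeddings; a trained model would learn the encoder and matching parameters rather than use fixed operations as above.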

Funder

National Key R&D Program of China

Zhejiang Provincial Natural Science Foundation for Distinguished Young Scholars

Fundamental Research Funds for the Central Universities

NSFC

Key R&D Program of Zhejiang Province

Major Scientific Project of Zhejiang Laboratory

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Cited by 48 articles.

