code2vec: learning distributed representations of code

Published: 2019-01-02 Issue: POPL Volume: 3 Page: 1-29
ISSN: 2475-1421
Container-title: Proceedings of the ACM on Programming Languages
Language: en
Short-container-title: Proc. ACM Program. Lang.

Author:

Alon Uri¹, Zilberstein Meital¹, Levy Omer², Yahav Eran¹

Affiliation:

1. Technion, Israel

2. Facebook AI Research, USA

Abstract

We present a neural model for representing snippets of code as continuous distributed vectors ("code embeddings"). The main idea is to represent a code snippet as a single fixed-length code vector, which can be used to predict semantic properties of the snippet. To this end, code is first decomposed into a collection of paths in its abstract syntax tree. Then, the network learns the atomic representation of each path while simultaneously learning how to aggregate a set of them. We demonstrate the effectiveness of our approach by using it to predict a method's name from the vector representation of its body. We evaluate our approach by training a model on a dataset of 12M methods. We show that code vectors trained on this dataset can predict method names from files that were unobserved during training. Furthermore, we show that our model learns useful method name vectors that capture semantic similarities, combinations, and analogies. A comparison of our approach to previous techniques over the same dataset shows an improvement of more than 75%, making it the first to successfully predict method names based on a large, cross-project corpus. Our trained model, visualizations, and vector similarities are available as an interactive online demo at http://code2vec.org. The code, data, and trained models are available at https://github.com/tech-srl/code2vec.
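The two steps named in the abstract (decomposing code into syntax-tree paths, then aggregating the set of paths into one fixed-length vector) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes Python's own `ast` module rather than the parser used by the released code, it assumes a softmax-weighted average as the aggregation rule, and it uses pseudo-random vectors where a trained network would supply learned embeddings. The names `collect_leaves`, `path_contexts`, `embed`, and `aggregate` are hypothetical.

```python
import ast
import itertools
import math
import random


def collect_leaves(tree):
    """Return (token, root-to-leaf list of AST nodes) for every terminal."""
    leaves = []

    def visit(node, trail):
        trail = trail + [node]
        if isinstance(node, ast.Name):
            leaves.append((node.id, trail))
        elif isinstance(node, ast.arg):
            leaves.append((node.arg, trail))
        elif isinstance(node, ast.Constant):
            leaves.append((repr(node.value), trail))
        else:
            for child in ast.iter_child_nodes(node):
                visit(child, trail)

    visit(tree, [])
    return leaves


def path_contexts(source):
    """Return (start token, path string, end token) for each pair of leaves."""
    contexts = []
    leaves = collect_leaves(ast.parse(source))
    for (tok_a, trail_a), (tok_b, trail_b) in itertools.combinations(leaves, 2):
        # Length of the shared root prefix; index k - 1 is the lowest common ancestor.
        k = 0
        while k < min(len(trail_a), len(trail_b)) and trail_a[k] is trail_b[k]:
            k += 1
        up = [type(n).__name__ for n in reversed(trail_a[k:])]
        top = type(trail_a[k - 1]).__name__
        down = [type(n).__name__ for n in trail_b[k:]]
        # "^" marks upward steps, "_" marks downward steps.
        path = "^".join(up + [top]) + "".join("_" + d for d in down)
        contexts.append((tok_a, path, tok_b))
    return contexts


def embed(context, dim=8):
    # ASSUMPTION: stand-in for a learned embedding. Each context gets a
    # deterministic pseudo-random vector seeded by its text.
    rng = random.Random("|".join(context))
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def aggregate(context_vectors, attention):
    """Softmax-weighted average of context vectors (assumed aggregation rule)."""
    scores = [sum(c * a for c, a in zip(vec, attention)) for vec in context_vectors]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract max for stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(attention)
    code_vector = [
        sum(w * vec[i] for w, vec in zip(weights, context_vectors))
        for i in range(dim)
    ]
    return code_vector, weights


SOURCE = "def f(x):\n    return x + 1\n"
contexts = path_contexts(SOURCE)
vectors = [embed(c) for c in contexts]
attention = [0.5] * 8  # ASSUMPTION: would be a learned parameter in a real model
code_vector, weights = aggregate(vectors, attention)
```

Whatever the number of path contexts extracted from a snippet, `code_vector` has the same length, which is the property the abstract relies on when it speaks of a single fixed-length code vector. In a trained model the embeddings and the attention vector would be learned jointly, so the weights would reflect which paths matter for the prediction task.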

Publisher

Association for Computing Machinery (ACM)

Subject

Safety, Risk, Reliability and Quality; Software

Link

https://dl.acm.org/doi/pdf/10.1145/3290353

References: 73 articles.

1. Learning natural coding conventions

2. Suggesting accurate method and class names

3. Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. 2018. Learning to Represent Programs with Graphs. In ICLR.

Cited by 727 articles.

1. Graph-based explainable vulnerability prediction; Information and Software Technology; 2025-01

3. EvaluateXAI: A framework to evaluate the reliability and consistency of rule-based XAI techniques for software analytics tasks; Journal of Systems and Software; 2024-11

4. SCL-CVD: Supervised contrastive learning for code vulnerability detection via GraphCodeBERT; Computers & Security; 2024-10

5. On Representation Learning-based Methods for Effective, Efficient, and Scalable Code Retrieval; Neurocomputing; 2024-10