Affiliation:
1. Peking University, Beijing, P.R. China
2. Monash University, Melbourne, Victoria, Australia
Abstract
Learning representations of source code is a foundation of many program analysis tasks. In recent years, neural networks have shown success in this area, but most existing models do not make full use of the unique structural information of programs. Although abstract syntax tree (AST)-based neural models can handle the tree structure of source code, they cannot capture the richness of the different types of substructures in programs. In this article, we propose a modular tree network that dynamically composes different neural network units into tree structures based on the input AST. Unlike previous tree-structured neural network models, a modular tree network can capture the semantic differences between types of AST substructures. We evaluate our model on two tasks: program classification and code clone detection. Our model achieves the best performance compared with state-of-the-art approaches on both tasks, demonstrating the advantage of leveraging more elaborate structural information of source code.
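The core idea of dynamically composing per-node-type modules over an AST can be sketched as follows. This is a minimal illustration under our own assumptions (hypothetical node types, random weights, sum-based child aggregation), not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
NODE_TYPES = ["If", "While", "BinOp", "Name"]  # hypothetical AST node types

# One weight matrix per node type: the "module" selected for that substructure.
modules = {t: rng.standard_normal((DIM, DIM)) / np.sqrt(DIM) for t in NODE_TYPES}
embed = {t: rng.standard_normal(DIM) for t in NODE_TYPES}  # node-type embeddings

def encode(node):
    """node = (type, [children]); returns a DIM-dim vector for the subtree."""
    node_type, children = node
    # Recursively encode children, then aggregate (here: a simple sum).
    child_sum = sum((encode(c) for c in children), np.zeros(DIM))
    # Dynamic composition: the module is chosen by the node's AST type,
    # so different substructure types are handled by different units.
    return np.tanh(modules[node_type] @ (embed[node_type] + child_sum))

# Example: an `if` whose condition compares two names, with a `while` body.
tree = ("If", [("BinOp", [("Name", []), ("Name", [])]), ("While", [])])
vec = encode(tree)
print(vec.shape)  # (8,)
```

The subtree vector produced at the root can then feed a downstream classifier (program classification) or a similarity function over two trees (clone detection).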
Funder
National Natural Science Foundation of China
Australian Research Council’s Discovery
National Key R&D Program
Publisher
Association for Computing Machinery (ACM)
References: 38 articles.
Cited by: 29 articles.