Affiliation:
1. Peking University, Beijing, P.R. China
2. Monash University, Melbourne, Victoria, Australia
Abstract
Learning representations of source code is a foundation of many program analysis tasks. In recent years, neural networks have shown success in this area, but most existing models do not make full use of the unique structural information of programs. Although abstract syntax tree (AST)-based neural models can handle the tree structure of source code, they cannot capture the richness of the different types of substructures in programs. In this article, we propose a modular tree network that dynamically composes different neural network units into tree structures based on the input AST. Unlike previous tree-structured neural network models, a modular tree network can capture the semantic differences between types of AST substructures. We evaluate our model on two tasks: program classification and code clone detection. Our model achieves the best performance compared with state-of-the-art approaches on both tasks, demonstrating the advantage of leveraging more elaborate structural information of source code.
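The core idea of dynamically composing per-node-type modules over an AST can be sketched as follows. This is a minimal illustration under our own assumptions (hypothetical node types, random weights, sum-based child aggregation), not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
NODE_TYPES = ["If", "While", "BinOp", "Name"]  # hypothetical AST node types

# One weight matrix per node type: the "module" selected for that substructure.
modules = {t: rng.standard_normal((DIM, DIM)) / np.sqrt(DIM) for t in NODE_TYPES}
embed = {t: rng.standard_normal(DIM) for t in NODE_TYPES}  # node-type embeddings

def encode(node):
    """node = (type, [children]); returns a DIM-dim vector for the subtree."""
    node_type, children = node
    # Recursively encode children, then aggregate (here: a simple sum).
    child_sum = sum((encode(c) for c in children), np.zeros(DIM))
    # Dynamic composition: the module is chosen by the node's AST type,
    # so different substructure types are handled by different units.
    return np.tanh(modules[node_type] @ (embed[node_type] + child_sum))

# Example: an `if` whose condition compares two names, with a `while` body.
tree = ("If", [("BinOp", [("Name", []), ("Name", [])]), ("While", [])])
vec = encode(tree)
print(vec.shape)  # (8,)
```

The subtree vector produced at the root can then feed a downstream classifier (program classification) or a similarity function over two trees (clone detection).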
Funder
National Natural Science Foundation of China
Australian Research Council’s Discovery
National Key R&D Program
Publisher
Association for Computing Machinery (ACM)
References: 38 articles.
Cited by: 29 articles.