Author:
Bui Nghi D. Q., Yu Yijun, Jiang Lingxiao
Abstract
Recently, program learning techniques have been proposed to process source code based on syntactic structures (e.g., abstract syntax trees) and/or semantic information (e.g., dependency graphs). While graphs may be better than trees at capturing code semantics, constructing graphs from code through semantic analyses of multiple viewpoints can introduce inaccuracies and noise for a specific software engineering task. Compared to graphs, syntax trees are more precisely defined by the language grammar and easier to parse; unfortunately, previous tree-based learning techniques have not been able to learn semantic information from trees well enough to achieve better accuracy than graph-based techniques. We propose a new learning technique, named TreeCaps, that fuses capsule networks with tree-based convolutional neural networks to achieve learning accuracy higher than some existing graph-based techniques while relying only on trees. TreeCaps introduces novel variable-to-static routing algorithms into the capsule networks to compensate for the limitations of previous routing algorithms. Beyond accuracy, we also find that TreeCaps is the most robust against semantic-preserving program transformations, i.e., transformations that change code syntax without modifying semantics. Evaluated on a large number of Java and C/C++ programs, TreeCaps models outperform prior deep learning models of program source code in terms of both accuracy and robustness on program comprehension tasks such as code functionality classification and function name prediction. Our implementation is publicly available at: https://github.com/bdqnghi/treecaps.
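For readers unfamiliar with capsule routing, the sketch below illustrates the general idea behind routing a variable number of input capsules (e.g., one per AST node, produced by tree-based convolution) to a fixed set of output capsules. It follows the standard dynamic-routing recipe of Sabour et al. (2017) rather than the paper's exact variable-to-static algorithm; all names, shapes, and hyperparameters here are illustrative assumptions, and the authors' actual implementation is in the repository linked above.

    import numpy as np

    def softmax(x, axis=1):
        # Numerically stable softmax over the given axis.
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def squash(v, axis=-1, eps=1e-8):
        # Capsule non-linearity: short vectors shrink toward 0,
        # long vectors approach unit length.
        norm2 = (v ** 2).sum(axis=axis, keepdims=True)
        return (norm2 / (1.0 + norm2)) * v / np.sqrt(norm2 + eps)

    def route_variable_to_static(u_hat, num_iters=3):
        """Route a variable number of input capsules to a fixed set
        of output capsules (standard dynamic routing, for illustration).

        u_hat: (n_in, n_out, d_out) prediction vectors, one per
               (input capsule, output capsule) pair.
        Returns: (n_out, d_out) output capsule vectors.
        """
        n_in, n_out, _ = u_hat.shape
        b = np.zeros((n_in, n_out))                  # routing logits
        for _ in range(num_iters):
            c = softmax(b, axis=1)                   # coupling coefficients
            s = (c[..., None] * u_hat).sum(axis=0)   # weighted sum -> (n_out, d_out)
            v = squash(s)                            # output capsules
            b += (u_hat * v[None]).sum(-1)           # agreement update
        return v

    # Toy usage: 50 node capsules (variable) routed to 10 class capsules (static).
    u_hat = np.random.randn(50, 10, 16) * 0.1
    v = route_variable_to_static(u_hat)
    print(v.shape)  # (10, 16)

The key property this illustrates is that the number of input capsules (n_in) can vary per program while the output layer stays fixed, which is what makes capsule routing attractive for trees of arbitrary size.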
Publisher
Association for the Advancement of Artificial Intelligence (AAAI)
Cited by
14 articles.
1. CodeArt: Better Code Models by Attention Regularization When Symbols Are Lacking;Proceedings of the ACM on Software Engineering;2024-07-12
2. TransformCode: A Contrastive Learning Framework for Code Embedding via Subtree Transformation;IEEE Transactions on Software Engineering;2024-06
3. Learning to Detect Memory-related Vulnerabilities;ACM Transactions on Software Engineering and Methodology;2023-12-23
4. On-the-fly Improving Performance of Deep Code Models via Input Denoising;2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE);2023-09-11
5. Test Case Recommendations with Distributed Representation of Code Syntactic Features;2023 38th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW);2023-09-11