Affiliation:
1. College of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, China
2. Software Security Company, Chengdu 610041, China
Abstract
Code classification and code clone detection are crucial for understanding and maintaining large software systems. Although deep learning captures the features of source code better than traditional techniques, existing deep models suffer from limited processing efficiency and high complexity. We propose SCRMHA, a novel source code representation method based on the multi-head attention mechanism. SCRMHA produces a vector representation of an entire code segment. Its multiple attention heads let it attend to different positions of the input sequence, capture richer semantic information, and model different aspects and relationships of the sequence simultaneously. Because the attention heads can be computed in parallel, the computation is also faster. We evaluate SCRMHA on a standard dataset and on a real-world industrial dataset, and analyze the differences between the two. Experimental results on code classification and clone detection tasks show that SCRMHA takes less time than traditional source code feature representation methods and reduces complexity by about one-third. These results demonstrate that SCRMHA lowers the computational complexity and time consumption of the model while maintaining accuracy.
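The abstract does not describe SCRMHA's architecture in detail. As a rough illustration of the general mechanism it builds on, here is a minimal pure-Python sketch of standard multi-head scaled dot-product self-attention (as in the Transformer) over a sequence of token vectors, followed by mean pooling to obtain one vector for the whole code segment. The function names, the random weights standing in for trained parameters, the mean-pooling step, and the toy dimensions are all assumptions for illustration, not the authors' implementation.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x m) matrix by an (m x p) matrix, both stored as lists of rows."""
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(len(b[0]))]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    # Random values stand in for learned weights (assumption: no training here).
    return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)] for _ in range(rows)]


def attention_head(x, w_q, w_k, w_v):
    """Scaled dot-product attention for one head: softmax(Q K^T / sqrt(d_k)) V."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(w_q[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


def encode_code_segment(token_vectors, num_heads, seed=0):
    """Hypothetical encoder: multi-head self-attention, then mean pooling,
    returning one fixed-size vector for the whole code segment."""
    d_model = len(token_vectors[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_k = d_model // num_heads
    rng = random.Random(seed)
    # One (W_Q, W_K, W_V) triple per head, each of shape d_model x d_k.
    params = [tuple(random_matrix(d_model, d_k, rng) for _ in range(3))
              for _ in range(num_heads)]
    w_o = random_matrix(d_model, d_model, rng)
    # Each head depends only on the input and its own weights, so the heads
    # could be evaluated in parallel; this sketch simply computes them in a loop.
    heads = [attention_head(token_vectors, *p) for p in params]
    # Concatenate head outputs along the feature axis: h blocks of (n x d_k) -> n x d_model.
    concat = [sum((head[i] for head in heads), []) for i in range(len(token_vectors))]
    projected = matmul(concat, w_o)
    # Mean-pool over token positions to get a single segment-level vector.
    n = len(projected)
    return [sum(row[j] for row in projected) / n for j in range(d_model)]


# Toy input: 4 tokens, each embedded as an 8-dimensional vector (made-up values).
rng = random.Random(42)
tokens = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(4)]
vec = encode_code_segment(tokens, num_heads=2)
print(len(vec))  # 8
```

Mean pooling is chosen here only because it maps a variable-length token sequence to a fixed-size vector, which is what classification and clone-comparison tasks need; other pooling strategies would serve the same purpose.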
Funder
Xiamen City Science and Technology Development Project
Xiamen City Natural Science Foundation
Xiamen Institute of Technology High-level Talent Program
Fujian Provincial Department of Education Young and Middle-aged Teacher Education Research Project