A Novel Source Code Representation Approach Based on Multi-Head Attention

Author:

Xiao Lei 1, Zhong Hao 1, Liu Jianjian 1, Zhang Kaiyu 1, Xu Qizhen 1, Chang Le 2

Affiliation:

1. College of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, China

2. Software Security Company, Chengdu 610041, China

Abstract

Code classification and code clone detection are crucial for understanding and maintaining large software systems. Although deep learning surpasses traditional techniques in capturing the features of source code, existing models suffer from limited processing capability and high complexity. We propose a novel source code representation method based on the multi-head attention mechanism (SCRMHA). SCRMHA learns a vector representation of an entire code segment. Multi-head attention lets it attend to different positions of the input sequence, capture richer semantic information, and process different aspects and relationships of the sequence simultaneously. Moreover, the attention heads can be computed in parallel, which speeds up computation. We evaluate SCRMHA on both a standard dataset and a real industrial dataset, and analyze the differences between the two. Experimental results on code classification and clone detection tasks show that SCRMHA consumes less time and reduces complexity by about one-third compared with traditional source code feature representation methods, while maintaining accuracy.
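The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: standard multi-head self-attention over token embeddings, followed by pooling into a single vector for the code segment. It is not the authors' SCRMHA model. The function names, the random (untrained) projection matrices, the toy dimensions, and the mean-pooling step are all assumptions made for illustration. The heads are computed in a simple loop here; because they are independent of each other, a real implementation could compute them in parallel.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights: small random values."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def attention_head(x, wq, wk, wv):
    """Scaled dot-product self-attention for a single head."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(wq[0])
    k_t = [list(col) for col in zip(*k)]           # transpose of K
    scores = matmul(q, k_t)                        # token-to-token similarity
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


def multi_head_attention(x, num_heads, rng):
    """Run several independent heads and mix their concatenated outputs."""
    d_model = len(x[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    head_outputs = []
    for _ in range(num_heads):                     # heads do not depend on each other
        wq = random_matrix(d_model, d_head, rng)
        wk = random_matrix(d_model, d_head, rng)
        wv = random_matrix(d_model, d_head, rng)
        out, _ = attention_head(x, wq, wk, wv)
        head_outputs.append(out)
    # Concatenate the per-head vectors of each token, then apply an output projection.
    concat = [sum((h[i] for h in head_outputs), []) for i in range(len(x))]
    wo = random_matrix(d_model, d_model, rng)
    return matmul(concat, wo)


def segment_vector(token_states):
    """Mean-pool token states into one vector (the pooling choice is an assumption)."""
    n = len(token_states)
    return [sum(col) / n for col in zip(*token_states)]


rng = random.Random(0)
tokens = random_matrix(5, 8, rng)                  # 5 toy token embeddings, 8 dimensions each
states = multi_head_attention(tokens, num_heads=2, rng=rng)
vector = segment_vector(states)
print(len(states), len(states[0]), len(vector))    # 5 8 8
```

Each head sees the whole sequence through its own projections, which is what allows different heads to pick up different relationships between tokens. The single pooled vector is the kind of representation that could then be fed to a classifier or compared between two code segments for clone detection.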

Funder

Xiamen City Science and Technology Development Project

Xiamen City Natural Science Foundation

Xiamen Institute of Technology High-level Talent Program

Fujian Provincial Department of Education Young and Middle-aged Teacher Education Research Project

Publisher

MDPI AG

