Semantic Code Clone Detection Based on Community Detection

Published:2024-07-26 Page:1-32
ISSN:0218-1940
Container-title:International Journal of Software Engineering and Knowledge Engineering
language:en
Short-container-title:Int. J. Soft. Eng. Knowl. Eng.

Author:

Wan Zexuan¹, Xie Chunli¹, Lv Quanrun¹, Fan Yasheng¹

Affiliation:

1. School of Computer Science and Technology, Jiangsu Normal University, Xuzhou 221116, P. R. China

Abstract

Semantic code clone detection aims to find code snippets that differ structurally or syntactically but are semantically identical. It plays an important role in software reuse and code compression. Many existing studies achieve good performance on non-semantic clones, but detecting semantic clones remains a challenging task. Recently, several works have used tree or graph representations, such as the Abstract Syntax Tree (AST), Control Flow Graph (CFG), or Program Dependency Graph (PDG), to extract semantic information from source code. To reduce the complexity of trees and graphs, some studies transform them into node sequences; however, this transformation loses some semantic information. To address this issue, we propose a novel high-performance method that uses community detection to extract features of the AST while preserving its semantic information. First, based on the AST of the source code, we apply community detection to split the AST into subtrees that capture the underlying semantic information of different code blocks, and we use centrality analysis to quantify that semantic information as the weights of AST nodes. Then, the AST is converted into a sequence of weighted tokens, and a Siamese neural network model measures the similarity of token sequences to detect semantic code clones. Finally, to evaluate our approach, we conduct experiments on two standard benchmark datasets, Google Code Jam (GCJ) and BigCloneBench (BCB). Experimental results show that our model outperforms eight publicly available state-of-the-art methods in detecting code clones. It is also five times faster than the tree-based method (ASTNN).
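The pipeline described in the abstract (AST, community detection, centrality weights, weighted token sequence, similarity) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: it parses Python source with the standard `ast` module, uses asynchronous label propagation as a stand-in community detection algorithm, uses degree centrality as the node weight, and replaces the Siamese neural network with a plain cosine similarity over weighted token counts. All function names are hypothetical.

```python
import ast
import math
from collections import Counter, defaultdict


def build_ast_graph(source):
    """Parse Python source; return pre-order node labels and an undirected adjacency map."""
    labels = []
    adj = defaultdict(set)

    def visit(node, parent):
        idx = len(labels)              # fresh index for every visited node
        labels.append(type(node).__name__)
        adj[idx]                       # ensure the node has an entry even without edges
        if parent is not None:
            adj[parent].add(idx)
            adj[idx].add(parent)
        for child in ast.iter_child_nodes(node):
            visit(child, idx)

    visit(ast.parse(source), None)
    return labels, adj


def label_propagation(adj, max_rounds=20):
    """Stand-in community detection: each node adopts its neighbours' most common label."""
    community = {v: v for v in adj}
    for _ in range(max_rounds):
        changed = False
        for v in sorted(adj):
            if not adj[v]:
                continue
            counts = Counter(community[u] for u in adj[v])
            best = max(counts.values())
            candidates = sorted(lab for lab, c in counts.items() if c == best)
            # Deterministic tie-break: keep the current label if tied, else the smallest.
            new = community[v] if community[v] in candidates else candidates[0]
            if new != community[v]:
                community[v] = new
                changed = True
        if not changed:
            break
    return community


def degree_centrality(adj):
    """Degree centrality: number of neighbours divided by (n - 1)."""
    n = len(adj)
    return {v: (len(adj[v]) / (n - 1) if n > 1 else 0.0) for v in adj}


def weighted_tokens(source):
    """Return the AST as a pre-order list of (token, community id, weight) triples."""
    labels, adj = build_ast_graph(source)
    community = label_propagation(adj)
    weight = degree_centrality(adj)
    return [(labels[v], community[v], weight[v]) for v in range(len(labels))]


def similarity(src_a, src_b):
    """Cosine similarity of weighted token counts (assumed stand-in for the Siamese model)."""
    def bag(src):
        vec = Counter()
        for token, _, w in weighted_tokens(src):
            vec[token] += w
        return vec

    a, b = bag(src_a), bag(src_b)
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(x * x for x in a.values())) * math.sqrt(sum(x * x for x in b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    loop_sum = "def f(xs):\n    t = 0\n    for x in xs:\n        t += x\n    return t\n"
    builtin_sum = "def g(values):\n    return sum(values)\n"
    print(similarity(loop_sum, builtin_sum))
```

A learned model would be needed to recognize that the two example functions are semantic clones; the cosine score here only shows how weighted tokens could feed a similarity measure.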

Funder

National Natural Science Foundation of China General Fund

Jiangsu Normal University Graduate Research and Practice Innovation Project

Publisher

World Scientific Pub Co Pte Ltd

Link

https://www.worldscientific.com/doi/pdf/10.1142/S0218194024500323
