Affiliation:
1. School of Software Engineering Huazhong University of Science and Technology Wuhan 430074, Hubei Province, P. R. China
Abstract
Measuring program similarity plays an important role in solving many problems in software engineering. However, because programs are instruction sequences with complex structures and semantic functions and furthermore, programs may be obfuscated deliberately through semantics-preserving transformations, measuring program similarity is a difficult task that has not been adequately addressed. In this paper, we propose a new approach to measuring Java program similarity. The approach first measures the low-level similarity between basic blocks according to the bytecode instruction sequences and the structural property of the basic blocks. Then, an error-tolerant graph matching algorithm that can combat structure transformations is used to match the Control Flow Graphs (CFG) based on the basic block similarity. The high-level similarity between Java programs is subsequently calculated on the matched pairs of the independent paths extracted from the optimal CFG matching. The proposed CFG-Match approach is compared with a string-based approach, a tree-based approach and a graph-based approach. Experimental results show that the CFG-Match approach is more accurate and robust against semantics-preserving transformations. The CFG-Match approach is used to detect Java program plagiarism. Experiments on the collection of benchmark program pairs collected from the students’ submission of project assignments demonstrate that the CFG-Match approach outperforms the comparative approaches in the detection of Java program plagiarism.
Publisher
World Scientific Pub Co Pte Lt
Subject
Artificial Intelligence,Computer Graphics and Computer-Aided Design,Computer Networks and Communications,Software
Cited by
5 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献