Affiliation:
1. College of Cybersecurity, Sichuan University, Chengdu 610065, China
Abstract
Code cloning is a common practice in software development, in which developers reuse existing code to speed up programming and improve productivity. Existing clone-detection methods mainly focus on clones within a single programming language. To address the challenge of detecting code clones across languages in cross-platform development, we propose a novel method called TCCCD (Triplet-Based Cross-Language Code Clone Detection). Our approach is based on machine learning and can accurately detect code clones between different programming languages. We used the pre-trained model UniXcoder to map programs written in different languages into a shared vector space and learn their code representations. We then fine-tuned TCCCD using triplet learning to improve its effectiveness in cross-language clone detection. To assess our approach, we conducted thorough comparative experiments on the dataset released with CLCDSA (Cross Language Code Clone Detection using Syntactical Features and API Documentation). The results show that our approach significantly outperforms state-of-the-art baselines, achieving precision, recall, and F1-measure of 0.96, 0.91, and 0.93, respectively. In summary, TCCCD is a novel cross-language code-clone-detection method that uses the pre-trained model UniXcoder for source code representation and fine-tunes it with triplet learning; in our experiments, it outperformed state-of-the-art baselines in precision, recall, and F1-measure.
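The abstract names the two ingredients of TCCCD (UniXcoder embeddings in a shared vector space, fine-tuned with triplet learning) but not the exact loss, distance, or decision rule. As a minimal illustration, assuming the standard triplet margin loss with Euclidean distance and a margin of 1.0, the sketch below scores a training triplet (an anchor program, its clone in another language, and a non-clone) and applies a hypothetical cosine-similarity threshold of 0.5 to classify pairs. The toy 2-D vectors stand in for UniXcoder embeddings, and all function names are illustrative rather than TCCCD's code. The last line checks that the reported F1-measure is consistent with the reported precision and recall.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """L2 distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """Triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    Zero once the non-clone (negative) is at least `margin` farther from the
    anchor than its cross-language clone (positive) is. The margin value is
    an assumption; the abstract does not state it.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def is_clone(u: Vector, v: Vector, threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: a pair is a clone if cosine >= threshold."""
    return cosine(u, v) >= threshold


def f1_measure(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy vectors standing in for embeddings (hypothetical, not real model output).
anchor = [1.0, 0.0]          # e.g., a Java program
positive = [0.9, 0.1]        # its clone written in Python
easy_negative = [-1.0, 0.0]  # an unrelated program that is already far away
hard_negative = [0.8, 0.3]   # an unrelated program that sits too close

print(triplet_loss(anchor, positive, easy_negative))      # 0.0
print(triplet_loss(anchor, positive, hard_negative) > 0)  # True
print(is_clone(anchor, positive), is_clone(anchor, easy_negative))  # True False
print(round(f1_measure(0.96, 0.91), 2))                   # 0.93
```

In this setup, minimizing the loss over many triplets pulls cross-language clones together and pushes non-clones at least `margin` farther away, which is what allows a simple similarity threshold to separate the two at inference time.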
Funder
National Science Foundation of China
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science