Semantic Clone Detection via Probabilistic Software Modeling

Published:2022 Page:288-309
ISSN:0302-9743
Container-title:Fundamental Approaches to Software Engineering

Author:

Thaller Hannes, Linsbauer Lukas, Egyed Alexander

Abstract

Semantic clone detection is the process of finding program elements with similar or equal runtime behavior, for example, detecting the semantic equality between the recursive and iterative implementations of the factorial computation. Semantic clone detection is the de facto technical boundary of clone detectors. In recent years, this boundary has been tested using interesting new approaches. This article contributes a semantic clone detection approach that detects clones which have 0% syntactic similarity. We present Semantic Clone Detection via Probabilistic Software Modeling (SCD-PSM) as a stable and precise solution to semantic clone detection. PSM builds a probabilistic model of a program that is capable of evaluating and generating runtime data. SCD-PSM leverages this model and its model elements for finding behaviorally equal model elements. This behavioral equality is then generalized to semantic equality of the original program elements. SCD-PSM uses the likelihood between model elements as a distance metric. Then, it employs the likelihood ratio significance test to decide whether this distance is significant, given a pre-specified and controllable false-positive rate. The output of SCD-PSM consists of pairs of program elements (i.e., methods), their distance, and a decision on whether they are clones or not. SCD-PSM yields excellent results with a Matthews Correlation Coefficient greater than 0.9. These results are obtained on classical semantic clone detection problems such as detecting recursive and iterative versions of an algorithm, but also on complex problems used in coding competitions.
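The factorial example and the reported metric can be illustrated with a minimal Python sketch. It is not the SCD-PSM method: the direct input-output comparison below is a simplified stand-in for the likelihood-based test described in the abstract, and all function names (`factorial_recursive`, `factorial_iterative`, `behaves_equally`, `matthews_corrcoef`) are hypothetical names chosen for illustration.

```python
import math


def factorial_recursive(n: int) -> int:
    """Recursive factorial (one member of the semantic clone pair)."""
    return 1 if n <= 1 else n * factorial_recursive(n - 1)


def factorial_iterative(n: int) -> int:
    """Iterative factorial; syntactically different, behaviorally equal."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


def behaves_equally(f, g, inputs) -> bool:
    """Assumption: compare outputs on sampled inputs.

    SCD-PSM instead compares probabilistic models of runtime data.
    """
    return all(f(x) == g(x) for x in inputs)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally defined as 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    print(behaves_equally(factorial_recursive, factorial_iterative, range(20)))
    print(matthews_corrcoef(tp=45, tn=45, fp=5, fn=5))
```

The confusion-matrix counts in the usage lines are made-up inputs for demonstrating the formula, not results from the paper.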

Publisher

Springer International Publishing

Link

https://link.springer.com/content/pdf/10.1007/978-3-030-99429-7_16

Cited by 2 articles.

1. Recovering Semantics from Control Code for Improving Reusability in Industrial Edge Computing;2024 IEEE 33rd International Symposium on Industrial Electronics (ISIE);2024-06-18

2. A systematic literature review on source code similarity measurement and clone detection: Techniques, applications, and challenges;Journal of Systems and Software;2023-10