Complex Cases of Source Code Authorship Identification Using a Hybrid Deep Neural Network

Published: 2022-09-30 Issue: 10 Volume: 14 Article number: 287
ISSN: 1999-5903
Journal: Future Internet
Language: English

Authors:

Kurtukova Anna, Romanov Aleksandr, Shelupanov Alexander, Fedotova Anastasia

Abstract

This paper continues our previous work on source code authorship identification. The analysis of heterogeneous source code is relevant to copyright protection in commercial software development. Heterogeneity arises from the specifics of development processes and the use of collaborative development tools such as version control systems: a single code base may be written to different programming standards by a team of programmers with different skill levels. Another application field is information security, in particular identifying the authors of computer viruses. We apply our technique, based on a hybrid of the Inception-v1 and Bidirectional Gated Recurrent Unit (BiGRU) architectures, to heterogeneous source code and consider the most common complex cases in commercial development that hinder authorship identification. The paper examines the possibilities and limitations of the authors' technique in these complex cases. When a programmer was proficient in two programming languages, the average accuracy was 87%; for proficiency in three or more, it was 76%. For artificially generated source code, the average accuracy was 81.5%. Finally, for source code generated from commits, the average accuracy was 84%. A comparison with state-of-the-art approaches showed that no existing method offers the full functionality of the proposed method across these practical cases.
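The abstract names the hybrid architecture but not its configuration. As a rough illustration only, the pure-Python sketch below wires an Inception-style block (parallel convolutions of width 1, 3 and 5 plus a max-pooling branch, concatenated per position) into a bidirectional GRU whose final states feed a softmax over candidate authors. It is not the authors' implementation. The following are all hypothetical assumptions: the per-character features in `encode`, the 1D adaptation of the 2D Inception-v1 module (with its 1x1 reduction layers omitted), the layer sizes, the random untrained weights, and every helper name (`init_params`, `inception_block`, `gru_pass`, `predict_author_probs`).

```python
import math
import random


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def encode(source):
    """Hypothetical per-character features: [letter, digit, whitespace, other]."""
    return [[float(ch.isalpha()), float(ch.isdigit()), float(ch.isspace()),
             float(not (ch.isalnum() or ch.isspace()))] for ch in source]


def conv1d(seq, weights):
    """Same-padded 1D convolution followed by ReLU; weights[k] is the matrix for offset k."""
    width, half = len(weights), len(weights) // 2
    out = []
    for t in range(len(seq)):
        acc = [0.0] * len(weights[0])
        for k in range(width):
            j = t + k - half
            if 0 <= j < len(seq):  # positions outside the sequence count as zero padding
                acc = [a + v for a, v in zip(acc, matvec(weights[k], seq[j]))]
        out.append([max(0.0, a) for a in acc])
    return out


def max_pool1d(seq, size=3):
    """Same-length max pooling over a sliding window of positions."""
    half = size // 2
    return [[max(seq[j][c] for j in range(max(0, t - half), min(len(seq), t + half + 1)))
             for c in range(len(seq[0]))] for t in range(len(seq))]


def inception_block(seq, p):
    """Parallel 1-, 3- and 5-wide convolutions plus pool -> 1-wide conv, concatenated."""
    branches = [conv1d(seq, p["k1"]), conv1d(seq, p["k3"]), conv1d(seq, p["k5"]),
                conv1d(max_pool1d(seq), p["pool"])]
    return [a + b + c + d for a, b, c, d in zip(*branches)]


def gru_pass(seq, p, hidden):
    """Run a GRU over seq and return its final hidden state."""
    h = [0.0] * hidden
    for x in seq:
        z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]  # update gate
        r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(p["Wn"], x), matvec(p["Un"], rh))]
        h = [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]
    return h


def init_params(in_dim, branch_dim, hidden, n_authors, seed=0):
    """Random (untrained) parameters for the hypothetical model."""
    rng = random.Random(seed)

    def conv(width):
        return [rand_matrix(branch_dim, in_dim, rng) for _ in range(width)]

    def gru():
        feat = 4 * branch_dim  # four concatenated Inception branches
        return {name: rand_matrix(hidden, feat if name[0] == "W" else hidden, rng)
                for name in ("Wz", "Uz", "Wr", "Ur", "Wn", "Un")}

    return {"inc": {"k1": conv(1), "k3": conv(3), "k5": conv(5), "pool": conv(1)},
            "fwd": gru(), "bwd": gru(), "out": rand_matrix(n_authors, 2 * hidden, rng)}


def predict_author_probs(source, params, hidden):
    """Inception-style features -> BiGRU (forward + reversed pass) -> softmax over authors."""
    feats = inception_block(encode(source), params["inc"])
    state = gru_pass(feats, params["fwd"], hidden) + gru_pass(feats[::-1], params["bwd"], hidden)
    logits = matvec(params["out"], state)
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    params = init_params(in_dim=4, branch_dim=3, hidden=5, n_authors=3)
    print(predict_author_probs("for (int i = 0; i < n; ++i) sum += a[i];", params, hidden=5))
```

Because the weights are random, the printed probabilities carry no authorship information. The sketch only shows how data flows through the two stages: the convolutional branches capture local patterns of several widths, and the BiGRU reads the resulting feature sequence in both directions before classification.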

Publisher

MDPI AG

Subject

Computer Networks and Communications

Link

https://www.mdpi.com/1999-5903/14/10/287/pdf

References (38 articles)

1. Identification author of source code by machine learning methods. Tr. SPIIRAN, 2019.

2. Kurtukova, A., Romanov, A., and Shelupanov, A. (2020). Source Code Authorship Identification Using Deep Neural Networks. Symmetry, 12.

3. Abuhamad, M., AbuHmed, T., Mohaisen, A., and Nyang, D. (2018, October 15–19). Large-Scale and Language-Oblivious Code Authorship Identification. Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, Toronto, ON, Canada.

4. Li, Z., Chen, G., Chen, C., Zou, Y., and Xu, S. (2022, January 25–27). RoPGen: Towards Robust Code Authorship Attribution via Automatic Coding Style Transformation. Proceedings of the 2022 IEEE 44th International Conference on Software Engineering (ICSE), Pittsburgh, PA, USA.

5. Holland, C., Khoshavi, N., and Jaimes, L.G. (2022, January 18–20). Code authorship identification via deep graph CNNs. Proceedings of the 2022 ACM Southeast Conference (ACM SE '22), Virtual.

Cited by 1 article.

1. Authorship Identification of Binary and Disassembled Codes Using NLP Methods. Information, 2023-06-25.