Reducing the Impact of Time Evolution on Source Code Authorship Attribution via Domain Adaptation

Author:

Li Zhen1ORCID,Zhao Shasha1ORCID,Chen Chen2ORCID,Chen Qian3ORCID

Affiliation:

1. Hebei University, Baoding, China

2. University of Central Florida, Orlando, United States

3. The University of Texas at San Antonio, San Antonio, United States

Abstract

Source code authorship attribution is an important problem in practical applications such as plagiarism detection, software forensics, and copyright disputes. Recent studies show that existing methods for source code authorship attribution can be significantly affected by time evolution, leading to a decrease in attribution accuracy year by year. To alleviate the problem of Deep Learning (DL)-based source code authorship attribution degrading in accuracy due to time evolution, we propose a new framework called Time D omain A daptation (TimeDA) by adding new feature extractors to the original DL-based code attribution framework that enhances the learning ability of the original model on source domain features without requiring new or more source data. Moreover, we employ a centroid-based pseudo-labeling strategy using neighborhood clustering entropy for adaptive learning to improve the robustness of DL-based code authorship attribution. Experimental results show that TimeDA can significantly enhance the robustness of DL-based source code authorship attribution to time evolution, with an average improvement of 8.7% on the Java dataset and 5.2% on the C++ dataset. In addition, our TimeDA benefits from employing the centroid-based pseudo-labeling strategy, which significantly reduced the model training time by 87.3% compared to traditional unsupervised domain adaptive methods.

Funder

National Natural Science Foundation of China

Publisher

Association for Computing Machinery (ACM)

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3