Unleashing the power of pseudo-code for binary code similarity analysis

Author:

Zhang Weiwei,Xu Zhengzi,Xiao Yang,Xue YinxingORCID

Abstract

AbstractCode similarity analysis has become more popular due to its significant applicantions, including vulnerability detection, malware detection, and patch analysis. Since the source code of the software is difficult to obtain under most circumstances, binary-level code similarity analysis (BCSA) has been paid much attention to. In recent years, many BCSA studies incorporating AI techniques focus on deriving semantic information from binary functions with code representations such as assembly code, intermediate representations, and control flow graphs to measure the similarity. However, due to the impacts of different compilers, architectures, and obfuscations, binaries compiled from the same source code may vary considerably, which becomes the major obstacle for these works to obtain robust features. In this paper, we propose a solution, named UPPC (Unleashing the Power of Pseudo-code), which leverages the pseudo-code of binary function as input, to address the binary code similarity analysis challenge, since pseudo-code has higher abstraction and is platform-independent compared to binary instructions. UPPC selectively inlines the functions to capture the full function semantics across different compiler optimization levels and uses a deep pyramidal convolutional neural network to obtain the semantic embedding of the function. We evaluated UPPC on a data set containing vulnerabilities and a data set including different architectures (X86, ARM), different optimization options (O0-O3), different compilers (GCC, Clang), and four obfuscation strategies. The experimental results show that the accuracy of UPPC in function search is 33.2% higher than that of existing methods.

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence,Computer Networks and Communications,Information Systems,Software

Cited by 3 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. Continuous Authentication in a UAVs Swarm;2023 15th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT);2023-10-30

2. Techniques for Accelerating Algebraic Operations in Agent-based Information Security Systems;2023 Wave Electronics and its Application in Information and Telecommunication Systems (WECONF);2023-05-29

3. SimCoDe-NET: Similarity Detection in Binary Code using Deep Learning Network;International Journal of Electrical and Electronics Research;2023-03-20

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3