Codeformer: A GNN-Nested Transformer Model for Binary Code Similarity Detection
Published: 2023-04-04
Issue: 7
Volume: 12
Page: 1722
ISSN: 2079-9292
Container-title: Electronics
Language: en
Short-container-title: Electronics
Author:
Liu Guangming (1, 2), Zhou Xin (2), Pang Jianmin (2), Yue Feng (2), Liu Wenfu (2, 3), Wang Junchao (2)
Affiliation:
1. School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450002, China
2. State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China
3. State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System, Luoyang 471000, China
Abstract
Binary code similarity detection measures the similarity of a pair of binary functions or files. It is a fundamental task in binary security. Traditional similarity detection methods usually rely on graph matching algorithms, which perform poorly and give unsatisfactory results. Recently, graph neural networks have become an effective method for analyzing graph embeddings in natural language processing. Although these methods are effective, they still do not sufficiently learn the information contained in the binary code. To solve this problem, we propose Codeformer, an iterative model of a graph neural network (GNN)-nested Transformer. The model uses a Transformer to obtain an embedding vector for each basic block and uses the GNN to update the embedding vector of each basic block of the control flow graph (CFG). Codeformer iteratively executes basic block embedding to learn abundant global information and finally uses the GNN to aggregate all the basic blocks of a function. We conducted experiments on the OpenSSL, ClamAV and Curl datasets. The evaluation results show that our method outperforms the state-of-the-art models.
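The data flow described in the abstract (encode each basic block, update the block embeddings over the CFG, repeat, then aggregate into one function embedding) can be illustrated with a toy sketch. This is not the authors' implementation: the Transformer is replaced by a simple mean of hashed token vectors, the GNN by neighbor averaging over undirected edges, and the final aggregation by mean pooling. All names (`encode_block`, `gnn_step`, `embed_function`, `DIM`) and the sample instructions are hypothetical.

```python
import hashlib
import math

DIM = 8  # toy embedding size (assumption, not from the paper)


def token_vector(token):
    """Deterministic pseudo-embedding for a token (stand-in for learned embeddings)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [b / 255.0 - 0.5 for b in digest[:DIM]]


def encode_block(tokens, context):
    """Stand-in for the Transformer block encoder: mean token vector plus graph context."""
    if not tokens:
        return list(context)
    vecs = [token_vector(t) for t in tokens]
    mean = [sum(col) / len(vecs) for col in zip(*vecs)]
    return [m + c for m, c in zip(mean, context)]


def gnn_step(embeddings, edges):
    """One message-passing round: each block averages itself with its CFG neighbors.

    Edges are treated as undirected here, which is a simplification.
    """
    neighbors = {i: [i] for i in range(len(embeddings))}
    for src, dst in edges:
        neighbors[src].append(dst)
        neighbors[dst].append(src)
    return [
        [sum(embeddings[j][k] for j in neighbors[i]) / len(neighbors[i])
         for k in range(DIM)]
        for i in range(len(embeddings))
    ]


def embed_function(blocks, edges, rounds=2):
    """Alternate block encoding and graph updates, then mean-pool all blocks."""
    context = [[0.0] * DIM for _ in blocks]
    for _ in range(rounds):
        encoded = [encode_block(b, c) for b, c in zip(blocks, context)]
        context = gnn_step(encoded, edges)
    return [sum(col) / len(context) for col in zip(*context)]


def cosine(a, b):
    """Cosine similarity between two function embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Two toy functions: lists of basic blocks (token lists) plus CFG edges.
func_a = ([["push", "rbp"], ["mov", "eax", "1"], ["ret"]], [(0, 1), (1, 2)])
func_b = ([["push", "rbp"], ["xor", "eax", "eax"], ["ret"]], [(0, 1), (1, 2)])

emb_a = embed_function(*func_a)
emb_b = embed_function(*func_b)
similarity = cosine(emb_a, emb_b)
```

In this sketch the block encoder receives the graph-updated context on every round, which is one simple way to mimic the "nested" interaction between the two components; the actual model would use trained Transformer and GNN layers instead.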
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering
Cited by
3 articles.