Codeformer: A GNN-Nested Transformer Model for Binary Code Similarity Detection

Published:2023-04-04 Issue:7 Volume:12 Page:1722
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics

Author:

Liu Guangming¹,², Zhou Xin², Pang Jianmin², Yue Feng², Liu Wenfu²,³, Wang Junchao²

Affiliation:

1. School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450002, China

2. State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China

3. State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System, Luoyang 471000, China

Abstract

Binary code similarity detection computes the similarity of a pair of binary functions or files using a chosen calculation method and judgment method. It is a fundamental task in the field of binary security. Traditional similarity detection methods usually rely on graph matching algorithms, but these methods perform poorly and give unsatisfactory results. Recently, graph neural networks have become an effective method for analyzing graph embeddings in natural language processing. Although these methods are effective, existing methods still do not sufficiently learn the information in the binary code. To solve this problem, we propose Codeformer, an iterative model of a graph neural network (GNN)-nested Transformer. The model uses a Transformer to obtain an embedding vector of each basic block and uses the GNN to update the embedding vector of each basic block of the control flow graph (CFG). Codeformer iteratively executes basic block embedding to learn abundant global information and finally uses the GNN to aggregate all the basic blocks of a function. We conducted experiments on the OpenSSL, ClamAV and curl datasets. The evaluation results show that our method outperforms the state-of-the-art models.
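
The pipeline described in the abstract (embed each basic block, propagate embeddings along CFG edges, pool the blocks into one function vector, compare two functions) can be illustrated with a toy sketch. This is not the authors' implementation: the hashed bag-of-tokens `embed_block` is a stand-in for the Transformer encoder, `gnn_step` is a plain mean-aggregation round and not a learned GNN layer, and the block encoder runs only once here, whereas Codeformer nests the two iteratively. All function names, the vector size, and the number of rounds are assumptions made for illustration.

```python
import math


def normalise(vec):
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def embed_block(tokens, dim=8):
    """Stand-in for the Transformer block encoder (assumption):
    hash each instruction token into a fixed-size bag-of-tokens vector."""
    vec = [0.0] * dim
    for tok in tokens:
        # Deterministic toy hash; Python's built-in hash() is salted per process.
        idx = sum(ord(c) for c in tok) % dim
        vec[idx] += 1.0
    return normalise(vec)


def gnn_step(embeddings, edges):
    """One untrained message-passing round: each block averages its own
    vector with the vectors of its CFG predecessors."""
    incoming = {i: [] for i in range(len(embeddings))}
    for src, dst in edges:
        incoming[dst].append(src)
    updated = []
    for i, own in enumerate(embeddings):
        group = [own] + [embeddings[j] for j in incoming[i]]
        mean = [sum(col) / len(group) for col in zip(*group)]
        updated.append(normalise(mean))
    return updated


def embed_function(blocks, edges, rounds=2):
    """Embed every basic block, propagate along the CFG, then sum-pool."""
    embeddings = [embed_block(b) for b in blocks]
    for _ in range(rounds):
        embeddings = gnn_step(embeddings, edges)
    pooled = [sum(col) for col in zip(*embeddings)]
    return normalise(pooled)


def cosine(a, b):
    """Cosine similarity of two unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Two hypothetical functions: each is a list of basic blocks plus CFG edges.
func_a = embed_function([["push", "mov"], ["add", "ret"]], [(0, 1)])
func_b = embed_function([["xor", "jmp"], ["call", "ret"]], [(0, 1)])
print(cosine(func_a, func_a), cosine(func_a, func_b))
```

In a trained model the similarity score would come from learned parameters optimised on labelled function pairs; the sketch only shows how data flows from instruction tokens to a single function-level vector.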

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering,Computer Networks and Communications,Hardware and Architecture,Signal Processing,Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/12/7/1722/pdf

Cited by 3 articles.

1. Colorize at Will: Harnessing Diffusion Prior for Image Colorization;IEEE Access;2024

2. High-Resolution Metalens Imaging with Sequential Artificial Intelligence Models;Nano Letters;2023-11-08

3. SimCoDe-NET: Similarity Detection in Binary Code using Deep Learning Network;International Journal of Electrical and Electronics Research;2023-03-20