HEBCS: A High-Efficiency Binary Code Search Method
Published: 2023-08-16
Issue: 16
Volume: 12
Page: 3464
ISSN: 2079-9292
Container-title: Electronics
language: en
Short-container-title: Electronics
Author:
Sun Xiangjie (1, 2), Wei Qiang (2), Du Jiang (2), Wang Yisen (2)
Affiliation:
1. School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450002, China
2. School of Cyber Science and Engineering, PLA Information Engineering University, Zhengzhou 450001, China
Abstract
Binary code search is a technique for finding code in a code database that is similar to a given piece of code. It is widely applied in scenarios such as vulnerability queries and code defect analysis. While many existing methods employ advanced machine learning models for similarity analysis, their lack of interpretability and their low efficiency on large-scale function sets remain challenges. To address these issues, we propose a high-efficiency binary code search method called HEBCS. It employs an interpretable approach to extract function-level features and transforms each feature into a locality-sensitive hash representation. The hashes of these features are then combined to form the hash of the function. By leveraging the pigeonhole principle, HEBCS enables efficient storage and retrieval of functions, ensuring high execution efficiency even on large-scale data. Furthermore, we compare HEBCS with a classic method and a state-of-the-art method, demonstrating that HEBCS achieves significantly higher search efficiency while maintaining comparable accuracy, recall, and F1-score. In real-world vulnerability query applications, HEBCS demonstrates promising results. Its effectiveness in large-scale binary function searches suggests significant potential for practical applications.
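The abstract does not specify how the pigeonhole principle is applied, so the following is only a minimal sketch of one common way to use it for hash retrieval, not the authors' implementation. It assumes each function is already represented by a fixed-width integer hash and that "similar" means a Hamming distance of at most `radius` bits. If a hash is split into `radius + 1` chunks, any two hashes within that distance must agree exactly on at least one chunk, so a lookup only has to inspect the buckets keyed by the query's own chunks. The class name `PigeonholeIndex`, the 64-bit width, the radius, and the sample hashes are all hypothetical.

```python
from collections import defaultdict


class PigeonholeIndex:
    """Hypothetical index for Hamming-distance search over fixed-width hashes."""

    def __init__(self, bits=64, radius=3):
        # Assumption: bits >= radius + 1, so every chunk has at least one bit.
        self.radius = radius
        num_chunks = radius + 1
        base, extra = divmod(bits, num_chunks)
        # Split the hash into radius + 1 chunks of near-equal width.
        self.bounds = []
        start = 0
        for i in range(num_chunks):
            width = base + (1 if i < extra else 0)
            self.bounds.append((start, width))
            start += width
        # One exact-match table per chunk position.
        self.tables = [defaultdict(list) for _ in range(num_chunks)]

    def _chunks(self, h):
        return [(h >> start) & ((1 << width) - 1) for start, width in self.bounds]

    def add(self, name, h):
        """Store a function hash under each of its chunk values."""
        for table, chunk in zip(self.tables, self._chunks(h)):
            table[chunk].append((name, h))

    def query(self, h):
        """Return (name, distance) pairs within `radius` bits, nearest first."""
        found = {}
        for table, chunk in zip(self.tables, self._chunks(h)):
            for name, candidate in table.get(chunk, ()):
                if name in found:
                    continue
                # Verify each candidate with the full Hamming distance.
                distance = bin(h ^ candidate).count("1")
                if distance <= self.radius:
                    found[name] = distance
        return sorted(found.items(), key=lambda item: item[1])


# Made-up hashes, for illustration only.
index = PigeonholeIndex(bits=64, radius=3)
index.add("func_a", 0xFFFF0000FFFF0000)
index.add("func_b", 0xFFFF0000FFFF0007)
index.add("func_c", 0x0123456789ABCDEF)
matches = index.query(0xFFFF0000FFFF0001)
print(matches)  # [('func_a', 1), ('func_b', 2)]
```

In this sketch the cost of a query depends on the size of the matching buckets rather than on the size of the whole database, which is the kind of behaviour the abstract attributes to pigeonhole-based storage. The trade-off is that every hash is stored `radius + 1` times.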
Funder
National Key R&D Program of China; Program for Innovation Leading Scientists and Technicians of ZhongYuan
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering