1-to-1 or 1-to-n? Investigating the effect of function inlining on binary similarity analysis

Author:

Jia Ang, Fan Ming, Jin Wuxia, Xu Xi, Zhou Zhaohui (1), Tang Qiyi, Nie Sen, Wu Shi (2), Liu Ting (3)

Affiliation:

1. Ministry of Education Key Lab for Intelligent Networks and Network Security, Xi’an Jiaotong University, China

2. Tencent Security Keen Lab, China

3. Ministry of Education Key Lab for Intelligent Networks and Network Security, School of Cyber Science and Engineering, Xi’an Jiaotong University, China

Abstract

Binary similarity analysis is critical to many code-reuse-related issues, and function matching is its fundamental task. Most binary similarity analysis works apply a “1-to-1” mechanism, in which one function in a binary file is matched against one function in a source file or binary file. However, we discover that, contrary to this traditional understanding, function inlining makes function mapping a more complex “1-to-n” problem (one binary function matches multiple source functions or binary functions) or even an “n-to-n” problem (multiple binary functions match multiple binary functions). In this paper, we investigate the effect of function inlining on binary similarity analysis. We carry out three studies to investigate the extent of function inlining, the performance of existing works under function inlining, and the effectiveness of existing inlining-simulation strategies. First, we design a scalable and lightweight identification method to recover function inlining in binaries. We collect and analyze 88 projects (compiled in 288 versions and resulting in 32,460,156 binary functions) to construct 4 inlining-oriented datasets for 4 security tasks in the software supply chain: code search, OSS (Open Source Software) reuse detection, vulnerability detection, and patch presence test. The datasets reveal that the proportion of function inlining ranges from 30% to 40% when using O3 and can sometimes reach nearly 70%. Then, we evaluate 4 existing works on our dataset. Results show that most existing works neglect inlining and use the “1-to-1” mechanism. The resulting mismatches cause a 30% loss in performance during code search and a 40% loss during vulnerability detection. Moreover, most inlined functions are ignored during OSS reuse detection and patch presence test, leaving these functions at risk. Finally, we analyze 2 inlining-simulation strategies on our dataset. They miss nearly 40% of the inlined functions, so there is still considerable room for improvement. By precisely recovering when function inlining happens, we discover that inlining is usually cumulative as the optimization level increases. We therefore recommend conditional inlining and incremental inlining for designing a low-cost and high-coverage inlining-simulation strategy.
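The “1-to-n” mapping described in the abstract can be illustrated with a minimal sketch. It is not the paper's identification method: the function names, the `inlined_calls` input, and the simplifying assumption that every source function also survives as a standalone binary function are all hypothetical. The sketch only shows how a set of inlined call edges turns a “1-to-1” ground truth into a “1-to-n” one.

```python
from collections import defaultdict


def build_mapping(functions, inlined_calls):
    """Map each binary function to the set of source functions it contains.

    functions: iterable of source function names (assumed, for illustration,
        to each have a standalone copy in the binary).
    inlined_calls: set of (caller, callee) pairs that the compiler inlined
        (hypothetical input; a real tool would have to recover these).
    """
    inlined_into = defaultdict(set)
    for caller, callee in inlined_calls:
        inlined_into[caller].add(callee)

    def expand(name, seen):
        # Collect `name` plus everything transitively inlined into it.
        if name in seen:
            return
        seen.add(name)
        for callee in inlined_into.get(name, ()):
            expand(callee, seen)

    mapping = {}
    for func in functions:
        contained = set()
        expand(func, contained)
        mapping[func] = contained
    return mapping


def inlining_ratio(mapping):
    """Fraction of binary functions that contain at least one inlined callee."""
    with_inlining = sum(1 for sources in mapping.values() if len(sources) > 1)
    return with_inlining / len(mapping)


if __name__ == "__main__":
    # Hypothetical example: parse inlines read_token, which inlines skip_ws.
    funcs = ["main", "parse", "read_token", "skip_ws"]
    inlined = {("parse", "read_token"), ("read_token", "skip_ws")}
    result = build_mapping(funcs, inlined)
    for name in funcs:
        print(name, "->", sorted(result[name]))
    print("ratio of functions with inlining:", inlining_ratio(result))
```

In this toy input, the binary function `parse` corresponds to three source functions, so a “1-to-1” matcher that compares it only against the source of `parse` would see a partial match at best.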

Publisher

Association for Computing Machinery (ACM)

Subject

Software

Reference: 69 articles.

Cited by 7 articles.

1. Are We There Yet? Filling the Gap Between Binary Similarity Analysis and Binary Software Composition Analysis;2024 IEEE 9th European Symposium on Security and Privacy (EuroS&P);2024-07-08

2. CodeExtract: Enhancing Binary Code Similarity Detection with Code Extraction Techniques;Proceedings of the 25th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems;2024-06-20

3. BinaryAI: Binary Software Composition Analysis via Intelligent Binary Source Code Matching;Proceedings of the IEEE/ACM 46th International Conference on Software Engineering;2024-04-12

4. Cross-Inlining Binary Function Similarity Detection;Proceedings of the IEEE/ACM 46th International Conference on Software Engineering;2024-04-12

5. ReIFunc: Identifying Recurring Inline Functions in Binary Code;2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER);2024-03-12
