Function-Level Compilation Provenance Identification with Multi-Faceted Neural Feature Distillation and Fusion

Author:

Gao Yang1,Liang Lunjin1,Li Yifei1,Li Rui2,Wang Yu2

Affiliation:

1. Department of Computer Science and Technology, Heilongjiang University, Harbin 150080, China

2. School of Continuing Education, Xi’an Jiaotong University, Xi’an 710049, China

Abstract

In the landscape of software development, the selection of compilation tools and settings plays a pivotal role in the creation of executable binaries. This diversity, while beneficial, introduces significant challenges for reverse engineers and security analysts in deciphering the compilation provenance of binary code. To this end, we present MulCPI, short for Multi-representation Fusion-based Compilation Provenance Identification, which integrates the features collected from multiple distinct intermediate representations of the binary code for better discernment of the fine-grained function-level compilation details. In particular, we devise a novel graph-oriented neural encoder improved upon the gated graph neural network by subtly introducing an attention mechanism into the neighborhood nodes’ information aggregation computation, in order to better distill the more informative features from the attributed control flow graph. By further integrating the features collected from the normalized assembly sequence with an advanced Transformer encoder, MulCPI is capable of capturing a more comprehensive set of features manifesting the multi-faceted lexical, syntactic, and structural insights of the binary code. Extensive evaluation on a public dataset comprising 854,858 unique functions demonstrates that MulCPI exceeds the performance of current leading methods in identifying the compiler family, optimization level, compiler version, and the combination of compilation settings. It achieves average accuracy rates of 99.3%, 96.4%, 90.7%, and 85.3% on these tasks, respectively. Additionally, an ablation study highlights the significance of MulCPI’s core designs, validating the efficiency of the proposed attention-enhanced gated graph neural network encoder and the advantages of incorporating multiple code representations.

Funder

the Key Research Project of Shaanxi Higher Education 594 Teaching Reform

Publisher

MDPI AG

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3