Abstract
We address the problem of code search in executables. Given a function in binary form and a large code base, our goal is to statically find similar functions in the code base. Towards this end, we present a novel technique for computing similarity between functions. Our notion of similarity is based on decomposition of functions into tracelets: continuous, short, partial traces of an execution. To establish tracelet similarity in the face of low-level compiler transformations, we employ a simple rewriting engine. This engine uses constraint solving over alignment constraints and data dependencies to match registers and memory addresses between tracelets, bridging the gap between tracelets that are otherwise similar. We have implemented our approach and applied it to find matches in over a million binary functions. We compare tracelet matching to approaches based on n-grams and graphlets and show that tracelet matching obtains dramatically better precision and recall.
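To make the decomposition step concrete, the following is a minimal sketch that assumes a tracelet can be modeled as a path of k consecutive basic blocks in a function's control-flow graph. The abstract does not fix this representation; the graph encoding, the name `extract_tracelets`, and the parameter `k` are illustrative assumptions, and the rewriting and constraint-solving steps are not shown.

```python
from typing import Dict, List, Tuple

# Assumed encoding: each basic block id maps to the ids of its successors.
CFG = Dict[str, List[str]]


def extract_tracelets(cfg: CFG, k: int) -> List[Tuple[str, ...]]:
    """Return every path of exactly k consecutive basic blocks in the CFG."""
    if k < 1:
        raise ValueError("k must be at least 1")
    tracelets: List[Tuple[str, ...]] = []

    def extend(path: List[str]) -> None:
        # A path that has reached length k is one complete tracelet.
        if len(path) == k:
            tracelets.append(tuple(path))
            return
        # Otherwise follow each control-flow edge out of the last block.
        for succ in cfg.get(path[-1], []):
            extend(path + [succ])

    # A tracelet is a partial trace, so it may start at any block.
    for block in cfg:
        extend([block])
    return tracelets


# Hypothetical function with a single if/else diamond.
example_cfg: CFG = {
    "entry": ["then", "else"],
    "then": ["exit"],
    "else": ["exit"],
    "exit": [],
}

if __name__ == "__main__":
    for tracelet in extract_tracelets(example_cfg, 2):
        print(" -> ".join(tracelet))
```

Because the path length is bounded by `k`, the enumeration terminates even when the graph contains loops. Comparing two functions would then reduce to comparing their tracelet sets, which is where a matching step such as the rewriting engine described above would apply.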
Funder
Seventh Framework Programme
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design, Software
Cited by
55 articles.