Authors:
Borisov Petr, Kosolapov Yury
Abstract
The paper considers the problem of quantitatively comparing the potency and resistance of obfuscating transformations of program code that are used in practice. A method is proposed that determines the potency and resistance of a transformation by calculating the "comprehensibility" of the obfuscated and the deobfuscated version of a program, respectively. The comprehensibility of a program is measured as its similarity to an approximation of its "most comprehensible" version. Based on this method, a model for assessing potency and resistance is built. Its main elements are a set of obfuscating transformations under investigation, a similarity function, a method for approximating the most comprehensible version of a program, and a deobfuscator. To implement the model, 1) the obfuscating transformations provided by the Hikari obfuscator were chosen; 2) eight similarity functions were constructed by machine learning methods from static characteristics of programs in the CoreUtils, PolyBench and HashCat sets; 3) the smallest program version, found among the versions obtained with the optimization options of the GCC, Clang and AOCC compilers, was chosen as the approximation of the most comprehensible version; 4) a program deobfuscation scheme based on the optimizing compiler from LLVM was built and implemented. Potency and resistance values were obtained experimentally for sequences of one, two and three transformations. The results are consistent with independent evaluations of potency and resistance obtained by other methods. In particular, the highest potency and resistance are shown by sequences that start with transformations of the control flow graph, while the lowest resistance and potency are generally shown by sequences that contain no such transformations.
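The measurement idea described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: a program version is represented as a plain list of instruction mnemonics, the learned similarity functions are replaced by a simple cosine similarity over mnemonic counts, and potency and resistance are taken to be one minus the comprehensibility of the obfuscated and deobfuscated version, respectively. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def similarity(prog_a, prog_b):
    """Toy stand-in for a learned similarity function (assumption):
    cosine similarity of instruction-mnemonic count vectors."""
    a, b = Counter(prog_a), Counter(prog_b)
    dot = sum(a[k] * b[k] for k in set(a) | set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def approximate_most_comprehensible(versions):
    """Approximate the most comprehensible version by the smallest one,
    here measured by instruction count (assumption)."""
    return min(versions, key=len)


def comprehensibility(program, reference):
    """Comprehensibility as similarity to the reference version."""
    return similarity(program, reference)


def potency(obfuscated, reference):
    """Assumed definition: loss of comprehensibility after obfuscation."""
    return 1.0 - comprehensibility(obfuscated, reference)


def resistance(deobfuscated, reference):
    """Assumed definition: loss of comprehensibility that remains
    after the deobfuscator has been applied."""
    return 1.0 - comprehensibility(deobfuscated, reference)


if __name__ == "__main__":
    # Hypothetical builds of one program under different compiler options.
    builds = [
        ["mov", "mov", "add", "ret"],
        ["mov", "add", "ret"],
    ]
    reference = approximate_most_comprehensible(builds)
    # Hypothetical obfuscated and deobfuscated versions.
    obfuscated = ["mov", "add", "ret", "jmp", "jmp", "jmp",
                  "xor", "xor", "cmp", "cmp"]
    deobfuscated = ["mov", "add", "ret", "jmp"]
    print("potency:   ", potency(obfuscated, reference))
    print("resistance:", resistance(deobfuscated, reference))
```

In this sketch a transformation that the deobfuscator largely undoes yields a resistance much lower than its potency, which is the kind of comparison the model is meant to support.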