Affiliation:
1. Institute of Information Science, Academia Sinica, Taiwan
2. Department of Computer Science and Information Engineering, National Taiwan University, Taiwan
Abstract
Region formation is an important step in dynamic binary translation: it selects hot code regions for translation and optimization. The quality of the formed regions determines the scope of optimization and thus the final execution performance. Moreover, overall performance is highly sensitive to formation overhead, because region formation can incur a non-trivial cost. To address both region quality and formation overhead, this article presents a lightweight region formation method guided by processor tracing, e.g., Intel PT. We leverage the branch history information recorded by the processor to reconstruct the program's execution profile and form high-quality regions at low cost. Furthermore, we present the designs of lightweight hardware performance monitoring sampling and a branch instruction decode cache to minimize region formation overhead. For ARM64 to x86-64 translation, the experimental results show that our method achieves a speedup of up to 1.53× (1.16× on average) on SPEC CPU2006 benchmarks with reference inputs, compared to the well-known software-based trace formation method Next Executing Tail (NET). For x86-64 to ARM64 translation, our method also achieves a speedup of up to 1.25× over NET on CINT2006 benchmarks with reference inputs. A comparison with a relaxed NETPlus region formation method further shows that our method achieves the best performance and the lowest compilation overhead.
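The abstract only summarizes the approach. As a rough illustration of the general idea of replaying processor trace packets to recover executed blocks and then picking a hot region, here is a minimal Python sketch under toy assumptions; it is not the authors' implementation. It assumes a hypothetical fixed-width ISA (4-byte instructions) stored in a dict, and a simplified packet stream in which each `("tnt", bit)` entry carries one taken/not-taken outcome for a conditional branch and each `("tip", addr)` entry carries an indirect branch target, while direct unconditional jumps emit no packet, as in Intel PT. The names `next_branch`, `reconstruct`, `form_region`, the `hot_fraction` parameter, and the greedy "follow the hottest successor" rule are all illustrative assumptions. The dict passed as `cache` stands in for a branch instruction decode cache: each block start is scanned once, and later replays of the same block reuse the decoded result.

```python
from collections import Counter

# Hypothetical toy ISA: fixed 4-byte instructions, keyed by address.
#   ("op",)          non-branch instruction
#   ("jcc", target)  conditional direct branch  -> one TNT bit in the trace
#   ("jmp", target)  unconditional direct jump  -> no trace packet needed
#   ("ijmp",)        indirect branch            -> TIP packet carries target
INSN_SIZE = 4


def next_branch(program, addr, cache):
    """Return (branch_addr, kind, target) for the first branch at or after addr.

    `cache` acts as a decode cache: each block start is scanned only once.
    """
    if addr not in cache:
        a = addr
        while program[a][0] == "op":
            a += INSN_SIZE
        kind, *operands = program[a]
        cache[addr] = (a, kind, operands[0] if operands else None)
    return cache[addr]


def reconstruct(program, start, packets, cache, max_steps=100_000):
    """Replay simplified trace packets; return executed block start addresses."""
    it = iter(packets)
    addr = start
    blocks = [addr]
    for _ in range(max_steps):
        br_addr, kind, target = next_branch(program, addr, cache)
        if kind == "jmp":
            addr = target  # direct jump: outcome is implied by the code
        else:
            pkt = next(it, None)
            if pkt is None:
                break  # trace buffer exhausted
            if kind == "jcc":
                assert pkt[0] == "tnt", pkt
                addr = target if pkt[1] else br_addr + INSN_SIZE
            else:  # "ijmp": the target comes from the packet
                assert pkt[0] == "tip", pkt
                addr = pkt[1]
        blocks.append(addr)
    return blocks


def form_region(blocks, hot_fraction=0.5):
    """Greedy sketch: start at the hottest block, follow the most frequent
    successor edge while it is hot enough and does not close a cycle."""
    counts = Counter(blocks)
    edges = Counter(zip(blocks, blocks[1:]))
    entry, entry_count = counts.most_common(1)[0]
    region, cur = [entry], entry
    while True:
        succs = [(n, dst) for (src, dst), n in edges.items() if src == cur]
        if not succs:
            break
        _, nxt = max(succs)
        if nxt in region or counts[nxt] < hot_fraction * entry_count:
            break
        region.append(nxt)
        cur = nxt
    return region


# Toy program: a loop at 0x00 whose body usually takes the jcc to 0x10.
TOY_PROGRAM = {
    0x00: ("op",),
    0x04: ("jcc", 0x10),
    0x08: ("op",),
    0x0C: ("jmp", 0x14),
    0x10: ("op",),
    0x14: ("jcc", 0x00),  # loop back edge
    0x18: ("ijmp",),
}
TOY_PACKETS = [("tnt", True)] * 6 + [("tnt", False)] + [("tnt", True)] * 3

if __name__ == "__main__":
    cache = {}
    blocks = reconstruct(TOY_PROGRAM, 0x00, TOY_PACKETS, cache)
    print([hex(b) for b in blocks])
    print("decoded block starts:", len(cache))
    print("region:", [hex(b) for b in form_region(blocks)])
```

On the toy input, ten packets replay into twelve block visits while only four distinct block starts are ever decoded, and the greedy rule returns the loop made of blocks `0x00` and `0x10` as the region; the cold exit path through `0x08` and `0x14` is left out. A real translator would decode actual machine code and the full Intel PT packet format (for example, with Intel's libipt decoder library) rather than this toy model.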
Funder
Ministry of Science and Technology of Taiwan
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Information Systems, Software