Affiliation:
1. Carnegie Mellon University
2. University of Illinois at Urbana-Champaign
Abstract
While control speculation is highly effective for generating good schedules in out-of-order processors, it is less effective for in-order processors because compilers have trouble scheduling in the presence of unbiased branches, even when those branches are highly predictable. In this paper, we demonstrate a novel architectural branch decomposition that separates the prediction and deconvergence point of a branch from its resolution, which enables the compiler to profitably schedule across predictable but unbiased branches. We show that the hardware support for this branch architecture is a trivial extension of existing systems and describe a simple code transformation for exploiting this architectural support. Because architectural changes are required, this technique is most compelling for a dynamic binary translation-based system like Project Denver.
We evaluate the performance improvements enabled by this transformation for several in-order configurations across the SPEC 2006 benchmark suites. We show that our technique produces a geometric-mean speedup of 11% for SPEC 2006 Integer, with speedups as large as 35%. Because floating-point benchmarks contain fewer unbiased but predictable branches, our geometric-mean speedup on SPEC 2006 FP is 7%, with a maximum speedup of 26%.
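As a minimal sketch of how a geometric-mean speedup such as the ones quoted above is typically computed from per-benchmark results, the snippet below averages speedup ratios in log space. The function name `geomean_speedup` and the sample ratios are hypothetical illustrations; they are not taken from the paper and do not reproduce its measurements.

```python
import math


def geomean_speedup(ratios):
    """Return the geometric mean of per-benchmark speedup ratios.

    Each ratio is baseline_time / new_time, so 1.10 means 10% faster.
    """
    if not ratios:
        raise ValueError("need at least one speedup ratio")
    if any(r <= 0 for r in ratios):
        raise ValueError("speedup ratios must be positive")
    # Averaging the logarithms avoids overflow and is equivalent to
    # taking the n-th root of the product of the ratios.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


if __name__ == "__main__":
    # Hypothetical ratios for illustration only (not data from the paper).
    sample = [1.0, 1.21]
    print(f"geomean speedup: {geomean_speedup(sample):.3f}")
```

The geometric mean is the usual choice for summarizing ratios across a suite because a single large outlier influences it less than it would an arithmetic mean.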
Publisher
Association for Computing Machinery (ACM)