Affiliation:
1. Uppsala University, Sweden
2. National University of Singapore, Singapore
3. NTNU, Norway
Abstract
Increasing demands for energy efficiency constrain emerging hardware. These new hardware trends challenge the established assumptions in code generation and force us to rethink existing software optimization techniques. We propose a cross-layer redesign of the way compilers and the underlying microarchitecture are built and interact, to achieve both performance and high energy efficiency.
In this paper, we address one of the main performance bottlenecks, last-level cache misses, through a software-hardware co-design. Our approach is able to hide memory latency and attain increased memory and instruction level parallelism by orchestrating a non-speculative, execute-ahead paradigm in software (SWOOP). While out-of-order (OoO) architectures attempt to hide memory latency by dynamically reordering instructions, they do so through expensive, power-hungry, speculative mechanisms. We aim to shift this complexity into software, and we build upon compilation techniques inherited from VLIW, software pipelining, modulo scheduling, decoupled access-execution, and software prefetching. In contrast to previous approaches, we do not rely on either software or hardware speculation that can be detrimental to efficiency. Our SWOOP compiler is enhanced with lightweight architectural support, thus being able to transform applications that include highly complex control-flow and indirect memory accesses.
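To make the decoupled access-execution idea mentioned above concrete, the following is a minimal hand-written sketch in C. It is not the SWOOP compiler's output and does not model its architectural support; the function names, the `CHUNK` size, and the use of the GCC/Clang builtin `__builtin_prefetch` are assumptions chosen for illustration. Each chunk of a loop with indirect accesses is split into an access phase that only issues prefetches and an execute phase that does the actual work. The prefetch addresses are exactly the addresses the execute phase will read, so nothing is speculative.

```c
#include <stddef.h>

/* Hypothetical chunk size: number of iterations whose loads are
 * prefetched together before any of them is executed. */
#define CHUNK 64

/* Baseline loop: each a[idx[i]] is an indirect access that may miss
 * in the last-level cache and stall an in-order core. */
long sum_indirect_baseline(const long *a, const size_t *idx, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[idx[i]];
    return sum;
}

/* Decoupled variant: an access phase followed by an execute phase
 * for every chunk of iterations. */
long sum_indirect_decoupled(const long *a, const size_t *idx, size_t n) {
    long sum = 0;
    for (size_t base = 0; base < n; base += CHUNK) {
        size_t end = (base + CHUNK < n) ? base + CHUNK : n;

        /* Access phase: compute the addresses and prefetch them.
         * The misses of the whole chunk can overlap in the memory system. */
        for (size_t i = base; i < end; i++)
            __builtin_prefetch(&a[idx[i]], 0 /* read */, 3 /* keep in cache */);

        /* Execute phase: the same accesses, now likely to hit in cache. */
        for (size_t i = base; i < end; i++)
            sum += a[idx[i]];
    }
    return sum;
}
```

Both functions return the same result; only the order in which memory is touched differs. The abstract describes going further than this sketch, with an execute-ahead paradigm orchestrated by the compiler and backed by lightweight hardware support.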
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design, Software
Cited by
1 article.
1. Criticality Driven Fetch. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture. 2021-10-17.