Dependence-aware Slice Execution to Boost MLP in Slice-out-of-order Cores

Published: 2022-03-07
Volume: 19, Issue: 2, Pages: 1-28
ISSN: 1544-3566
Container title: ACM Transactions on Architecture and Code Optimization
Language: en
Short container title: ACM Trans. Archit. Code Optim.

Authors:

Rakesh Kumar¹, Mehdi Alipour², David Black-Schaffer³

Affiliations:

1. Norwegian University of Science and Technology (NTNU), Trondheim, Norway

2. Ericsson Research, Mobilvägen, Lund, Sweden

3. Uppsala University, Uppsala, Sweden

Abstract

Exploiting memory-level parallelism (MLP) is crucial to hide long memory and last-level cache access latencies. While out-of-order (OoO) cores, and techniques building on them, are effective at exploiting MLP, they deliver poor energy efficiency due to their complex and energy-hungry hardware. This work revisits slice-out-of-order (sOoO) cores as an energy-efficient alternative for MLP exploitation. sOoO cores achieve energy efficiency by constructing slices of MLP-generating instructions and executing them out-of-order only with respect to the rest of the instructions; the slices and the remaining instructions, by themselves, execute in-order. However, we observe that existing sOoO cores miss significant MLP opportunities due to their dependence-oblivious in-order slice execution, which causes dependent slices to frequently block MLP generation. To boost MLP generation, we introduce Freeway, an sOoO core based on a new dependence-aware slice execution policy that tracks dependent slices and keeps them from blocking subsequent independent slices and MLP extraction. The proposed core incurs minimal area and power overheads, yet approaches the MLP benefits of fully OoO cores. Our evaluation shows that Freeway delivers 12% better performance than the state-of-the-art sOoO core and is within 7% of the MLP limits of full OoO execution.
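The abstract's central observation, that a single in-order slice queue lets one dependent slice stall every independent slice behind it, can be illustrated with a toy cycle-level model. This is a hypothetical sketch, not Freeway's actual microarchitecture: it assumes each slice is reduced to one long-latency load (the illustrative `Slice` class, with an assumed fixed latency of 100 cycles), that one slice issues per cycle, and that dependences are known up front through a hypothetical `depends_on` field. The dependence-aware variant simply steers slices that have a producer into a second in-order queue, which is one plausible way to keep them from blocking independent slices.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slice:
    """Hypothetical MLP-generating slice, reduced to a single long-latency load."""
    sid: int
    depends_on: Optional[int]  # sid of the producer slice, or None if independent
    latency: int = 100         # assumed miss latency in cycles


def simulate(slices, dependence_aware):
    """Return (cycle when the last result arrives, peak loads in flight)."""
    done_at = {}  # sid -> cycle at which that slice's load result is available
    if dependence_aware:
        # Dependent slices get their own in-order queue so they cannot block
        # independent slices; the independent queue also stays in order.
        queues = [deque(s for s in slices if s.depends_on is not None),
                  deque(s for s in slices if s.depends_on is None)]
    else:
        # Dependence-oblivious: one in-order queue for every slice.
        queues = [deque(slices)]

    def ready(s, cycle):
        # A slice may issue once its producer's load result has arrived.
        if s.depends_on is None:
            return True
        return done_at.get(s.depends_on, float("inf")) <= cycle

    cycle, peak = 0, 0
    while any(queues):
        # Issue at most one slice per cycle; in the dependence-aware case the
        # dependent queue is tried first, then the independent queue.
        for q in queues:
            if q and ready(q[0], cycle):
                s = q.popleft()
                done_at[s.sid] = cycle + s.latency
                break
        # Loads issued but not yet returned approximate the MLP this cycle.
        in_flight = sum(1 for t in done_at.values() if t > cycle)
        peak = max(peak, in_flight)
        cycle += 1
    return max(done_at.values()), peak


# Hypothetical trace: slices 1 and 5 each consume the result of an earlier load.
trace = [Slice(0, None), Slice(1, 0), Slice(2, None), Slice(3, None),
         Slice(4, None), Slice(5, 4), Slice(6, None), Slice(7, None)]
print(simulate(trace, dependence_aware=False))  # (305, 4)
print(simulate(trace, dependence_aware=True))   # (203, 6)
```

On this made-up eight-slice trace, the dependence-aware policy finishes in 203 cycles with up to six misses in flight, versus 305 cycles and four in flight for the single queue. These figures only describe the toy model; the 12% and 7% results stated above come from the paper's own evaluation.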

Funder

Knut and Alice Wallenberg Foundation through the Wallenberg Academy Fellows Program

European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program

Research Council of Norway

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Information Systems, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3506704

References (50 articles)

1. Delay and Bypass: Ready and Criticality Aware Instruction Scheduling in Out-of-Order Processors

2. FIFOrder MicroArchitecture: Ready-Aware Instruction Scheduling for OoO Processors

3. Data prefetching by dependence graph precomputation

4. ARM. ARM Cortex-A7 Processor. [n.d.]. Retrieved from http://www.arm.com/products/processors/cortex-a/cortex-a7.php.

5. An Evaluation of High-Level Mechanistic Core Models