Future execution

Published:2006-12 Issue:4 Volume:3 Page:424-449
ISSN:1544-3566
Container-title:ACM Transactions on Architecture and Code Optimization
language:en
Short-container-title:ACM Trans. Archit. Code Optim.

Author:

Ganusov Ilya¹, Burtscher Martin¹

Affiliation:

1. Cornell University, Ithaca, NY

Abstract

This paper describes future execution (FE), a simple hardware-only technique to accelerate individual program threads running on multicore microprocessors. Our approach uses available idle cores to prefetch important data for the threads executing on the active cores. FE is based on the observation that many cache misses are caused by loads that execute repeatedly and whose address-generating program slices do not change (much) between consecutive executions. To exploit this property, FE dynamically creates a prefetching thread for each active core by simply sending a copy of all committed, register-writing instructions to an otherwise idle core. The key innovation is that on the way to the second core, a value predictor replaces each predictable instruction in the prefetching thread with a load immediate instruction, where the immediate is the predicted result that the instruction is likely to produce during its nth next dynamic execution. Executing this modified instruction stream (i.e., the prefetching thread) on another core makes it possible to compute the future results of the instructions that are not directly predictable, issue prefetches into the shared memory hierarchy, and thus reduce the primary threads' memory access time. We demonstrate the viability and effectiveness of future execution by performing cycle-accurate simulations of a two-way CMP running the single-threaded SPECcpu2000 benchmark suite. Our mechanism improves program performance by 12%, on average, over a baseline that already includes an optimized hardware stream prefetcher. We further show that FE is complementary to runahead execution and that the combination of these two techniques raises the average speedup to 20% above the performance of the baseline processor with the aggressive stream prefetcher.
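The rewriting step described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design: the abstract does not say which value predictor is used, so a simple per-instruction stride predictor with a confidence counter stands in for it. The names `Instr`, `StridePredictor`, and `build_prefetch_stream`, and the `threshold` parameter, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    pc: int      # static instruction address
    op: str      # mnemonic, e.g. "add" or "load"
    dest: str    # destination register
    result: int  # value the instruction committed on the primary core


class StridePredictor:
    """Per-PC stride predictor. ASSUMPTION: the abstract does not name the
    predictor type; a stride scheme is used here only as a stand-in."""

    def __init__(self, distance, threshold=2):
        self.distance = distance    # n: how many dynamic executions ahead
        self.threshold = threshold  # repeated strides needed before trusting
        self.table = {}             # pc -> (last_value, stride, confidence)

    def update_and_predict(self, pc, value):
        """Train on a committed value; return the predicted result of the
        nth next execution, or None if the instruction is not predictable."""
        last, stride, conf = self.table.get(pc, (None, 0, 0))
        if last is not None:
            new_stride = value - last
            conf = conf + 1 if new_stride == stride else 0
            stride = new_stride
        self.table[pc] = (value, stride, conf)
        if conf >= self.threshold:
            return value + self.distance * stride
        return None


def build_prefetch_stream(committed, predictor):
    """Turn the committed, register-writing instructions of the primary
    thread into the stream sent to the idle core."""
    stream = []
    for ins in committed:
        predicted = predictor.update_and_predict(ins.pc, ins.result)
        if predicted is None:
            # Not predictable: forward unchanged so the second core executes
            # it on the (future) register values produced by earlier entries.
            stream.append((ins.op, ins.dest, None))
        else:
            # Predictable: replace with a load immediate of the future value.
            stream.append(("li", ins.dest, predicted))
    return stream
```

For example, if an `add` at one address commits 100, 108, 116, 124 and the prediction distance is 4, the model forwards the first three executions unchanged while the stride gains confidence, then replaces the fourth with a load immediate of 156 (124 + 4 × 8). A load whose results show no repeating stride is always forwarded as is; in the scheme the abstract describes, such instructions are the ones whose future results the second core computes from the predicted register values.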

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Information Systems, Software

Link

https://dl.acm.org/doi/pdf/10.1145/1187976.1187979

References: 33 articles.

1. CAVA: Hiding L2 Misses with Checkpoint-Assisted Value Prediction

Cited by 7 articles.

1. Introduction to data prefetching;Advances in Computers;2022

2. Evaluation of Hardware Data Prefetchers on Server Processors;ACM Computing Surveys;2020-05-31

3. A Primer on Hardware Prefetching;Synthesis Lectures on Computer Architecture;2014-05-31

4. Leveraging Strength-Based Dynamic Information Flow Analysis to Enhance Data Value Prediction;ACM Transactions on Architecture and Code Optimization;2012-03

5. Efficiently exploiting memory level parallelism on asymmetric coupled cores in the dark silicon era;ACM Transactions on Architecture and Code Optimization;2012-01