Transparent control independence (TCI)

Published:2007-06-09 Issue:2 Volume:35 Page:448-459
ISSN:0163-5964
Container-title:ACM SIGARCH Computer Architecture News
language:en
Short-container-title:SIGARCH Comput. Archit. News

Author:

Al-Zawawi Ahmed S.¹, Reddy Vimal K.¹, Rotenberg Eric¹, Akkary Haitham H.²

Affiliation:

1. North Carolina State University, Raleigh, NC

2. Intel Corporation, Hillsboro, OR

Abstract

Superscalar architectures have been proposed that exploit control independence, reducing the performance penalty of branch mispredictions by preserving the work of future misprediction-independent instructions. The essential goal of exploiting control independence is to completely decouple future misprediction-independent instructions from deferred misprediction-dependent instructions. Current implementations fall short of this goal because they explicitly maintain program order among misprediction-independent and misprediction-dependent instructions. Explicit approaches sacrifice design efficiency and ultimately performance. We observe it is sufficient to emulate program order. Potential misprediction-dependent instructions are singled out a priori and their unchanging source values are checkpointed. These instructions and values are set aside as a "recovery program". Checkpointed source values break the data dependencies with co-mingled misprediction-independent instructions - now long since gone from the pipeline - achieving the essential decoupling objective. When the mispredicted branch resolves, recovery is achieved by fetching the self-sufficient, condensed recovery program. Recovery is effectively transparent to the pipeline, in that speculative state is not rolled back and recovery appears as a jump to code. A coarse-grain retirement substrate permits the relaxed order between the decoupled programs. Transparent control independence (TCI) yields a highly streamlined pipeline that quickly recycles resources based on conventional speculation, enabling a large window with small cycle-critical resources, and prevents many mispredictions from disrupting this large window. TCI achieves speedups as high as 64% (16% average) and 88% (22% average) for 4-issue and 8-issue pipelines, respectively, among 15 SPEC integer benchmarks. Factors that limit the performance of explicitly ordered approaches are quantified.
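The recovery-program idea in the abstract can be illustrated with a toy register-level model. This is only a sketch under stated assumptions, not the hardware mechanism the paper evaluates: the names `Instr`, `speculate`, `recover`, and `demo` are hypothetical, instructions are modelled as Python callables, and the set of registers that either branch path may write (`influenced`) is assumed to be known in advance. Instructions that read an influenced value are set aside together with checkpoints of their other source values, so they can be replayed later even after those sources have been overwritten.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical toy instruction: dest = op(*srcs)."""
    dest: str
    srcs: Tuple[str, ...]
    op: Callable[..., int]


def run(program: List[Instr], regs: Dict[str, int]) -> Dict[str, int]:
    """Execute instructions in program order on a copy of the register file."""
    regs = dict(regs)
    for ins in program:
        regs[ins.dest] = ins.op(*(regs[s] for s in ins.srcs))
    return regs


def speculate(post_branch: List[Instr], regs: Dict[str, int], influenced: Set[str]):
    """Run control-independent code once, on the predicted path.

    An instruction that reads a tainted register is potentially
    misprediction-dependent: it is set aside with checkpoints of its
    untainted source values. Speculative state is never rolled back.
    """
    regs = dict(regs)
    tainted = set(influenced)
    recovery = []  # list of (instruction, checkpointed source values)
    for ins in post_branch:
        if any(s in tainted for s in ins.srcs):
            ckpt = {s: regs[s] for s in ins.srcs if s not in tainted}
            recovery.append((ins, ckpt))
            tainted.add(ins.dest)
        else:
            tainted.discard(ins.dest)  # an independent write clears the taint
        regs[ins.dest] = ins.op(*(regs[s] for s in ins.srcs))
    return regs, recovery, tainted


def recover(spec_regs, fixed, recovery, tainted_at_end):
    """Replay only the recovery program, starting from the corrected values."""
    values = dict(fixed)
    for ins, ckpt in recovery:
        args = [ckpt[s] if s in ckpt else values[s] for s in ins.srcs]
        values[ins.dest] = ins.op(*args)
    final = dict(spec_regs)
    for r in tainted_at_end:  # only registers still tainted are patched
        final[r] = values[r]
    return final


def demo():
    pre = {"r1": 10, "r2": 3, "r3": 0, "r4": 0, "r5": 0}
    wrong_path = [Instr("r3", ("r1",), lambda a: a + 1)]    # predicted path
    correct_path = [Instr("r3", ("r1",), lambda a: a * 2)]  # actual path
    post_branch = [
        Instr("r4", ("r1", "r2"), lambda a, b: a + b),  # independent
        Instr("r5", ("r3", "r4"), lambda a, b: a + b),  # dependent via r3
        Instr("r4", ("r2",), lambda a: a * 5),          # overwrites r4
        Instr("r2", ("r5",), lambda a: a - 1),          # dependent via r5
    ]
    influenced = {"r3"}  # assumption: registers either path may write

    spec, recovery, tainted = speculate(post_branch, run(wrong_path, pre), influenced)
    correct = run(correct_path, pre)
    fixed = {r: correct[r] for r in influenced}
    final = recover(spec, fixed, recovery, tainted)

    # Reference result: in-order execution down the correct path.
    reference = run(correct_path + post_branch, pre)
    return final, reference, len(recovery)


if __name__ == "__main__":
    print(demo())
```

In this toy run only two of the four post-branch instructions are replayed, and the checkpoint of `r4` lets the second instruction be re-executed even though `r4` has since been overwritten by an independent instruction. The patched state matches what in-order execution down the correct path would produce.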

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/1273440.1250717

References: 26 articles.

1. Multipath execution

Cited by 8 articles.

1. Alternate Path μ-op Cache Prefetching;2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA);2024-06-29

2. Simulating Wrong-Path Instructions in Decoupled Functional-First Simulation;2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS);2023-04

3. Enabling Branch-Mispredict Level Parallelism by Selectively Flushing Instructions;MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture;2021-10-17

4. Breaking In-Order Branch Miss Recovery;IEEE Computer Architecture Letters;2020-01-01

5. Simultaneous branch and warp interweaving for sustained GPU performance;ACM SIGARCH Computer Architecture News;2012-09-05