EOLE

Author:

Perais Arthur¹, Seznec André¹

Affiliation:

1. IRISA/INRIA

Abstract

Even in the multicore era, there is a continuous demand to increase the performance of single-threaded applications. However, the conventional path of increasing both issue width and instruction window size inevitably leads to the power wall. Value Prediction (VP) was proposed in the mid-1990s as an alternative path to further enhance the performance of wide-issue superscalar processors. Still, until recently, it was considered that a performance-effective implementation of Value Prediction would add tremendous complexity and power consumption in almost every stage of the pipeline. Nonetheless, recent work in the field of VP has shown that, given an efficient confidence estimation mechanism, prediction validation can be removed from the out-of-order engine and delayed until commit time. As a result, recovering from mispredictions via selective replay can be avoided and a much simpler mechanism, pipeline squashing, can be used, while the out-of-order engine remains mostly unmodified. Yet, VP and validation at commit time entail strong constraints on the Physical Register File. Write ports are needed to write predicted results and read ports are needed to validate them at commit time, potentially rendering the overall number of ports unbearable. Fortunately, VP also implies that many single-cycle ALU instructions have their operands predicted in the front-end and can be executed in-place, in-order. Similarly, the execution of single-cycle instructions whose result has been predicted can be delayed until commit time, since predictions are validated at commit time. Consequently, a significant number of instructions (10% to 60% in our experiments) can bypass the out-of-order engine, allowing the reduction of the issue width, which is a major contributor to both out-of-order engine complexity and register file port requirements. This reduction paves the way for a truly practical implementation of Value Prediction.
Furthermore, since Value Prediction in itself usually increases performance, our resulting {Early | Out-of-Order | Late} Execution architecture, EOLE, is often more efficient than a baseline VP-augmented 6-issue superscalar while having a significantly narrower 4-issue out-of-order engine.
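
The three execution paths named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Instr` fields, the `classify` and `bypass_fraction` helpers, and the choice to try early execution before late execution are all assumptions made for illustration, derived only from the conditions stated above.

```python
from dataclasses import dataclass
from enum import Enum


class Path(Enum):
    EARLY = "early"                # executed in-order in the front-end
    OUT_OF_ORDER = "out-of-order"  # conventional out-of-order engine
    LATE = "late"                  # executed just before commit


@dataclass
class Instr:
    # Hypothetical per-instruction flags; a real pipeline would derive
    # these from the decoder and the value predictor's confidence.
    single_cycle: bool
    operands_predicted: bool
    result_predicted: bool


def classify(instr: Instr) -> Path:
    """Pick an execution path using the conditions given in the abstract."""
    # Single-cycle ALU instruction whose operands are already predicted:
    # it can be executed in-place in the front-end.
    if instr.single_cycle and instr.operands_predicted:
        return Path.EARLY
    # Single-cycle instruction whose result is predicted: dependents use
    # the prediction, so execution can wait until commit-time validation.
    if instr.single_cycle and instr.result_predicted:
        return Path.LATE
    # Everything else still goes through the out-of-order engine.
    return Path.OUT_OF_ORDER


def bypass_fraction(instrs: list[Instr]) -> float:
    """Fraction of instructions that bypass the out-of-order engine."""
    if not instrs:
        return 0.0
    bypassed = sum(1 for i in instrs if classify(i) is not Path.OUT_OF_ORDER)
    return bypassed / len(instrs)
```

In this toy model, `bypass_fraction` corresponds to the share of instructions that never enter the out-of-order engine, the quantity the abstract reports as 10% to 60% in the authors' experiments.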

Funder

European Research Council

Publisher

Association for Computing Machinery (ACM)

Cited by 10 articles.

1. Constable: Improving Performance and Power Efficiency by Safely Eliminating Load Instruction Execution;2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA);2024-06-29

2. Cost-Effective Value Predictor for ILP processors through Design Space Exploration;Proceedings of the Great Lakes Symposium on VLSI 2024;2024-06-12

3. Optimizing Value Prediction for ILP Processors: A Design Space Exploration Approach;2024

4. Leveraging Targeted Value Prediction to Unlock New Hardware Strength Reduction Potential;MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture;2021-10-17

5. Early Address Prediction;ACM Transactions on Architecture and Code Optimization;2021-06
