Affiliation:
1. The Pennsylvania State University
Abstract
Previous proposals for implementing instruction-level temporal redundancy in out-of-order cores have reported a performance degradation of up to 45% in certain applications compared to an execution which does not have any temporal redundancy. An important contributor to this problem is the insufficient number of ALUs for handling the amplified load injected into the core. At the same time, increasing the number of ALUs can increase the complexity of the issue logic, which has been pointed out to be one of the most timing-critical components of the processor. This paper proposes a novel extension of a prior idea on instruction reuse to ease ALU bandwidth requirements in a complexity-effective way by exploiting certain interesting properties of a dual (temporally redundant) instruction stream. We present microarchitectural extensions necessary for implementing an instruction reuse buffer (IRB) and integrating this with the issue logic of a dual instruction stream superscalar core, and conduct extensive evaluations to demonstrate how well it can alleviate the ALU bandwidth problem. We show that on the average we can gain back nearly 50% of the IPC loss that occurred due to ALU bandwidth limitations for an instruction-level temporally redundant superscalar execution, and 23% of the overall IPC loss.
Publisher
Association for Computing Machinery (ACM)
Cited by
4 articles.