Improving superscalar instruction dispatch and issue by exploiting dynamic code sequences

Author:

Vajapeyam Sriram1,Mitra Tulika2

Affiliation:

1. Supercomputer Education and Research Centre and Dept. of Computer Science & Automation, Indian Institnte of Science, Bangalore, India 560012

2. Dept. of Computer Science & Automation, Indian Institute of science, Bangalore, India 560012

Abstract

Superscalar processors currently have the potential to fetch multiple basic blocks per cycle by employing one of several recently proposed instruction fetch mechanisms. However, this increased fetch bandwidth cannot be exploited unless pipeline stages further downstream correspondingly improve. In particular, register renaming a large number of instructions per cycle is difficult. A large instruction window, needed to receive multiple basic blocks per cycle, will slow down dependence resolution and instruction issue. This paper addresses these and related issues by proposing (i) partitioning of the instruction window into multiple blocks, each holding a dynamic code sequence; (ii) logical partitioning of the register file into a global file and several local files, the latter holding registers local to a dynamic code sequence; (iii) the dynamic recording and reuse of register renaming information for registers local to a dynamic code sequence. Performance studies show these mechanisms improve performance over traditional superscalar processors by factors ranging from 1.5 to a little over 3 for the SPEC Integer programs. Next, it is observed that several of the loops in the benchmarks display vector-like behavior during execution, even if the static loop bodies are likely complex for compile-time vectorization. A dynamic loop vectorization mechanism that builds on top of the above mechanisms is briefly outlined. The mechanism vectorizes up to 60% of the dynamic instructions for some programs, albeit the average number of iterations per loop is quite small.

Publisher

Association for Computing Machinery (ACM)

Cited by 15 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. Pipelined Serial Register Renaming;Communications in Computer and Information Science;2020

2. Improvement of Renamed Trace Cache through the Reduction of Dependent Path Length for High Energy Efficiency;IEICE Transactions on Information and Systems;2016

3. Register-Level Communication in Speculative Chip Multiprocessors;Advances in Computers;2014

4. Computer Architecture and Design;Physics and Applications of Negative Refractive Index Materials;2008-09-26

5. Performance scalability of decoupled software pipelining;ACM Transactions on Architecture and Code Optimization;2008-08

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3