Optimal pipelining in supercomputers

Authors:

Kunkel S. R.¹, Smith J. E.¹

Affiliation:

1. Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison, Wisconsin

Abstract

This paper examines the relationship between the degree of central processor pipelining and performance. This relationship is studied in the context of modern supercomputers. Limitations due to instruction dependencies are studied via simulations of the CRAY-1S. Both scalar and vector code are studied. This study shows that instruction dependencies severely limit performance for scalar code as well as overall performance. The effects of latch overhead are then considered. The primary cause of latch overhead is the difference between maximum and minimum gate propagation delays. This causes both the skewing of data as it passes along the data path, and unintentional clock skewing due to clock fanout logic. Latch overhead is studied analytically in order to lower bound the clock period that may be used in a pipelined system. This analysis also touches on other points related to latch clocking. This analysis shows that for short pipeline segments both the Earle latch and polarity hold latch give the same clock period bound for both single-phase and multi-phase clocks. Overhead due to data skew and unintentional clock skew are each added to the CRAY-1S simulation model. Simulation results with realistic assumptions show that eight to ten gate levels per pipeline segment lead to optimal overall performance. The results also show that for short pipeline segments data skew and clock skew contribute about equally to the degradation in performance.
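The trade-off the abstract describes can be illustrated with a toy analytic model. The sketch below is not the paper's CRAY-1S simulation or its latch analysis. The function names, the fixed logic depth, the per-segment latch overhead, and the fraction of dependent instructions are all hypothetical assumptions chosen only to show the shape of the trade-off: shorter segments shorten the clock period, but each segment pays a fixed latch overhead, and dependent instructions must wait for the full pipeline latency.

```python
def time_per_instruction(gates_per_segment,
                         total_gate_levels=32,     # assumed total logic depth
                         latch_overhead=2.0,       # assumed overhead per segment, in gate delays
                         dependent_fraction=0.2):  # assumed share of dependent instructions
    """Toy estimate of average time per instruction, in gate delays."""
    # Number of pipeline segments needed to cover the logic (kept continuous for simplicity).
    segments = total_gate_levels / gates_per_segment
    # Each clock period covers one segment's logic plus the latch overhead.
    clock_period = gates_per_segment + latch_overhead
    # Independent instructions issue every clock; dependent ones wait a full pipeline latency.
    cycles_per_instruction = (1.0 - dependent_fraction) + dependent_fraction * segments
    return clock_period * cycles_per_instruction


def best_segment_length(candidates=range(1, 33), **model_params):
    """Return the candidate segment length (in gate levels) with the lowest toy cost."""
    return min(candidates, key=lambda g: time_per_instruction(g, **model_params))


if __name__ == "__main__":
    for g in (1, 2, 4, 8, 16, 32):
        print(g, round(time_per_instruction(g), 2))
    print("best segment length in this toy model:", best_segment_length())
```

In this toy model the optimum moves toward the shortest segments when the latch overhead is zero, and toward a single long segment when every instruction is dependent. The parameter values are arbitrary, so the location of the minimum here says nothing about the eight to ten gate levels reported in the abstract.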

Publisher

Association for Computing Machinery (ACM)

Cited by 6 articles.

1. Reflections on Computer Pipeline Technology from an Analytical Perspective;2021 4th International Conference on Computer Science and Software Engineering (CSSE 2021);2021-10-22

2. An energy-delay product study on chip multi-processors for variable stage pipelining;Human-centric Computing and Information Sciences;2015-09-21

3. A comparative simulation study on the power–performance of multi-core architecture;The Journal of Supercomputing;2014-07-25

4. Instruction-level parallel processing: History, overview, and perspective;The Journal of Supercomputing;1993-05

5. Speedup and optimality in pipeline programs;International Journal of Parallel Programming;1989-08
