Discerning the dominant out-of-order performance advantage

Author:

McFarlin Daniel S. (1), Tucker Charles (2), Zilles Craig (2)

Affiliation:

1. CMU, Pittsburgh, PA, USA

2. UIUC, Urbana, IL, USA

Abstract

In this paper, we set out to study the performance advantages of an Out-of-Order (OOO) processor relative to in-order processors with similar execution resources. In particular, we try to tease apart the performance contributions from two sources: the improved schedules enabled by OOO hardware speculation support and its ability to generate different schedules on different occurrences of the same instructions based on operand and functional unit availability. We find that the ability to express good static schedules achieves the bulk of the speedup resulting from OOO. Specifically, of the 53% speedup achieved by OOO relative to a similarly provisioned in-order machine, we find that 88% of that speedup can be achieved by using a single "best" static schedule as suggested by observing an OOO schedule of the code. We discuss the ISA mechanisms that would be required to express these static schedules. Furthermore, we find that the benefits of dynamism largely come from two kinds of events that influence the application's critical path: load instructions that miss in the cache only part of the time and branch mispredictions. We find that much of the benefit of OOO dynamism can be achieved by the potentially simpler task of addressing these two behaviors directly.
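As a rough worked reading of the figures quoted in the abstract, the short Python sketch below splits the 53% speedup into a static-schedule share and a remainder. It assumes that "88% of that speedup" scales the 53% gain itself; the abstract does not spell out the exact metric, and the variable names are illustrative only.

```python
# Figures quoted in the abstract.
ooo_speedup = 0.53       # OOO gain over the similarly provisioned in-order machine
static_fraction = 0.88   # share of that gain reached by a single "best" static schedule

# Assumption (not stated in the abstract): the 88% applies to the 53% gain itself.
static_speedup = static_fraction * ooo_speedup
dynamic_remainder = ooo_speedup - static_speedup

print(f"static schedule gain over in-order: {static_speedup:.1%}")
print(f"gain left to dynamic scheduling:    {dynamic_remainder:.1%}")
```

Under that assumption, a single static schedule would account for roughly 46.6 percentage points of the gain, leaving about 6.4 points attributable to dynamic rescheduling.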

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design, Software


Cited by 20 articles.

1. HidFix: Efficient Mitigation of Cache-Based Spectre Attacks Through Hidden Rollbacks;2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD);2023-10-28

2. Orinoco: Ordered Issue and Unordered Commit with Non-Collapsible Queues;Proceedings of the 50th Annual International Symposium on Computer Architecture;2023-06-17

3. Efficient Instruction Scheduling Using Real-time Load Delay Tracking;ACM Transactions on Computer Systems;2022-11-24

4. Reconstructing Out-of-Order Issue Queue;2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO);2022-10

5. Specially-Designed Out-of-Order Processor Architecture for Microcontrollers;Electronics;2022-09-21
