Discerning the dominant out-of-order performance advantage

Published: 2013-04-23
Volume: 48
Issue: 4
Pages: 241-252
ISSN: 0362-1340
Container title: ACM SIGPLAN Notices
Language: English
Short container title: SIGPLAN Not.

Authors:

McFarlin Daniel S.¹, Tucker Charles², Zilles Craig²

Affiliations:

1. Carnegie Mellon University (CMU), Pittsburgh, PA, USA

2. University of Illinois at Urbana-Champaign (UIUC), Urbana, IL, USA

Abstract

In this paper, we set out to study the performance advantages of an Out-of-Order (OOO) processor relative to in-order processors with similar execution resources. In particular, we try to tease apart the performance contributions from two sources: the improved schedules enabled by OOO hardware speculation support and its ability to generate different schedules on different occurrences of the same instructions based on operand and functional unit availability. We find that the ability to express good static schedules achieves the bulk of the speedup resulting from OOO. Specifically, of the 53% speedup achieved by OOO relative to a similarly provisioned in-order machine, we find that 88% of that speedup can be achieved by using a single "best" static schedule as suggested by observing an OOO schedule of the code. We discuss the ISA mechanisms that would be required to express these static schedules. Furthermore, we find that the benefits of dynamism largely come from two kinds of events that influence the application's critical path: load instructions that miss in the cache only part of the time and branch mispredictions. We find that much of the benefit of OOO dynamism can be achieved by the potentially simpler task of addressing these two behaviors directly.
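The abstract's point about loads that miss only part of the time can be illustrated with a deliberately tiny model. The sketch below is an illustration under stated assumptions, not the authors' simulator or methodology: it assumes single-issue, unlimited functional units, a window that holds the whole program, a 2-cycle hit and a 10-cycle miss, and two hypothetical load-use chains. The names `program`, `static_schedule`, `dynamic_schedule`, `A_FIRST` and `B_FIRST` are invented for this example. It compares two fixed static orders against oldest-ready-first dynamic issue in two scenarios: load A misses, or load B misses.

```python
# Toy model only (hypothetical, not the paper's simulator): one instruction
# issues per cycle, functional units are unlimited, and the scheduling
# window holds the whole program.

HIT, MISS = 2, 10  # assumed load latencies in cycles


def program(lat_a, lat_b):
    """Hypothetical program: two independent load -> use -> use chains.
    Each entry is (source instruction indices, result latency)."""
    return [
        ((), lat_a),  # 0: load A
        ((), lat_b),  # 1: load B
        ((0,), 1),    # 2: first use of A
        ((2,), 1),    # 3: second use of A
        ((1,), 1),    # 4: first use of B
        ((4,), 1),    # 5: second use of B
    ]


def static_schedule(prog, order):
    """Issue strictly in `order` (producers must precede consumers), one per
    cycle, stalling until every source operand is ready.
    Returns the cycle at which the last result becomes ready."""
    ready, cycle = {}, 0
    for i in order:
        srcs, lat = prog[i]
        start = max([cycle] + [ready[s] for s in srcs])
        ready[i] = start + lat
        cycle = start + 1
    return max(ready.values())


def dynamic_schedule(prog):
    """Each cycle, issue the oldest not-yet-issued instruction whose
    operands are ready (a simple stand-in for OOO issue)."""
    ready, cycle = {}, 0
    pending = list(range(len(prog)))
    while pending:
        for i in pending:
            srcs, lat = prog[i]
            if all(s in ready and ready[s] <= cycle for s in srcs):
                ready[i] = cycle + lat
                pending.remove(i)
                break  # at most one issue per cycle
        cycle += 1
    return max(ready.values())


A_FIRST = [0, 1, 2, 3, 4, 5]  # static order: finish A's chain first
B_FIRST = [0, 1, 4, 5, 2, 3]  # static order: finish B's chain first

for label, lats in [("A misses", (MISS, HIT)), ("B misses", (HIT, MISS))]:
    prog = program(*lats)
    print(label,
          "A-first:", static_schedule(prog, A_FIRST),
          "B-first:", static_schedule(prog, B_FIRST),
          "OOO:", dynamic_schedule(prog))
# A misses A-first: 14 B-first: 12 OOO: 12
# B misses A-first: 13 B-first: 15 OOO: 13
```

In this toy, each static order matches dynamic issue in the scenario it suits (B-first when A misses, A-first when B misses) but loses 2 cycles in the other, so only dynamic issue is best in both. That mirrors, in miniature, the abstract's claim that a well-chosen static schedule captures much of the OOO benefit while the remaining dynamism pays off on intermittently missing loads; the numbers here say nothing about the paper's measured 53% and 88% figures.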

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design, Software

Link

https://dl.acm.org/doi/pdf/10.1145/2499368.2451143

References (50 articles)

1. Critical path optimization–unload hard extended scalar block; Babaian B. A.; USPTO, 2001

2. Beating in-order stalls with "flea-flicker" two-pass pipelining

3. Runahead execution vs. conventional data prefetching in the IBM POWER6 microprocessor

4. An EPIC processor with pending functional units; Carter L., Chuang W., Calder B.; in Zima H., Joe K., Sato M., Seo Y., Shimasaki M. (eds.), High Performance Computing, Lecture Notes in Computer Science vol. 2327, pp. 445-448, Springer Berlin / Heidelberg, 2006; doi:10.1007/3-540-47847-7_27

Cited by 20 articles.

1. HidFix: Efficient Mitigation of Cache-Based Spectre Attacks Through Hidden Rollbacks; 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD); 2023-10-28

2. Orinoco: Ordered Issue and Unordered Commit with Non-Collapsible Queues; Proceedings of the 50th Annual International Symposium on Computer Architecture; 2023-06-17

3. Efficient Instruction Scheduling Using Real-time Load Delay Tracking; ACM Transactions on Computer Systems; 2022-11-24

4. Reconstructing Out-of-Order Issue Queue; 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO); 2022-10

5. Specially-Designed Out-of-Order Processor Architecture for Microcontrollers; Electronics; 2022-09-21