Compiler and hardware support for reducing the synchronization of speculative threads

Published: 2008-05; Volume: 5; Issue: 1; Pages: 1-33
ISSN: 1544-3566
Journal: ACM Transactions on Architecture and Code Optimization
Language: English
Journal abbreviation: ACM Trans. Archit. Code Optim.

Authors:

Antonia Zhai¹, J. Gregory Steffan², Christopher B. Colohan³, Todd C. Mowry⁴

Affiliations:

1. University of Minnesota, Minneapolis, MN

2. University of Toronto, Toronto, Canada

3. Google, Ann Arbor, Michigan

4. Carnegie Mellon University, Pittsburgh, Pennsylvania

Abstract

Thread-level speculation (TLS) allows us to automatically parallelize general-purpose programs by supporting parallel execution of threads that might not actually be independent. In this article, we focus on one important limitation on program performance under TLS: the stalls caused by synchronizing and forwarding scalar values between speculative threads. Without this synchronization, these values would cause frequent data dependence violations and, hence, failed speculation. Using SPECint benchmarks that have been automatically transformed by our compiler to exploit TLS, we present, evaluate in detail, and compare both compiler and hardware techniques for improving the communication of scalar values. We find that, using our dataflow algorithms for three increasingly aggressive instruction scheduling techniques, the compiler can drastically reduce the critical forwarding path introduced by the synchronization and forwarding of scalar values. We also show that hardware techniques for reducing synchronization can be complementary to compiler scheduling, but that the additional performance benefits are minimal and generally not worth the cost.
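The abstract's central quantity, the critical forwarding path, can be made concrete with a toy timing model. This is a minimal sketch under stated assumptions, not the authors' compiler or simulator: each speculative thread (here called an epoch, one loop iteration) has a fixed length, first needs a forwarded scalar at offset `wait_at`, and forwards it to its successor at offset `signal_at`. The names `Epoch` and `simulate` are hypothetical, and commit ordering, misspeculation, and communication latency are ignored. The distance `signal_at - wait_at` plays the role of the critical forwarding path that instruction scheduling tries to shrink by moving the signal earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Epoch:
    """Hypothetical timing of one speculative thread (one loop iteration), in cycles."""
    length: int     # cycles to execute the whole epoch
    wait_at: int    # offset where the epoch first needs the forwarded scalar
    signal_at: int  # offset where it forwards the scalar to its successor


def simulate(epoch: Epoch, n_epochs: int, n_procs: int) -> int:
    """Return the cycle at which the last of n_epochs identical epochs finishes.

    Toy model (an assumption, not the article's simulator): epoch i runs on
    processor i % n_procs as soon as that processor's previous epoch has
    finished, then stalls at wait_at until epoch i - 1 has signaled.
    Commit order, misspeculation and communication latency are ignored.
    """
    if not 0 <= epoch.wait_at <= epoch.signal_at <= epoch.length:
        raise ValueError("need 0 <= wait_at <= signal_at <= length")
    # Critical forwarding path: cycles from receiving the value to forwarding it.
    critical_path = epoch.signal_at - epoch.wait_at
    proc_free = [0] * n_procs  # cycle at which each processor becomes idle
    prev_signal = 0            # epoch 0 has no predecessor, so it never stalls
    finish = 0
    for i in range(n_epochs):
        p = i % n_procs
        start = proc_free[p]
        # Synchronization stall: wait for the predecessor's signal.
        resume = max(start + epoch.wait_at, prev_signal)
        prev_signal = resume + critical_path
        end = resume + (epoch.length - epoch.wait_at)
        proc_free[p] = end
        finish = max(finish, end)
    return finish


if __name__ == "__main__":
    # Same work per epoch; only the position of the signal differs.
    baseline = Epoch(length=100, wait_at=10, signal_at=90)
    scheduled = Epoch(length=100, wait_at=10, signal_at=20)
    for name, e in (("baseline", baseline), ("scheduled", scheduled)):
        print(name, simulate(e, n_epochs=16, n_procs=4))
```

With these made-up numbers, the baseline finishes at cycle 1300 and the scheduled version at cycle 430, against 1600 cycles when the 16 epochs run on a single processor. Hoisting the signal shrinks the critical forwarding path from 80 to 10 cycles, so the chain of forwarded values no longer bounds throughput. The model only shows why the wait-to-signal distance matters; it says nothing about the article's specific scheduling algorithms or hardware support.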

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Information Systems, Software

Link

https://dl.acm.org/doi/pdf/10.1145/1369396.1369399

References (50 articles)

1. Akkary, H. and Driscoll, M. 1998. A Dynamic Multithreading Processor. In MICRO-31.

2. Improving data-flow analysis with path profiles

3. Parallel programming with Polaris

4. Chang, P. P., Warter, N. J., Mahlke, S. A., Chen, W. Y., and Hwu, W. W. 1991. Three superblock scheduling models for superscalar and superpipelined processors. Tech. Rep. CRHC-91-29, Center for Reliable and High-Performance Computing, University of Illinois.

Cited by 8 articles

1. On the choice of the best chunk size for the speculative execution of loops. PLOS ONE, 2022-05-17.

2. Compile-time Automatic Synchronization Insertion and Redundant Synchronization Elimination for GPU Kernels. International Conference on Parallel and Distributed Systems, 2016.

3. Moody Scheduling for Speculative Parallelization. Lecture Notes in Computer Science, 2015.

4. HELIX-RC. ACM SIGARCH Computer Architecture News, 2014-10-16.

5. Access Annotation for Safe Program Parallelization. Lecture Notes in Computer Science, 2013.