Affiliation:
1. Carnegie Mellon University, Pittsburgh, PA
Abstract
While there have been many recent proposals for hardware that supports Thread-Level Speculation (TLS), there has been relatively little work on compiler optimizations to fully exploit this potential for parallelizing programs optimistically. In this paper, we focus on one important limitation of program performance under TLS, which is stalls due to forwarding scalar values between threads that would otherwise cause frequent data dependences. We present and evaluate dataflow algorithms for three increasingly-aggressive instruction scheduling techniques that reduce the critical forwarding path introduced by the synchronization associated with this data forwarding. In addition, we contrast our compiler techniques with related hardware-only approaches. With our most aggressive compiler and hardware techniques, we improve performance under TLS by 6.2-28.5% for 6 of 14 applications, and by at least 2.7% for half of the other applications.
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design, Software
Cited by
7 articles.
1. Performance Estimation of Task Graphs Based on Path Profiling;International Journal of Parallel Programming;2015-07-23
2. Dynamic Core Allocation for Energy-Efficient Thread-Level Speculation;2014 IEEE 17th International Conference on Computational Science and Engineering;2014-12
3. A Dynamically Adaptive Approach for Speculative Loop Execution in SMT Architectures;2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS);2014-08
4. The design and implementation of heterogeneous multicore systems for energy-efficient speculative thread execution;ACM Transactions on Architecture and Code Optimization;2013-12
5. Disjoint out-of-order execution processor;ACM Transactions on Architecture and Code Optimization;2012-09