Dynamically dispatching speculative threads to improve sequential execution

Published:2012-09 Issue:3 Volume:9 Page:1-31
ISSN:1544-3566
Container-title:ACM Transactions on Architecture and Code Optimization
Language:en
Short-container-title:ACM Trans. Archit. Code Optim.

Author:

Luo Yangchun¹, Zhai Antonia²

Affiliation:

1. Advanced Micro Devices, Sunnyvale, CA

2. University of Minnesota, Minneapolis, MN

Abstract

Efficiently utilizing multicore processors to realize their performance potential demands extracting thread-level parallelism from applications. Various novel and sophisticated execution models have been proposed to extract thread-level parallelism from sequential programs. One such execution model, Thread-Level Speculation (TLS), allows potentially dependent threads to execute speculatively in parallel. However, TLS execution is inherently unpredictable, and consequently incorrect speculation can degrade performance on multicore systems. Existing approaches have focused on using compilers to select the sequential program regions to which TLS is applied. Our research shows that even a state-of-the-art compiler makes suboptimal decisions, due to the unpredictability of TLS execution. Thus, we propose to dynamically optimize TLS performance. This article describes the design, implementation, and evaluation of a runtime thread dispatching mechanism that adjusts the behaviors of speculative threads based on their efficiency. In the proposed system, speculative threads are monitored by hardware-based performance counters, and their performance impact is evaluated with a novel methodology that takes into account various unique TLS characteristics. Thread dispatching policies are devised to adjust the behaviors of speculative threads accordingly. With the help of runtime evaluation, the system better determines where and how to create speculative threads. Evaluated with all the SPEC CPU2000 benchmark programs written in C, the dynamic dispatching system outperforms state-of-the-art compiler-based thread management techniques by 9.4% on average. Compared to sequential execution, we achieve a 1.37X performance improvement on a four-core CMP-based system.

Funder

Division of Computer and Network Systems

National Science Foundation

Semiconductor Research Corporation

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Information Systems, Software

Link

https://dl.acm.org/doi/pdf/10.1145/2355585.2355586

Reference: 49 articles.

1. OpenMP: an industry standard API for shared-memory programming

Cited by 6 articles.

1. IDaTPA: importance degree based thread partitioning approach in thread level speculation;Discover Computing;2024-06-19

2. A hybrid sample generation approach in speculative multithreading;The Journal of Supercomputing;2017-08-07

3. A Hybrid Samples Generation Approach in Speculative Multithreading;2016

4. The design and implementation of heterogeneous multicore systems for energy-efficient speculative thread execution;ACM Transactions on Architecture and Code Optimization;2013-12

5. A Novel Thread Partitioning Approach Based on Machine Learning for Speculative Multithreading;2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing;2013-11