Optimizing software runtime systems for speculative parallelization

Published:2013-01 Issue:4 Volume:9 Page:1-27
ISSN:1544-3566
Container-title:ACM Transactions on Architecture and Code Optimization
language:en
Short-container-title:ACM Trans. Archit. Code Optim.

Author:

Yiapanis Paraskevas¹, Rosas-Ham Demian¹, Brown Gavin¹, Luján Mikel¹

Affiliation:

1. University of Manchester, Manchester, UK

Abstract

Thread-Level Speculation (TLS) overcomes limitations intrinsic to conservative compile-time auto-parallelizing tools by extracting parallel threads optimistically and only ensuring the absence of data dependence violations at runtime. A significant barrier to adopting TLS (implemented in software) is the overhead associated with maintaining speculative state. Based on previous TLS limit studies, we observe that future multicore systems will likely have more idle cores than traditional TLS would be able to harness. This implies that a TLS system should focus on optimizing for a small number of cores and find efficient ways to take advantage of the idle cores. Furthermore, research on optimistic systems has covered two important implementation design points: eager vs. lazy version management. With this knowledge, we propose new simple and effective techniques to reduce the execution time overheads for both of these design points. This article describes a novel compact version management data structure optimized for space overhead when using a small number of TLS threads. Furthermore, we describe two novel software runtime parallelization systems that utilize this compact data structure. The first software TLS system, MiniTLS, relies on eager memory data management (in-place updates) and, thus, when a misspeculation occurs a rollback process is required. MiniTLS takes advantage of the novel compact version management representation to parallelize the rollback process and is able to recover from misspeculation faster than existing software eager TLS systems. The second one, Lector (Lazy inspECTOR), is based on lazy version management. Since we have idle cores, the question is whether we can create “helper” tasks to determine whether speculation is actually needed without stopping or damaging the speculative execution.
In Lector, each conventional TLS thread running speculatively with lazy version management has an associated lightweight inspector. The inspector threads execute alongside to verify quickly whether data dependencies will occur. Inspector threads are generated by standard techniques for inspector/executor parallelization. We have applied both TLS systems to seven Java sequential benchmarks, including three benchmarks from SPECjvm2008. Two out of the seven benchmarks exhibit misspeculations. MiniTLS experiments report average speedups of 1.8x for 4 threads, increasing to close to 7x with 32 threads. Facilitated by our novel compact representation, MiniTLS reduces the space overhead relative to state-of-the-art software TLS systems by between 96% on 2 threads and 40% on 32 threads. The experiments for Lector report average speedups of 1.7x for 2 threads (that is, 1 TLS + 1 Inspector thread), increasing to close to 8.2x with 32 threads (16 + 16 threads). Compared to a well-established software TLS baseline, Lector performs on average 1.7x faster for 32 threads, and in no case (x TLS + x Inspector threads) does Lector deliver worse performance than the baseline TLS with the equivalent number of TLS threads (i.e., x TLS threads) or with double that number (i.e., x + x TLS threads).
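
The two version management design points named in the abstract can be illustrated with a minimal, single-threaded Java sketch. It is not the MiniTLS or Lector implementation and does not reproduce the article's compact data structure. The class names `EagerVersion` and `LazyVersion`, the `int[]` stand-in for shared memory, and the method names are all assumptions made for illustration. Eager management writes in place and keeps old values in an undo log, so a misspeculation requires a rollback. Lazy management buffers writes privately, so a misspeculation only discards the buffer.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of eager version management: updates go straight to
// shared memory, and the previous values are saved so they can be restored.
class EagerVersion {
    private final int[] memory;
    // Each undo entry is {index, oldValue}; the most recent write is on top.
    private final Deque<int[]> undoLog = new ArrayDeque<>();

    EagerVersion(int[] memory) { this.memory = memory; }

    void write(int index, int value) {
        undoLog.push(new int[] {index, memory[index]});
        memory[index] = value; // in-place update
    }

    int read(int index) { return memory[index]; }

    // Speculation succeeded: the in-place values stand, the log is dropped.
    void commit() { undoLog.clear(); }

    // Misspeculation: undo the writes in reverse order.
    void rollback() {
        while (!undoLog.isEmpty()) {
            int[] entry = undoLog.pop();
            memory[entry[0]] = entry[1];
        }
    }
}

// Hypothetical sketch of lazy version management: updates stay in a private
// write buffer and reach shared memory only at commit time.
class LazyVersion {
    private final int[] memory;
    private final Map<Integer, Integer> writeBuffer = new LinkedHashMap<>();

    LazyVersion(int[] memory) { this.memory = memory; }

    void write(int index, int value) { writeBuffer.put(index, value); }

    // Reads see the thread's own buffered writes first.
    int read(int index) {
        Integer buffered = writeBuffer.get(index);
        return buffered != null ? buffered : memory[index];
    }

    // Speculation succeeded: publish the buffered writes.
    void commit() {
        for (Map.Entry<Integer, Integer> e : writeBuffer.entrySet()) {
            memory[e.getKey()] = e.getValue();
        }
        writeBuffer.clear();
    }

    // Misspeculation: shared memory was never touched, so just discard.
    void rollback() { writeBuffer.clear(); }
}

class VersionManagementSketch {
    public static void main(String[] args) {
        int[] shared = {1, 2, 3};

        EagerVersion eager = new EagerVersion(shared);
        eager.write(0, 10);
        eager.write(0, 20);
        eager.rollback();
        System.out.println("after eager rollback: " + shared[0]);

        LazyVersion lazy = new LazyVersion(shared);
        lazy.write(1, 99);
        System.out.println("before lazy commit: " + shared[1]);
        lazy.commit();
        System.out.println("after lazy commit: " + shared[1]);
    }
}
```

The sketch shows the trade-off only in outline: the eager variant makes commit cheap and rollback proportional to the number of logged writes, while the lazy variant makes rollback cheap and pays at commit and on every buffered read. A real TLS runtime would additionally need conflict detection and thread-safe structures, which are omitted here.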

Funder

Engineering and Physical Sciences Research Council

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Information Systems, Software

Link

https://dl.acm.org/doi/pdf/10.1145/2400682.2400698

Reference 39 articles.

1. Ali-Reza Adl-Tabatabai Shpeisman T. and Gottschlich J. 2012. Draft specification of transactional language constructs for C++. Tech. rep. Transactional Memory Specification Drafting Group.

2. The Jrpm system for dynamically parallelizing Java programs

3. Copy or Discard execution model for speculative parallelization on multicores

4. Design space exploration of a software speculative parallelization scheme

Cited by 17 articles.

1. Effects of Global and Local Network Structure on Number of Driver Nodes in Complex Networks;Lecture Notes in Social Networks;2023

2. Resource allocation for task-level speculative scientific applications: A proof of concept using Parallel Trajectory Splicing;Parallel Computing;2022-09

3. A survey of community detection methods in multilayer networks;Data Mining and Knowledge Discovery;2020-10-13

4. GbA: A graph-based thread partition approach in speculative multithreading;Concurrency and Computation: Practice and Experience;2017-10-04

5. Just-in-Time Compilation-Inspired Methodology for Parallelization of Compute Intensive Java Code;Mehran University Research Journal of Engineering and Technology;2017-01-01