Leveraging Hardware Message Passing for Efficient Thread Synchronization

Published: 2016-03-15  Volume: 2  Issue: 4  Pages: 1-26
ISSN: 2329-4949
Journal: ACM Transactions on Parallel Computing
Language: English
Journal abbreviation: ACM Trans. Parallel Comput.

Authors:

Darko Petrović¹, Thomas Ropars¹, André Schiper¹

Affiliation:

1. École Polytechnique Fédérale de Lausanne (EPFL), Switzerland

Abstract

As the level of parallelism in manycore processors keeps increasing, providing efficient mechanisms for thread synchronization in concurrent programs is becoming a major concern. On cache-coherent shared-memory processors, synchronization efficiency is ultimately limited by the performance of the underlying cache coherence protocol. This article studies how hardware support for message passing can improve synchronization performance. Considering the ubiquitous problem of mutual exclusion, we devise novel algorithms for (i) classic locking, where application threads obtain exclusive access to a shared resource before executing their critical sections (CSes), and (ii) delegation, where CSes are executed by special threads. For classic locking, our HybLock algorithm uses a mix of shared memory and hardware message passing, introducing the idea of hybrid synchronization algorithms. For delegation, we propose mp-server and HybComb: the former is a straightforward adaptation of the server approach to hardware message passing, whereas the latter is a novel hybrid combining algorithm. Evaluation on Tilera's TILE-Gx processor shows that HybLock outperforms the best known classic locks. Furthermore, mp-server can execute contended CSes with unprecedented throughput, because stalls related to cache coherence are removed from the critical path. HybComb can achieve comparable performance while avoiding the need to dedicate server cores. Consequently, our queue and stack implementations, based on the new synchronization algorithms, substantially outperform their most efficient shared-memory-only counterparts.
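The difference between classic locking and delegation can be illustrated with a minimal sketch, assuming Python's `queue.Queue` as a stand-in for a hardware message-passing channel. In the style of the server approach that mp-server adapts, one dedicated server thread owns a shared stack and executes every CS itself; client threads only send requests and wait for replies. The names `server`, `delegate`, and `run_demo` are hypothetical, and this is not the authors' mp-server, which runs on the TILE-Gx hardware. Because `queue.Queue` is itself built on locks in shared memory, the sketch shows the structure of delegation, not its performance.

```python
import queue
import threading

# Hypothetical delegation sketch (server approach). queue.Queue stands in
# for a hardware message-passing channel; this is an assumption made for
# illustration only, since queue.Queue is a lock-based shared-memory queue.

_STOP = object()  # sentinel message telling the server to exit


def server(requests: queue.Queue, stack: list) -> None:
    """Execute CS requests one at a time; the serial loop gives mutual exclusion."""
    while True:
        msg = requests.get()
        if msg is _STOP:
            break
        op, arg, reply = msg
        if op == "push":
            stack.append(arg)
            reply.put(None)
        elif op == "pop":
            reply.put(stack.pop() if stack else None)
        else:
            # Unknown ops are answered with None so the client never blocks forever.
            reply.put(None)


def delegate(requests: queue.Queue, op: str, arg=None):
    """Client side: send a CS request to the server and wait for its reply."""
    reply: queue.Queue = queue.Queue(maxsize=1)
    requests.put((op, arg, reply))
    return reply.get()


def run_demo(n_clients: int = 4, ops_per_client: int = 1000) -> list:
    """Start one server and n_clients clients that push (client_id, i) pairs."""
    requests: queue.Queue = queue.Queue()
    stack: list = []  # only the server thread touches this until it exits
    srv = threading.Thread(target=server, args=(requests, stack))
    srv.start()

    def client(cid: int) -> None:
        for i in range(ops_per_client):
            delegate(requests, "push", (cid, i))

    clients = [threading.Thread(target=client, args=(c,)) for c in range(n_clients)]
    for t in clients:
        t.start()
    for t in clients:
        t.join()
    requests.put(_STOP)
    srv.join()
    return stack
```

Because each client blocks until its reply arrives, its pushes take effect in program order, and no lock protects `stack` itself: exclusion comes entirely from the server processing one message at a time. A combining scheme such as HybComb avoids dedicating a server core by letting one of the client threads temporarily take on the server role, which this sketch does not attempt to model.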

Publisher

Association for Computing Machinery (ACM)

Subject

Computational Theory and Mathematics, Computer Science Applications, Hardware and Architecture, Modeling and Simulation, Software

Link

https://dl.acm.org/doi/pdf/10.1145/2858652

Cited by 4 articles

1. Mitigating Message Passing Interference in Trusted Embedded Platforms. 2023 20th International SoC Design Conference (ISOCC), 2023-10-25.

2. DySHARQ: Dynamic Software-Defined Hardware-Managed Queues for Tile-Based Architectures. International Journal of Parallel Programming, 2020-11-20.

3. Fast Fine-Grained Global Synchronization on GPUs. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019-04-04.

4. SHARQ: Software-Defined Hardware-Managed Queues for Tile-Based Manycore Architectures. Lecture Notes in Computer Science, 2019.