Affiliation:
1. Univ. of Wisconsin-Madison, Madison
Abstract
This paper proposes a set of efficient primitives for process synchronization in multiprocessors. The only assumptions made in developing the set of primitives are that hardware combining is not implemented in the interconnect, and (in one case) that the interconnect supports broadcast.
The primitives make use of synchronization bits (syncbits) to provide a simple mechanism for mutual exclusion. The proposed implementation of the primitives includes efficient (i.e., local) busy-waiting for syncbits. In addition, a hardware-supported mechanism for maintaining a first-come, first-served queue of requests for a syncbit is proposed. This queueing mechanism allows for a very efficient implementation of, as well as fair access to, binary semaphores. We also propose to implement Fetch and Add with combining in software rather than hardware. This allows an architecture to scale to a large number of processors while avoiding the cost of hardware combining.
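To make the idea of combining Fetch and Add requests in software more concrete, the following is a minimal sequential sketch of a generic combining tree. It is not the authors' implementation: the function name `combining_fetch_and_add` and the pairwise tree shape are assumptions chosen for illustration, and the concurrency is simulated rather than real. Requests are summed pairwise on the way up, a single Fetch and Add reaches the shared variable, and return values are distributed on the way down.

```python
def combining_fetch_and_add(shared_value, increments):
    """Simulate one round of software-combined Fetch and Add (hypothetical helper).

    Returns (new_shared_value, results), where results[i] is the value that
    request i would have observed had the requests run serially in index order.
    """
    if not increments:
        return shared_value, []

    # Upward pass: combine requests pairwise, level by level, keeping the sums.
    levels = [list(increments)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([sum(prev[i:i + 2]) for i in range(0, len(prev), 2)])

    # Only one combined Fetch and Add touches the shared variable.
    new_shared_value = shared_value + levels[-1][0]

    # Downward pass: the left child inherits its parent's base value, and the
    # right child gets the parent's base plus the left child's combined sum.
    bases = [shared_value]
    for level in reversed(levels[:-1]):
        next_bases = []
        for i in range(0, len(level), 2):
            parent_base = bases[i // 2]
            next_bases.append(parent_base)
            if i + 1 < len(level):
                next_bases.append(parent_base + level[i])
        bases = next_bases

    return new_shared_value, bases
```

In this sketch the shared variable is updated once per round regardless of how many requests were combined, which is the property that lets contention stay bounded as the processor count grows.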
Scenarios for common synchronization events such as work queues and barriers are presented to demonstrate the generality and ease of use of the proposed primitives. The efficient implementation of the primitives is simpler if the multiprocessor has a hardware cache-consistency protocol. To illustrate this point, we outline how the primitives would be implemented in the Multicube multiprocessor [GoWo88].
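As an illustration of how a barrier can be built on top of a Fetch and Add primitive, here is a minimal sketch of a standard sense-reversing barrier. It is not taken from the paper: the class names `FetchAndAddCounter` and `SenseReversingBarrier` and the helper `run_phases` are hypothetical, a lock stands in for the atomic Fetch and Add, and Python threads stand in for processors.

```python
import threading
import time


class FetchAndAddCounter:
    """Stand-in for a Fetch and Add cell; a lock provides atomicity here."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_and_add(self, delta):
        with self._lock:
            old = self._value
            self._value += delta
            return old


class SenseReversingBarrier:
    """Centralized barrier: the last arriver resets the count and flips the sense."""

    def __init__(self, n):
        self.n = n
        self.count = FetchAndAddCounter(0)
        self.sense = False

    def wait(self, local_sense):
        local_sense = not local_sense
        if self.count.fetch_and_add(1) == self.n - 1:
            self.count.fetch_and_add(-self.n)  # reset before releasing waiters
            self.sense = local_sense           # release everyone
        else:
            while self.sense != local_sense:   # busy-wait on the shared flag
                time.sleep(0)                  # yield so other threads can run
        return local_sense


def run_phases(num_threads=4, num_phases=3):
    """Each thread logs its phase number, then waits at the barrier."""
    barrier = SenseReversingBarrier(num_threads)
    log, log_lock = [], threading.Lock()

    def worker():
        sense = False
        for phase in range(num_phases):
            with log_lock:
                log.append(phase)
            sense = barrier.wait(sense)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because no thread can start phase p+1 until every thread has logged phase p, the log returned by `run_phases` is always in non-decreasing order.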
Publisher
Association for Computing Machinery (ACM)
Cited by
12 articles.
1. A barrier optimization framework for NUMA multi‐core system;Concurrency and Computation: Practice and Experience;2019-10-21
2. Skeap & Seap;The 31st ACM Symposium on Parallelism in Algorithms and Architectures;2019-06-17
3. Lease/Release;ACM Transactions on Parallel Computing;2017-10-10
4. Lease/release;ACM SIGPLAN Notices;2016-11-09
5. Lease/release;Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming;2016-02-27