Affiliation:
1. University of Edinburgh, Edinburgh, United Kingdom
2. University of St. Andrews, St. Andrews, United Kingdom
Abstract
Read-Modify-Write (RMW) instructions are widely used as the building blocks of a variety of higher level synchronization constructs, including locks, barriers, and lock-free data structures. Unfortunately, they are expensive in architectures such as x86 and SPARC which enforce (variants of) Total-Store-Order (TSO). A key reason is that RMWs in these architectures are ordered like a memory barrier, incurring the cost of a write-buffer drain in the critical path. Such strong ordering semantics are dictated by the requirements of the strict atomicity definition (type-1) that existing TSO RMWs use. Programmers often do not need such strong semantics. Besides, weakening the atomicity definition of TSO RMWs, would also weaken their ordering -- thereby leading to more efficient hardware implementations.
In this paper we argue for TSO RMWs to use weaker atomicity definitions -- we consider two weaker definitions: type-2 and type-3, with different relaxed ordering differences. We formally specify how such weaker RMWs would be ordered, and show that type-2 RMWs, in particular, can seamlessly replace existing type-1 RMWs in common synchronization idioms -- except in situations where a type-1 RMW is used as a memory barrier. Recent work has shown that the new C/C++11 concurrency model can be realized by generating conventional (type-1) RMWs for C/C++11 SC-atomic-writes and/or SC-atomic-reads. We formally prove that this is equally valid using the proposed type-2 RMWs; type-3 RMWs, on the other hand, could be used for SC-atomic-reads (and optionally SC-atomic-writes). We further propose efficient microarchitectural implementations for type-2 (type-3) RMWs -- simulation results show that our implementation reduces the cost of an RMW by up to 58.9% (64.3%), which translates into an overall performance improvement of up to 9.0% (9.2%) on a set of parallel programs, including those from the SPLASH-2, PARSEC, and STAMP benchmarks.
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design,Software
Cited by
5 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献
1. NextGen-Malloc: Giving Memory Allocator Its Own Room in the House;Proceedings of the 19th Workshop on Hot Topics in Operating Systems;2023-06-22
2. On Consistency for Bulk-Bitwise Processing-in-Memory;2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2023-02
3. IXIAM: ISA EXtension for Integrated Accelerator Management;IEEE Access;2023
4. Free atomics;Proceedings of the 49th Annual International Symposium on Computer Architecture;2022-06-11
5. Taming release-acquire consistency;ACM SIGPLAN Notices;2016-04-08