Abstract
Memory consistency models, or memory models, allow both programmers and programming language implementers to reason about concurrent accesses to one or more memory locations. Memory model specifications balance the often conflicting needs for precise semantics, implementation flexibility, and ease of understanding. Toward that end, popular programming languages like Java, C, and C++ have adopted memory models built on the conceptual foundation of Sequential Consistency for Data-Race-Free programs (SC for DRF). These SC for DRF languages were created with general-purpose homogeneous CPU systems in mind, and all assume a single, global memory address space. Such a uniform address space is usually prohibitive in power and performance for heterogeneous Systems on Chips (SoCs), and for that reason most heterogeneous languages have adopted split address spaces and operations with nonglobal visibility.
There have recently been two attempts to bridge the disconnect between the CPU-centric assumptions of the SC for DRF framework and the realities of heterogeneous SoC architectures. Hower et al. proposed a class of Heterogeneous-Race-Free (HRF) memory models that provide a foundation for understanding many of the issues in heterogeneous memory models. At the same time, the Khronos Group developed the OpenCL 2.0 memory model that builds on the C++ memory model. The OpenCL 2.0 model includes features not addressed by HRF: primarily support for relaxed atomics and a property referred to as scope inclusion. In this article, we generalize HRF to allow formalization of and reasoning about more complicated models using OpenCL 2.0 as a point of reference. With that generalization, we (1) make the OpenCL 2.0 memory model more accessible by introducing a platform for feature comparisons to other models, (2) consider a number of shortcomings in the current OpenCL 2.0 model, and (3) propose changes that could be adopted by future OpenCL 2.0 revisions or by other, related, models.
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Information Systems, Software
Cited by
25 articles.
1. Taking Back Control in an Intermediate Representation for GPU Computing;Proceedings of the ACM on Programming Languages;2023-01-09
2. Improving the Scalability of GPU Synchronization Primitives;IEEE Transactions on Parallel and Distributed Systems;2023-01-01
3. Only Buffer When You Need To: Reducing On-chip GPU Traffic with Reconfigurable Local Atomic Buffers;2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2022-04
4. Using a static naming approach to implement remote scope promotion;Turkish Journal of Electrical Engineering and Computer Sciences;2022-01-01
5. sRSP: An efficient and scalable implementation of remote scope promotion;Concurrency and Computation: Practice and Experience;2021-07-11