Affiliation:
1. University of Illinois, Urbana, IL, USA
Abstract
This paper considers Rigel, a programmable accelerator architecture for a broad class of data- and task-parallel computation. Rigel comprises 1000+ hierarchically-organized cores that use a fine-grained, dynamically scheduled single-program, multiple-data (SPMD) execution model. Rigel's low-level programming interface adopts a single global address space model where parallel work is expressed in a task-centric, bulk-synchronized manner using minimal hardware support. Compared to existing accelerators, which contain domain-specific hardware, specialized memories, and/or restrictive programming models, Rigel is more flexible and provides a straightforward target for a broader set of applications.
We perform a design analysis of Rigel to quantify the compute density and power efficiency of our initial design. We find that Rigel can achieve a density of over 8 single-precision GFLOPS/mm² in 45nm, which is comparable to high-end GPUs scaled to 45nm. We perform experimental analysis on several applications ported to the Rigel low-level programming interface. We examine scalability issues related to work distribution, synchronization, and load-balancing for 1000-core accelerators using software techniques and minimal specialized hardware support. We find that while it is important to support fast task distribution and barrier operations, these operations can be implemented without specialized hardware using flexible hardware primitives.
Publisher
Association for Computing Machinery (ACM)
Cited by
34 articles.
1. Mach-RT: A Many Chip Architecture for High Performance Ray Tracing;IEEE Transactions on Visualization and Computer Graphics;2022-03-01
2. Monolithically Integrating Non-Volatile Main Memory over the Last-Level Cache;ACM Transactions on Architecture and Code Optimization;2021-12-31
3. Energy-Efficient Hardware-Accelerated Synchronization for Shared-L1-Memory Multiprocessor Clusters;IEEE Transactions on Parallel and Distributed Systems;2021-03-01
4. Ch’i: Scaling Microkernel Capabilities in Cache-Incoherent Systems;2020 IEEE/ACM International Workshop on Runtime and Operating Systems for Supercomputers (ROSS);2020-11
5. Transmuter;Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques;2020-09-30