Efficient Control and Communication Paradigms for Coarse-Grained Spatial Architectures

Published:2015-09-11 Issue:3 Volume:33 Page:1-32
ISSN:0734-2071
Container-title:ACM Transactions on Computer Systems
language:en
Short-container-title:ACM Trans. Comput. Syst.

Author:

Pellauer Michael¹, Parashar Angshuman¹, Adler Michael², Ahsan Bushra², Allmon Randy², Crago Neal¹, Fleming Kermin², Gambhir Mohit², Jaleel Aamer¹, Krishna Tushar³, Lustig Daniel⁴, Maresh Stephen², Pavlov Vladimir², Rayess Rachid², Zhai Antonia⁵, Emer Joel⁶

Affiliation:

1. Intel, NVIDIA, Hudson, MA

2. Intel, Hudson, MA

3. Intel, Georgia Institute of Technology, Hudson, MA

4. Princeton University

5. University of Minnesota

6. Intel and MIT, NVIDIA, Hudson, MA

Abstract

There has been recent interest in exploring the acceleration of nonvectorizable workloads with spatially programmed architectures that are designed to efficiently exploit pipeline parallelism. Such an architecture faces two main problems: how to efficiently control each processing element (PE) in the system, and how to facilitate inter-PE communication without the overheads of traditional shared-memory coherent memory. In this article, we explore solving these problems using triggered instructions and latency-insensitive channels. Triggered instructions completely eliminate the program counter (PC) and allow programs to transition concisely between states without explicit branch instructions. Latency-insensitive channels allow efficient communication of inter-PE control information while simultaneously enabling flexible code placement and improving tolerance for variable events such as cache accesses. Together, these approaches provide a unified mechanism to avoid overserialized execution, essentially achieving the effect of techniques such as dynamic instruction reordering and multithreading. Our analysis shows that a spatial accelerator using triggered instructions and latency-insensitive channels can achieve 8× greater area-normalized performance than a traditional general-purpose processor. Further analysis shows that triggered control reduces the number of static and dynamic instructions in the critical paths by 62% and 64%, respectively, over a PC-style baseline, increasing the performance of the spatial programming approach by 2.0×.

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/2754930

Reference 34 articles.

1. Executing a program on the MIT tagged-token dataflow architecture

2. Bluespec Inc. 2007. Bluespec System Verilog Reference Guide. Bluespec.

3. Scaling to the end of silicon with EDGE architectures

4. Theory of latency-insensitive design

Cited by 12 articles.

1. A High-Frequency Load-Store Queue with Speculative Allocations for High-Level Synthesis;2023 International Conference on Field Programmable Technology (ICFPT);2023-12-12

2. Compiler Discovered Dynamic Scheduling of Irregular Code in High-Level Synthesis;2023 33rd International Conference on Field-Programmable Logic and Applications (FPL);2023-09-04

3. LISA: Graph Neural Network based Portable Mapping on Spatial Accelerators;2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2022-04

4. Position Synchronization Control Algorithm of Legged Robot Based on DSP Centralized Control;Mobile Networks and Applications;2022-02-21

5. A Reconfigurable Branch Predictor for Spatial Computing Architectures;Proceedings of the 2020 4th International Conference on Digital Signal Processing;2020-06-19