Efficient Control and Communication Paradigms for Coarse-Grained Spatial Architectures

Author:

Pellauer Michael (1), Parashar Angshuman (1), Adler Michael (2), Ahsan Bushra (2), Allmon Randy (2), Crago Neal (1), Fleming Kermin (2), Gambhir Mohit (2), Jaleel Aamer (1), Krishna Tushar (3), Lustig Daniel (4), Maresh Stephen (2), Pavlov Vladimir (2), Rayess Rachid (2), Zhai Antonia (5), Emer Joel (6)

Affiliation:

1. Intel, NVIDIA, Hudson, MA

2. Intel, Hudson, MA

3. Intel, Georgia Institute of Technology, Hudson, MA

4. Princeton University

5. University of Minnesota

6. Intel and MIT, NVIDIA, Hudson, MA

Abstract

There has been recent interest in exploring the acceleration of nonvectorizable workloads with spatially programmed architectures that are designed to efficiently exploit pipeline parallelism. Such an architecture faces two main problems: how to efficiently control each processing element (PE) in the system, and how to facilitate inter-PE communication without the overheads of traditional shared-memory coherent memory. In this article, we explore solving these problems using triggered instructions and latency-insensitive channels. Triggered instructions completely eliminate the program counter (PC) and allow programs to transition concisely between states without explicit branch instructions. Latency-insensitive channels allow efficient communication of inter-PE control information while simultaneously enabling flexible code placement and improving tolerance for variable events such as cache accesses. Together, these approaches provide a unified mechanism to avoid overserialized execution, essentially achieving the effect of techniques such as dynamic instruction reordering and multithreading. Our analysis shows that a spatial accelerator using triggered instructions and latency-insensitive channels can achieve 8× greater area-normalized performance than a traditional general-purpose processor. Further analysis shows that triggered control reduces the number of static and dynamic instructions in the critical paths by 62% and 64%, respectively, over a PC-style baseline, increasing the performance of the spatial programming approach by 2.0×.
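The two ideas in the abstract can be illustrated with a small software model. The sketch below is not the authors' design or instruction set. It is a hypothetical Python toy in which a PE holds a list of (trigger, action) pairs instead of a program counter, and communicates only through bounded FIFO channels that expose full/empty status. The names `Channel`, `make_merge_pe`, `step`, `run_merge`, and the `END` sentinel are all assumptions made for illustration, as is the choice of a two-way sorted merge as the workload.

```python
from collections import deque

END = float("inf")  # assumed end-of-stream marker; inputs must be smaller than this

class Channel:
    """Bounded FIFO: the PE only observes whether it is empty or full."""
    def __init__(self, capacity=2):
        self.q = deque()
        self.capacity = capacity
    def can_put(self):
        return len(self.q) < self.capacity
    def can_get(self):
        return len(self.q) > 0
    def put(self, value):
        self.q.append(value)
    def peek(self):
        return self.q[0]
    def get(self):
        return self.q.popleft()

def make_merge_pe(a, b, out):
    """Build (trigger, action) pairs that merge two sorted input channels.

    There is no program counter: each pair is eligible whenever its
    trigger holds. Earlier pairs take priority when several are ready.
    """
    def ready():
        return a.can_get() and b.can_get() and out.can_put()
    def finish():
        a.get()
        b.get()
        out.put(END)
    return [
        # Both streams exhausted: consume both markers, forward one.
        (lambda: ready() and a.peek() == END and b.peek() == END, finish),
        # Head of a is the smaller (or equal) element: forward it.
        (lambda: ready() and a.peek() <= b.peek(), lambda: out.put(a.get())),
        # Head of b is the smaller element: forward it.
        (lambda: ready() and a.peek() > b.peek(), lambda: out.put(b.get())),
    ]

def step(instructions):
    """One cycle: fire the first instruction whose trigger holds, if any."""
    for trigger, action in instructions:
        if trigger():
            action()
            return True
    return False

def run_merge(xs, ys, capacity=2):
    """Drive the PE, feeding inputs only when the channels have room."""
    a, b, out = Channel(capacity), Channel(capacity), Channel(capacity)
    pe = make_merge_pe(a, b, out)
    src_a = deque(list(xs) + [END])
    src_b = deque(list(ys) + [END])
    result = []
    while True:
        if src_a and a.can_put():
            a.put(src_a.popleft())
        if src_b and b.can_put():
            b.put(src_b.popleft())
        step(pe)  # does nothing on cycles where no trigger is satisfied
        if out.can_get():
            value = out.get()
            if value == END:
                return result
            result.append(value)

print(run_merge([1, 4], [2, 3]))
```

Because every trigger checks channel status before reading or writing, the merged output does not depend on the channel capacity or on when the inputs arrive. That is the property the abstract refers to as latency insensitivity, shown here only in a simplified software form.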

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Cited by 12 articles.

1. A High-Frequency Load-Store Queue with Speculative Allocations for High-Level Synthesis;2023 International Conference on Field Programmable Technology (ICFPT);2023-12-12

2. Compiler Discovered Dynamic Scheduling of Irregular Code in High-Level Synthesis;2023 33rd International Conference on Field-Programmable Logic and Applications (FPL);2023-09-04

3. LISA: Graph Neural Network based Portable Mapping on Spatial Accelerators;2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2022-04

4. Position Synchronization Control Algorithm of Legged Robot Based on DSP Centralized Control;Mobile Networks and Applications;2022-02-21

5. A Reconfigurable Branch Predictor for Spatial Computing Architectures;Proceedings of the 2020 4th International Conference on Digital Signal Processing;2020-06-19
