CASCADE

Author:

Wijerathne Dhananjaya (1), Li Zhaoying (1), Karunarathne Manupa (1), Pathania Anuj (1), Mitra Tulika (1)

Affiliation:

1. National University of Singapore, Singapore

Abstract

A Coarse-Grained Reconfigurable Array (CGRA) is a promising high-performance, low-power accelerator for compute-intensive loop kernels. While mapping computations onto the CGRA is a well-studied problem, bringing data into the array at high throughput remains a challenge. A conventional CGRA design performs on-array computations to generate memory addresses for data access, which undermines the attainable throughput. A decoupled access-execute architecture, on the other hand, isolates memory access from the actual computations, resulting in significantly higher throughput. We propose a novel decoupled access-execute CGRA design called CASCADE with full architecture and compiler support for high-throughput data streaming from an on-chip multi-bank memory. CASCADE offloads the address computations for multi-bank data memory access to custom-designed programmable hardware. An end-to-end, fully automated compiler synchronizes the conflict-free movement of data between the memory banks and the CGRA. Experimental evaluations show, on average, a 3× performance benefit and a 2.2× performance-per-watt improvement for CASCADE compared to an iso-area conventional CGRA with a bigger processing array in lieu of dedicated hardware memory address generation logic.
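
To make the decoupled access-execute idea in the abstract concrete, the following is a minimal sketch, not CASCADE's actual hardware or compiler. It assumes that each operand stream follows an affine address pattern over a loop nest, that words are interleaved across banks by their low-order address bits, and that one word per stream is fetched per cycle. The names `affine_stream`, `bank_of`, and `conflict_free` are hypothetical and do not come from the paper.

```python
from itertools import product


def affine_stream(base, strides, bounds):
    """Yield base + sum(i_k * stride_k) over a loop nest (outermost loop first).

    This stands in for a programmable address generator that runs
    outside the processing array (an assumption for illustration).
    """
    for idx in product(*(range(b) for b in bounds)):
        yield base + sum(i * s for i, s in zip(idx, strides))


def bank_of(addr, num_banks):
    """Assumed word interleaving: consecutive words go to consecutive banks."""
    return addr % num_banks


def conflict_free(streams, num_banks):
    """Return True if, in every cycle, all streams touch distinct banks."""
    for addrs in zip(*streams):
        banks = [bank_of(a, num_banks) for a in addrs]
        if len(set(banks)) != len(banks):
            return False
    return True


# Two operand streams for a 4x4 row-major traversal.
a = list(affine_stream(base=0, strides=(4, 1), bounds=(4, 4)))
b = list(affine_stream(base=17, strides=(4, 1), bounds=(4, 4)))
c = list(affine_stream(base=16, strides=(4, 1), bounds=(4, 4)))

ok = conflict_free([a, b], num_banks=4)      # bases offset by one bank
clash = conflict_free([a, c], num_banks=4)   # both streams start in bank 0
```

In this sketch, placing the second array at base 17 rather than 16 shifts it by one bank, so the two streams never request the same bank in the same cycle. A compiler that synchronizes conflict-free data movement would have to make placement and scheduling decisions of roughly this kind, though the paper's actual method may differ.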

Funder

National Research Foundation Singapore

Huawei International Pte. Ltd.

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Software


Cited by 23 articles.

1. CGRA Implementation of HEVC Decoder Using Predictable Context Directed Pattern Matching With Efficient and Flexible Memory Architecture;2024-01-15

2. Flip: Data-centric Edge CGRA Accelerator;ACM Transactions on Design Automation of Electronic Systems;2023-12-18

3. Optimizing Data Availability and Utilization in Deep Learning Accelerator SoCs;2023 30th IEEE International Conference on Electronics, Circuits and Systems (ICECS);2023-12-04

4. Pipelined CNN Inference on Heterogeneous Multi-processor System-on-Chip;Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing;2023-10-10

5. DARIC: A Data Reuse-Friendly CGRA for Parallel Data Access via Elastic FIFOs;2023 60th ACM/IEEE Design Automation Conference (DAC);2023-07-09
