Improving Energy Efficiency of CGRAs with Low-Overhead Fine-Grained Power Domains

Author:

Ankita Nayak¹, Keyi Zhang¹, Rajsekhar Setaluri¹, Alex Carsello¹, Makai Mann¹, Christopher Torng¹, Stephen Richardson¹, Rick Bahr¹, Pat Hanrahan¹, Mark Horowitz¹, Priyanka Raina¹

Affiliation:

1. Stanford University, Stanford, CA

Abstract

To effectively minimize static power for a wide range of applications, power domains for coarse-grained reconfigurable array (CGRA) architectures need to be more fine-grained than those found in a typical application-specific integrated circuit (ASIC). However, the special isolation logic needed to ensure electrical protection between powered-off and powered-on domains makes fine-grained power domains area- and timing-inefficient. We propose a novel design of the CGRA routing fabric that reduces the area overhead of power domain boundary protection from around 9% to less than 1% without incurring any extra timing delay from the isolation cells. The conventional Unified Power Format (UPF)-based flow for power domain boundary protection does not support this design choice. Therefore, we create our own compiler-like passes that iteratively introduce the needed design changes, and we formally verify the transformations using methods based on satisfiability modulo theories (SMT). These passes also let us optimize how we handle test and debug signals that pass through powered-off tiles in the CGRA. Using our framework, we add power domains to a CGRA that we designed and taped out. The CGRA has a 32 × 16 array of processing element and memory tiles and 4 MB of secondary memory. We address the implementation challenges introduced by fine-grained power domains, including the addressing of the CGRA tiles, the power grid design, well substrate connections, and the distribution of global signals. Our CGRA achieves up to an 83% reduction in leakage power and a 26% reduction in total power compared with an identical CGRA without multiple power domains, across a range of image processing and machine learning applications.
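As a rough illustration of what "formally verifying a transformation" at a power domain boundary involves, the sketch below checks a hypothetical pass that inserts an AND-type isolation cell on a bus leaving a power domain. It is not the authors' design (which avoids the timing cost of isolation cells in the routing fabric) and not their SMT-based method: it swaps in exhaustive enumeration over a small, assumed bus width, which is only feasible for toy sizes. The names `spec`, `and_isolation`, `buggy_isolation`, `find_counterexample`, `WIDTH`, and the single-bit enable `iso_en` are all hypothetical.

```python
from itertools import product
from typing import Callable, Optional, Tuple

WIDTH = 4                  # hypothetical bus width, small enough to enumerate
MASK = (1 << WIDTH) - 1

def spec(data: int, iso_en: int) -> int:
    """Intended boundary behavior (assumed): pass data through while the source
    domain is on (iso_en == 0); drive a known 0 while it is off (iso_en == 1)."""
    return 0 if iso_en else data & MASK

def and_isolation(data: int, iso_en: int) -> int:
    """Hypothetical pass output: AND-type isolation, each bit ANDed with ~iso_en."""
    enable = 0 if iso_en else MASK   # replicate ~iso_en across the whole bus
    return data & enable

def buggy_isolation(data: int, iso_en: int) -> int:
    """A deliberately wrong variant that forgets to clamp when the domain is off."""
    return data & MASK

def find_counterexample(impl: Callable[[int, int], int]) -> Optional[Tuple[int, int]]:
    """Compare impl with spec on every (data, iso_en) pair; return the first
    mismatching input, or None if the two agree on all inputs."""
    for data, iso_en in product(range(1 << WIDTH), (0, 1)):
        if impl(data, iso_en) != spec(data, iso_en):
            return (data, iso_en)
    return None

if __name__ == "__main__":
    print(find_counterexample(and_isolation))    # None: equivalent to spec
    print(find_counterexample(buggy_isolation))  # (1, 1): unclamped while off
```

An SMT solver would answer the same question symbolically (asserting that `impl != spec` is unsatisfiable), which is what makes that approach scale to full-width buses and whole tiles where enumeration cannot.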

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science


Cited by 3 articles

1. Mapping Enumeration for Multi-Context CGRAs Using Zero-Suppressed Binary Decision Diagrams. 2024 IEEE 32nd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2024-05-05.

2. An Investigation into the Impact of Using Automated Synthesisable Internal Power-Gating on Improved Power Efficiency for ASICs. 2024 International Conference on Optimization Computing and Wireless Communication (ICOCWC), 2024-01-29.

3. Amber: A 16-nm System-on-Chip With a Coarse-Grained Reconfigurable Array for Flexible Acceleration of Dense Linear Algebra. IEEE Journal of Solid-State Circuits, 2023.
