Affiliation:
1. Harvard University, Cambridge, MA, USA
Abstract
Post-Moore’s law area-constrained systems rely on accelerators to deliver performance enhancements. Coarse-grained accelerators can offer substantial domain acceleration, but manual, ad hoc identification of code to accelerate is prohibitively expensive. Because cycle-accurate simulators and high-level synthesis (HLS) flows are so time-consuming, the manual creation of high-utilization accelerators that exploit control and data flow patterns at optimal granularities is rarely successful. To address these challenges, we present AccelMerger, the first automated methodology to create coarse-grained, control- and data-flow-rich merged accelerators. AccelMerger uses sequence alignment matching to recognize similar function call-graphs and loops, and neural networks to quickly evaluate their post-HLS characteristics. It accurately identifies which functions to accelerate, and it merges accelerators to respect an area budget and to accommodate system communication characteristics like latency and bandwidth. Merging two accelerators can save as much as 99% of the area of one. The space saved is used by a globally optimal integer linear program to allocate more accelerators for increased performance. We demonstrate AccelMerger’s effectiveness using HLS flows without any manual effort to fine-tune the resulting designs. On FPGA-based systems, AccelMerger yields application performance improvements of up to 16.7× over software implementations, and 1.91× on average with respect to state-of-the-art early-stage design space exploration tools.
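The final allocation step the abstract describes is, at its core, a selection problem: choose which (possibly merged) accelerators to instantiate so total speedup is maximized without exceeding the area budget. The paper solves this with a globally optimal integer linear program; the sketch below illustrates the same 0/1 selection on tiny hypothetical data by exhaustive search. The candidate names, areas, and speedup figures are invented for illustration and do not come from the paper.

```python
from itertools import combinations

# Hypothetical candidate accelerators: (name, area units, speedup gain).
# All numbers are illustrative, not measurements from AccelMerger.
candidates = [
    ("fft",    120, 4.0),
    ("conv2d", 200, 6.5),
    ("sort",    80, 1.8),
    ("hash",    60, 1.2),
]

AREA_BUDGET = 300  # illustrative area constraint


def best_selection(cands, budget):
    """Exhaustively solve the 0/1 selection that an ILP solver would
    handle at scale: maximize total speedup gain subject to the
    accelerator area budget."""
    best, best_gain = (), 0.0
    for r in range(len(cands) + 1):
        for subset in combinations(cands, r):
            area = sum(a for _, a, _ in subset)
            gain = sum(s for _, _, s in subset)
            if area <= budget and gain > best_gain:
                best, best_gain = subset, gain
    return best, best_gain


sel, gain = best_selection(candidates, AREA_BUDGET)
print([name for name, _, _ in sel], gain)  # → ['conv2d', 'sort'] 8.3
```

An off-the-shelf ILP solver replaces the exhaustive loop in practice; the objective and constraint stay the same, which is what makes the allocation globally optimal rather than greedy.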
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture; Software
Cited by: 5 articles.