Trireme: Exploration of Hierarchical Multi-level Parallelism for Hardware Acceleration

Published:2023-04-20 Issue:3 Volume:22 Page:1-23
ISSN:1539-9087
Container-title:ACM Transactions on Embedded Computing Systems
language:en
Short-container-title:ACM Trans. Embed. Comput. Syst.

Author:

Zacharopoulos Georgios¹, Ejjeh Adel², Jing Ying², Yang En-Yu¹, Jia Tianyu¹, Brumar Iulian¹, Intan Jeremy², Huzaifa Muhammad², Adve Sarita², Adve Vikram², Wei Gu-Yeon¹, Brooks David¹

Affiliation:

1. Harvard University, Cambridge, MA, USA

2. University of Illinois at Urbana-Champaign, Champaign, IL, USA

Abstract

The design of heterogeneous systems that include domain-specific accelerators is a challenging and time-consuming process. While taking into account area constraints, designers must decide which parts of an application to accelerate in hardware and which to leave in software. Moreover, applications in domains such as Extended Reality (XR) offer opportunities for various forms of parallel execution, including loop-level, task-level, and pipeline parallelism. To assist the design process and expose every possible level of parallelism, we present Trireme, a fully automated tool-chain that explores multiple levels of parallelism and produces domain-specific accelerator designs and configurations that maximize performance, given an area budget. FPGA SoCs were used as target platforms, and Catapult HLS [7] was used to synthesize RTL using a commercial 12 nm FinFET technology. Experiments on demanding benchmarks from the XR domain revealed a speedup of up to 20×, as well as a speedup of up to 37× for smaller applications, compared to software-only implementations.

Funder

Software Analysis for Heterogeneous Computing Architectures

Swiss National Science Foundation (SNSF)

National Science Foundation (NSF)

DARPA through the Domain-Specific System on Chip

Applications Driving Architectures (ADA) Research Center

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3580394

Reference: 39 articles.

1. The gem5 simulator

2. Coen Bron and Joep Kerbosch. 1973. Algorithm 457: Finding all cliques of an undirected graph. Communications of the ACM 16, 9, 575–577.

3. Early DSE and Automatic Generation of Coarse-grained Merged Accelerators

4. Stratus High-Level Synthesis. 2016. Retrieved from https://www.cadence.com/en_US/home/tools/digital-design-and-signoff/synthesis/stratus-high-level-synthesis.html

5. Simone Campanoni, Kevin Brownell, Svilen Kanev, Timothy M. Jones, Gu-Yeon Wei, and David Brooks. 2014. HELIX-RC: An architecture-compiler co-design for automatic parallelization of irregular programs. In Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture (ISCA). IEEE, 217–228.

Cited by 1 article.

1. Automating application-driven customization of ASIPs: A survey. Journal of Systems Architecture. 2024-03.