Trireme: Exploration of Hierarchical Multi-level Parallelism for Hardware Acceleration
Published:2023-04-20
Issue:3
Volume:22
Page:1-23
ISSN:1539-9087
Container-title:ACM Transactions on Embedded Computing Systems
language:en
Short-container-title:ACM Trans. Embed. Comput. Syst.
Author:
Zacharopoulos Georgios (1),
Ejjeh Adel (2),
Jing Ying (2),
Yang En-Yu (1),
Jia Tianyu (1),
Brumar Iulian (1),
Intan Jeremy (2),
Huzaifa Muhammad (2),
Adve Sarita (2),
Adve Vikram (2),
Wei Gu-Yeon (1),
Brooks David (1)
Affiliation:
1. Harvard University, Cambridge, MA, USA
2. University of Illinois at Urbana-Champaign, Champaign, IL, USA
Abstract
The design of heterogeneous systems that include domain specific accelerators is a challenging and time-consuming process. While taking into account area constraints, designers must decide which parts of an application to accelerate in hardware and which to leave in software. Moreover, applications in domains such as Extended Reality (XR) offer opportunities for various forms of parallel execution, including loop level, task level, and pipeline parallelism. To assist the design process and expose every possible level of parallelism, we present Trireme, a fully automated tool-chain that explores multiple levels of parallelism and produces domain-specific accelerator designs and configurations that maximize performance, given an area budget. FPGA SoCs were used as target platforms, and Catapult HLS [7] was used to synthesize RTL using a commercial 12 nm FinFET technology. Experiments on demanding benchmarks from the XR domain revealed a speedup of up to 20×, as well as a speedup of up to 37× for smaller applications, compared to software-only implementations.
Funder
Software Analysis for Heterogeneous Computing Architectures
Swiss National Science Foundation (SNSF), by the National Science Foundation
NSF
DARPA through the Domain-Specific System on Chip
Applications Driving Architectures (ADA) Research Center
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture,Software
References: 39 articles.
1. The gem5 simulator
2. Coen Bron and Joep Kerbosch. 1973. Algorithm 457: Finding all cliques of an undirected graph. Communications of the ACM 16, 9 (1973), 575–577.
3. Early DSE and Automatic Generation of Coarse-grained Merged Accelerators
4. Stratus High-Level Synthesis. 2016. Retrieved from https://www.cadence.com/en_US/home/tools/digital-design-and-signoff/synthesis/stratus-high-level-synthesis.html
5. Simone Campanoni, Kevin Brownell, Svilen Kanev, Timothy M. Jones, Gu-Yeon Wei, and David Brooks. 2014. HELIX-RC: An architecture-compiler co-design for automatic parallelization of irregular programs. In Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture (ISCA). IEEE, 217–228.
Cited by
1 article.