Affiliation:
1. Ghent University & Universitat Politècnica de Catalunya
2. Ghent University & Vrije Universiteit Brussel
3. Universitat Politècnica de Catalunya, Barcelona, Spain
4. Ghent University, Zwijnaarde, Belgium
Abstract
The end of Dennard scaling leads to new research directions that try to cope with the utilization wall in modern chips, such as the design of specialized architectures. Processor customization utilizes transistors more efficiently, optimizing not only for performance but also for power. However, hardware specialization for each application is costly and impractical due to time-to-market constraints. Domain-specific specialization is an alternative that can increase hardware reutilization across applications that share similar computations. This article explores the specialization of low-power processors with custom instructions (CIs) that run on a specialized functional unit. We are the first, to our knowledge, to design CIs for an application domain and across basic blocks, selecting CIs that maximize both performance and energy efficiency improvements.
We present the Merged Instructions Generator for Large Efficiency (MInGLE), an automated framework that identifies and selects CIs. Our framework analyzes large sequences of code (across basic blocks) to maximize acceleration potential while also performing partial matching across applications to optimize for reuse of the specialized hardware. To do this, we convert the code into a new canonical representation, the Merging Diagram, which represents the code’s functionality instead of its structure. This is key to being able to find similarities across such large code sequences from different applications with different coding styles. Groups of potential CIs are clustered depending on their similarity score to effectively reduce the search space. Additionally, we create new CIs that cover not only whole-body loops but also fragments of the code to optimize hardware reutilization further. For a set of 11 applications from the media domain, our framework generates CIs that significantly improve the energy-delay product (EDP) and performance speedup. CIs with the highest utilization opportunities achieve an average EDP improvement of 3.8× compared to a baseline processor modeled after an Intel Atom. We demonstrate that we can efficiently accelerate a domain with partially matched CIs, and that their design time, from identification to selection, stays within tractable bounds.
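As a reading aid for the EDP figures quoted in the abstract, the sketch below shows how an energy-delay product and its improvement factor are conventionally computed (EDP is energy multiplied by execution time; lower is better). The function names and the sample numbers are illustrative assumptions, not values or code from the article.

```python
def edp(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product: energy consumed times execution time."""
    return energy_joules * delay_seconds


def edp_improvement(base_energy: float, base_delay: float,
                    new_energy: float, new_delay: float) -> float:
    """Ratio of baseline EDP to customized EDP; values above 1 mean the
    customized design is better."""
    return edp(base_energy, base_delay) / edp(new_energy, new_delay)


# Hypothetical numbers: a baseline run uses 2.0 J in 1.0 s, and the same
# workload with custom instructions uses 1.0 J in 0.5 s.
improvement = edp_improvement(2.0, 1.0, 1.0, 0.5)
print(f"EDP improvement: {improvement:.1f}x")
```

Because EDP multiplies the two quantities, halving both energy and delay yields a fourfold improvement in this example, which is why EDP rewards designs that gain on performance and energy together.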
Funder
European Research Council under the European Community's Seventh Framework Programme
ERC
Spanish Ministry of Science and Technology
Generalitat de Catalunya
Spanish Government under the Severo Ochoa program
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Information Systems, Software
Cited by
2 articles.
1. Automating application-driven customization of ASIPs: A survey;Journal of Systems Architecture;2024-03
2. NOVIA: A Framework for Discovering Non-Conventional Inline Accelerators;MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture;2021-10-17