Affiliation:
1. Microsoft, Canada
2. Microsoft, USA
3. Universitat Politècnica de Catalunya, Spain
Abstract
The effectiveness of Recurrent Neural Networks (RNNs) for tasks such as Automatic Speech Recognition has fostered interest in RNN inference acceleration. Due to the recurrent nature and data dependencies of RNN computations, prior work has designed customized architectures specifically tailored to the RNN computation pattern, achieving high computational efficiency for certain chosen model sizes. However, since the dimensionality of RNNs varies considerably across tasks, it is crucial to generalize this efficiency to diverse configurations.
In this work, we identify adaptiveness as a key feature that is missing from today's RNN accelerators. In particular, we first show that state-of-the-art RNN implementations on GPU, FPGA, and ASIC architectures suffer from low resource utilization and low adaptiveness. To solve these issues, we propose an intelligent tile-based dispatching mechanism that increases the adaptiveness of RNN computation and efficiently handles data dependencies. We realize this mechanism in Sharp, a hardware accelerator that pipelines RNN computation using an effective scheduling scheme to hide most of the serialization caused by data dependencies. Furthermore, Sharp employs a dynamically reconfigurable architecture to adapt to each model's characteristics.
Sharp achieves 2×, 2.8×, and 82× speedups on average over state-of-the-art ASIC, FPGA, and GPU implementations, respectively, across different RNN models and resource budgets. Furthermore, Sharp provides significant energy savings relative to these prior solutions, thanks to its power efficiency of 321 GFLOPS/Watt.
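To make the tiling and dependency structure described above concrete, the sketch below models one GRU time step in NumPy, decomposed into independent row tiles. It is illustrative only: the function names, tile size, and decomposition are assumptions for exposition, not Sharp's actual dispatching logic or hardware design. It shows why tiles within a time step can be dispatched in parallel while the recurrent products serialize consecutive steps.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def tiled_matvec(W, x, tile_rows=64):
    """Compute W @ x as independent row tiles.

    Each tile is a self-contained unit of work that a dispatcher could
    assign to any free compute unit; tile_rows would be chosen to match
    the hardware resources and the model's layer dimensions.
    """
    n = W.shape[0]
    y = np.empty(n, dtype=W.dtype)
    for r0 in range(0, n, tile_rows):
        # Tiles have no mutual data dependencies, so they can be
        # dispatched in parallel within a single time step.
        y[r0:r0 + tile_rows] = W[r0:r0 + tile_rows, :] @ x
    return y

def gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU time step (Cho et al., 2014).

    The input-dependent products (W @ x_t) are available ahead of time
    and can be pipelined across time steps; only the recurrent products
    (U @ h_prev) serialize consecutive steps, which is the dependency
    an accelerator's scheduler must hide.
    """
    z = sigmoid(tiled_matvec(Wz, x_t) + tiled_matvec(Uz, h_prev))      # update gate
    r = sigmoid(tiled_matvec(Wr, x_t) + tiled_matvec(Ur, h_prev))      # reset gate
    h_hat = np.tanh(tiled_matvec(Wh, x_t) + r * tiled_matvec(Uh, h_prev))
    return (1.0 - z) * h_prev + z * h_hat
```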
Funder
CoCoUnit ERC Advanced
EU’s Horizon 2020
Spanish State Research Agency
ICREA Academia
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Software
Cited by
3 articles.