Affiliation:
1. Universitat Politècnica de Catalunya, Barcelona, Spain
Abstract
Recurrent Neural Network (RNN) inference exhibits low hardware utilization due to the strict data dependencies across time-steps. Batching multiple requests can increase throughput. However, RNN batching requires a large amount of padding, since the batched input sequences may vastly differ in length. Schemes that dynamically update the batch every few time-steps avoid padding; however, they require executing different RNN layers in a short time span, decreasing energy efficiency. Hence, we propose E-BATCH, a low-latency and energy-efficient batching scheme tailored to RNN accelerators. It consists of a runtime system and effective hardware support. The runtime concatenates multiple sequences to create large batches, resulting in substantial energy savings. Furthermore, the accelerator notifies the runtime when the evaluation of an input sequence is done, so a new input sequence can be immediately added to the batch, largely reducing the amount of padding. E-BATCH dynamically controls the number of time-steps evaluated per batch to achieve the best trade-off between latency and energy efficiency for the given hardware platform. We evaluate E-BATCH on top of E-PUR and TPU. E-BATCH improves throughput by 1.8× and energy efficiency by 3.6× in E-PUR, whereas in TPU it improves throughput by 2.1× and energy efficiency by 1.6×, over the state-of-the-art.
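To see why padding dominates naive batching and why concatenation helps, consider a minimal sketch. This is not the E-BATCH implementation (which lives in a runtime plus accelerator hardware support); it is a hypothetical Python model that counts padded time-steps under two policies: padding every batch to its longest sequence, versus greedily concatenating sequences onto a fixed number of batch lanes, mimicking the immediate reuse of a slot once a sequence finishes.

```python
# Illustrative sketch only (hypothetical model, not the E-BATCH runtime):
# compare padded time-steps under naive batching vs. greedy concatenation.

def padded_steps(seq_lens, batch_size):
    """Naive batching: each batch is padded to its longest sequence."""
    total_padding = 0
    for i in range(0, len(seq_lens), batch_size):
        batch = seq_lens[i:i + batch_size]
        longest = max(batch)
        total_padding += sum(longest - length for length in batch)
    return total_padding

def concat_steps(seq_lens, num_lanes):
    """Greedy concatenation: append each sequence to the currently
    shortest lane, so a finished slot is reused immediately. Padding
    remains only at the end, where lanes finish at different times."""
    lanes = [0] * num_lanes
    for length in sorted(seq_lens, reverse=True):
        lanes[lanes.index(min(lanes))] += length
    return max(lanes) * num_lanes - sum(lanes)

# Sequence lengths chosen to vary widely, as in real RNN workloads.
lens = [5, 50, 8, 45, 12, 40, 3, 60]
print(padded_steps(lens, 4))   # padded time-steps with naive batching
print(concat_steps(lens, 4))   # padded time-steps with concatenation
```

With these example lengths, naive batching wastes 217 padded time-steps while concatenation wastes only 17, illustrating the padding reduction the abstract attributes to immediately adding new sequences to a batch.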
Funder
CoCoUnit ERC Advanced Grant of the EU’s Horizon 2020
Spanish State Research Agency
ICREA Academia program
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Information Systems, Software
Cited by: 4 articles.