An FPGA-Based LSTM Acceleration Engine for Deep Learning Frameworks

Published:2021-03-14 Issue:6 Volume:10 Page:681
ISSN:2079-9292
Container-title:Electronics
language:en

Author:

He Dazhong, He Junhua, Liu Jun, Yang Jie, Yan Qing, Yang Yang

Abstract

Over the past two decades, Long Short-Term Memory (LSTM) networks have been used to solve problems that require modeling long sequences, because they can selectively remember certain patterns over long periods and thus outperform traditional feed-forward neural networks and plain Recurrent Neural Networks (RNNs) at learning long-term dependencies. However, LSTM is characterized by feedback dependence: each time step depends on the output of the previous one, which limits how much of the parallelism of general-purpose processors such as CPUs and GPUs can be exploited. In addition, for data center applications, the high power consumption of GPU and CPU computing cannot be ignored. To address these problems, the Field Programmable Gate Array (FPGA) is becoming an ideal alternative. FPGAs offer low power consumption and low latency, both of which help in accelerating and optimizing LSTMs and other RNNs. This paper proposes an FPGA-based implementation of an LSTM network acceleration engine and further optimizes it through fixed-point arithmetic, a systolic array, and lookup tables for the nonlinear functions. On this basis, for easy deployment and application, we integrate the proposed acceleration engine into Caffe, one of the most popular deep learning frameworks. Experimental results show that, within the Caffe framework, the FPGA-based acceleration engine improves performance by 8.8 and 2.2 times and energy efficiency by 16.9 and 9.6 times compared with the CPU and the GPU, respectively.
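Two of the optimizations named in the abstract, fixed-point arithmetic and a lookup table for a nonlinear function, can be illustrated with a minimal software sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Q-format with 8 fractional bits, the 256-entry table over [-8, 8), and the helper names `to_fixed`, `fixed_mul`, and `sigmoid_lut`. It is not the authors' hardware design.

```python
import math

# Assumed (hypothetical) fixed-point format: 8 fractional bits.
FRAC_BITS = 8
SCALE = 1 << FRAC_BITS  # 256 fixed-point units represent 1.0


def to_fixed(x: float) -> int:
    """Quantize a float to a signed fixed-point integer (round to nearest)."""
    return int(round(x * SCALE))


def to_float(q: int) -> float:
    """Convert a fixed-point integer back to a float."""
    return q / SCALE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values and rescale the product."""
    return (a * b) >> FRAC_BITS


# Assumed table layout: 256 uniform bins over [-8, 8), each bin 1/16 wide.
LUT_MIN_Q = to_fixed(-8.0)
LUT_MAX_Q = to_fixed(8.0)
LUT_SIZE = 256
STEP_Q = (LUT_MAX_Q - LUT_MIN_Q) // LUT_SIZE  # 16 fixed-point units per bin

# Each entry stores the sigmoid evaluated at the midpoint of its bin.
SIGMOID_LUT = [
    to_fixed(1.0 / (1.0 + math.exp(-to_float(LUT_MIN_Q + i * STEP_Q + STEP_Q // 2))))
    for i in range(LUT_SIZE)
]


def sigmoid_lut(q: int) -> int:
    """Approximate sigmoid on a fixed-point input using integer indexing only."""
    if q < LUT_MIN_Q:
        return 0          # saturate towards 0 below the table range
    if q >= LUT_MAX_Q:
        return SCALE      # saturate towards 1 above the table range
    return SIGMOID_LUT[(q - LUT_MIN_Q) // STEP_Q]
```

For example, `to_float(sigmoid_lut(to_fixed(2.0)))` gives a value close to the exact sigmoid of 2.0. In this sketch the table size and the number of fractional bits set the accuracy: more of either reduces the approximation error at the cost of more storage.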

Funder

National Natural Science Foundation of China

Beijing Municipal Natural Science Foundation

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/10/6/681/pdf

Reference (28 articles)

1. Sequence to sequence learning with neural networks;Sutskever;arXiv,2014

2. Long Short-Term Memory

3. Optimizing performance of recurrent neural networks on GPUs;Appleyard;arXiv,2016

4. Torch: A Modular Machine Learning Software Library;Collobert,2002

Cited by 18 articles.

1. Exploring energy efficiency of LSTM accelerators: A parameterized architecture design for embedded FPGAs;Journal of Systems Architecture;2024-07

2. Efficient FPGA Implementation of Convolutional Neural Networks and Long Short-Term Memory for Radar Emitter Signal Recognition;Sensors;2024-01-30

3. Artificial Intelligence-Based Field-Programmable Gate Array Accelerator for Electric Vehicles Battery Management System;SAE International Journal of Connected and Automated Vehicles;2024-01-04

4. An Accelerated FPGA-Based Parallel CNN-LSTM Computing Device;IEEE Access;2024

5. Accelerating a Meta Learning Model for Ultrasonic Non-Destructive Testing Applications Using Model Compression and FPGA Hardware;Journal of Signal Processing Systems;2023-11-11