F-LSTM: FPGA-Based Heterogeneous Computing Framework for Deploying LSTM-Based Algorithms

Published: 2023-02-26
Volume: 12; Issue: 5; Page: 1139
ISSN: 2079-9292
Journal: Electronics
Language: English

Authors:

Liang Bushun¹, Wang Siye¹, Huang Yeqin¹, Liu Yiling¹, Ma Linpeng¹

Affiliation:

1. School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China

Abstract

Long Short-Term Memory (LSTM) networks are widely used to solve sequence modeling problems. A common way for researchers to tackle sequence problems is to build complete algorithms around an LSTM network, combining it with pre-processing and post-processing stages. With its low power consumption and low latency, the Field Programmable Gate Array (FPGA) is an ideal hardware platform for LSTM inference and can accelerate the execution of such algorithms. However, implementing LSTM networks on FPGAs requires specialized hardware and software knowledge as well as optimization skills, which poses a challenge for researchers. To lower this barrier, we propose F-LSTM, an FPGA-based heterogeneous computing framework. With F-LSTM, researchers can quickly deploy LSTM-based algorithms to heterogeneous computing platforms: the FPGA automatically takes over the computation of the LSTM network, while the CPU performs the pre-processing and post-processing. To support algorithm design, model compression, and deployment, we also propose a framework based on F-LSTM, which integrates PyTorch for ease of use. Experimental results on sentiment analysis tasks show that deploying algorithms to the F-LSTM hardware platform achieves a 1.8× performance improvement and a 5.4× energy efficiency improvement compared with a GPU. The results also validate the need to build heterogeneous computing systems. In conclusion, compared with traditional approaches, our work reduces the difficulty of deploying LSTM networks on FPGAs while preserving algorithm performance.
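The abstract describes a division of labor in which the CPU handles pre-processing and post-processing while the FPGA executes the LSTM network. The following is a minimal pure-Python sketch of that data flow for a toy sentiment-analysis task, under stated assumptions: every function name (`preprocess`, `accelerator_lstm`, `postprocess`, `run_pipeline`, `make_params`) is hypothetical and does not reflect F-LSTM's actual API, which the abstract does not describe; `accelerator_lstm` only stands in for the FPGA offload and runs on the CPU here; and the cell is the standard LSTM formulation with input, forget, and output gates (i, f, o) plus a candidate update (g), not necessarily the exact variant F-LSTM implements.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def make_params(input_size, hidden_size, seed=0):
    """Hypothetical random init for the four gate blocks i, f, g, o."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    params = {}
    for name in "ifgo":
        params["W_" + name] = mat(hidden_size, input_size)   # input weights
        params["U_" + name] = mat(hidden_size, hidden_size)  # recurrent weights
        params["b_" + name] = [0.0] * hidden_size            # biases
    return params


def lstm_step(x, h, c, p):
    """One standard LSTM time step; returns the new (h, c)."""
    def gate(name, act):
        wx = matvec(p["W_" + name], x)
        uh = matvec(p["U_" + name], h)
        return [act(a + b + bias) for a, b, bias in zip(wx, uh, p["b_" + name])]

    i = gate("i", sigmoid)    # input gate
    f = gate("f", sigmoid)    # forget gate
    g = gate("g", math.tanh)  # candidate cell update
    o = gate("o", sigmoid)    # output gate
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


# CPU side: pre-processing (hypothetical tokenizer plus embedding lookup).
def preprocess(text, vocab, embeddings):
    unk = vocab["<unk>"]
    ids = [vocab.get(tok, unk) for tok in text.lower().split()]
    return [embeddings[i] for i in ids]


# "FPGA" side: stand-in for the offloaded LSTM kernel (plain Python here).
def accelerator_lstm(seq, params, hidden_size):
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in seq:
        h, c = lstm_step(x, h, c, params)
    return h  # only the final hidden state returns to the host


# CPU side: post-processing (linear classifier to a sentiment probability).
def postprocess(h, w_out, b_out):
    prob = sigmoid(sum(w * hj for w, hj in zip(w_out, h)) + b_out)
    return ("positive" if prob >= 0.5 else "negative"), prob


def run_pipeline(text, vocab, embeddings, params, w_out, b_out, hidden_size):
    seq = preprocess(text, vocab, embeddings)            # CPU
    h_last = accelerator_lstm(seq, params, hidden_size)  # FPGA in F-LSTM; CPU here
    return postprocess(h_last, w_out, b_out)             # CPU


if __name__ == "__main__":
    hidden = 8
    vocab = {"<unk>": 0, "great": 1, "movie": 2, "boring": 3}  # toy vocabulary
    rng = random.Random(1)
    embeddings = [[rng.uniform(-1, 1) for _ in range(4)] for _ in vocab]  # 4-dim toy vectors
    params = make_params(input_size=4, hidden_size=hidden)
    w_out = [rng.uniform(-1, 1) for _ in range(hidden)]
    label, prob = run_pipeline("Great movie", vocab, embeddings, params, w_out, 0.0, hidden)
    print(label, round(prob, 3))  # weights are untrained, so the label is arbitrary
```

Keeping the whole recurrent loop inside one function makes the offload boundary explicit in this sketch: only the sequence of input vectors would cross to the accelerator and only the final hidden state would come back, a narrow interface that limits host-device data transfer.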

Funder

National Natural Science Foundation of China

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/12/5/1139/pdf

Cited by 5 articles.

1. Water quality ensemble prediction model for the urban water reservoir based on the hybrid long short-term memory (LSTM) network analysis;AQUA — Water Infrastructure, Ecosystems and Society;2024-07-15

2. Heterogeneous Acceleration System for Scene Text Recognition;Proceedings of the 4th International Conference on Artificial Intelligence and Computer Engineering;2023-11-17

3. Research on the Application and Performance Optimization of GPU Parallel Computing in Concrete Temperature Control Simulation;Buildings;2023-10-21

4. Power and Delay-Efficient Matrix Vector Multiplier Units for the LSTM Networks Using Activity Span Reduction Technique and Recursive Adders;Circuits, Systems, and Signal Processing;2023-07-21

5. Improved GWO and its application in parameter optimization of Elman neural network;PLOS ONE;2023-07-07