Reinforcement Learning-Driven Bit-Width Optimization for the High-Level Synthesis of Transformer Designs on Field-Programmable Gate Arrays

Published: 2024-01-30 Volume: 13 Issue: 3 Page: 552
ISSN: 2079-9292
Container-title: Electronics
Language: en

Author:

Jang Seojin¹, Cho Yongbeom¹˒²

Affiliation:

1. Department of Electrical and Electronics Engineering, Konkuk University, Seoul 05029, Republic of Korea

2. Deep ET, Seoul 05029, Republic of Korea

Abstract

With the rapid development of deep-learning models, especially the widespread adoption of transformer architectures, the demand for efficient hardware accelerators with field-programmable gate arrays (FPGAs) has increased owing to their flexibility and performance advantages. Although high-level synthesis can shorten the hardware design cycle, determining the optimal bit-width for various transformer designs remains challenging. Therefore, this paper proposes a novel technique based on a predesigned transformer hardware architecture tailored for various types of FPGAs. The proposed method leverages a reinforcement learning-driven mechanism to automatically adapt and optimize bit-width settings based on user-provided transformer variants during inference on an FPGA, significantly alleviating the challenges related to bit-width optimization. The effect of bit-width settings on resource utilization and performance across different FPGA types was analyzed. The efficacy of the proposed method was demonstrated by optimizing the bit-width settings for users’ transformer-based model inferences on an FPGA. The use of the predesigned hardware architecture significantly enhanced the performance. Overall, the proposed method enables effective and optimized implementations of user-provided transformer-based models on an FPGA, paving the way for edge FPGA-based deep-learning accelerators while reducing the time and effort typically required in fine-tuning bit-width settings.
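The record gives no algorithmic detail, so the following is only a minimal sketch of what a reinforcement learning-driven bit-width search can look like, not the authors' method. It uses an epsilon-greedy bandit, one of the simplest reinforcement-learning formulations. The candidate widths in `BIT_WIDTHS`, the `quantization_error` proxy, and the weights in `reward` are all hypothetical; a real flow would replace the reward with model accuracy and post-synthesis resource and latency figures for the target FPGA.

```python
import random

# Hypothetical candidate fixed-point widths (not taken from the paper).
BIT_WIDTHS = [4, 8, 12, 16]

def quantization_error(bits, values):
    """Mean absolute error after rounding each value to a grid of step 2**-(bits-1)."""
    scale = 2 ** (bits - 1)
    return sum(abs(v - round(v * scale) / scale) for v in values) / len(values)

def reward(bits, values, error_weight=10.0, cost_weight=0.002):
    """Hypothetical reward: penalize rounding error plus a cost that grows with width."""
    return -(error_weight * quantization_error(bits, values) + cost_weight * bits)

def search_bit_width(values, episodes=500, epsilon=0.1, seed=0):
    """Epsilon-greedy search over BIT_WIDTHS; returns (best width, value estimates)."""
    rng = random.Random(seed)
    q = {b: 0.0 for b in BIT_WIDTHS}      # running mean reward per width
    counts = {b: 0 for b in BIT_WIDTHS}   # number of times each width was tried
    for _ in range(episodes):
        if rng.random() < epsilon:
            b = rng.choice(BIT_WIDTHS)    # explore a random width
        else:
            b = max(q, key=q.get)         # exploit the best current estimate
        r = reward(b, values)
        counts[b] += 1
        q[b] += (r - q[b]) / counts[b]    # incremental mean update
    tried = [b for b in BIT_WIDTHS if counts[b] > 0]
    return max(tried, key=q.get), q

if __name__ == "__main__":
    rng = random.Random(1)
    weights = [rng.uniform(-1, 1) for _ in range(256)]  # stand-in for layer weights
    best, estimates = search_bit_width(weights)
    print("selected bit-width:", best)
```

Because every reward here is negative and the estimates start at zero, the greedy step visits each untried width before it settles, so the search covers all candidates even with a small `epsilon`. With a noisy or expensive reward such as a synthesis run, the episode count and exploration rate would need tuning.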

Funder

Korea Evaluation Institute of Industrial Technology

IC Design Education Center

Publisher

MDPI AG

Link

https://www.mdpi.com/2079-9292/13/3/552/pdf

References (31 articles)

1. He, K., Zhang, X., Ren, S., and Sun, J. (2016, June 26–July 1). Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.

2. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, June 7–12). Going Deeper with Convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.

3. TensorFlow (2021, November 05). Effective TensorFlow 2. Available online: https://www.tensorflow.org/guide/effective_tf2.

4. Hinton, G., et al. (2012). Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Process. Mag.

5. Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019, June 2–7). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the NAACL-HLT 2019, Minneapolis, MN, USA.