FPGA Design of Transposed Convolutions for Deep Learning Using High-Level Synthesis-Reference-Cited by-同舟云学术

FPGA Design of Transposed Convolutions for Deep Learning Using High-Level Synthesis

Published:2023-08-04 Issue:10 Volume:95 Page:1245-1263
ISSN:1939-8018
Container-title:Journal of Signal Processing Systems
language:en
Short-container-title:J Sign Process Syst

Author:

Sestito Cristian^ORCID,Perri Stefania^ORCID,Stewart Robert^ORCID

Abstract

AbstractDeep Learning (DL) is pervasive across a wide variety of domains. Convolutional Neural Networks (CNNs) are often used for image processing DL applications. Modern CNN models are growing to meet the needs of more sophisticated tasks, e.g. using Transposed Convolutions (TCONVs) for image decompression and image generation. Such state-of-the-art DL models often target GPU-based high-performance architectures, due to the high computational and hardware resource needs of TCONV layers. To avoid prohibitive GPU energy costs, CNNs are increasingly deployed to decentralized embedded autonomous devices, such as Field Programmable Gate Arrays (FPGAs). However, this poses challenges for designing efficient hardware implementations of TCONV layers. This paper presents a parameterized design and implementation of a new TCONV module, which is synthesizable onto FPGAs. It is implemented using the High-Level Synthesis (HLS), through a C++ template to parameterize its functional and non-functional properties. These parameters allow kernel sizes, image sizes, quantization and parallelism to be varied by users. With a systematic exploration in this design space, we find an optimal instance of this TCONV module that achieves 6.25 Giga Outputs per Second (Gout/s) using just 1.53 W of power. We then use our TCONV layer in two neural networks for image decompression and image generation. Image decompression achieves a speed throughput of more than 30K frames-per-second (fps) using only the 16% of resources on average, image generation achieves an energy efficiency of 324 fps/W and outperforms comparable state-of-the-art models by at least 7.3×.

Publisher

Springer Science and Business Media LLC

Subject

Hardware and Architecture,Modeling and Simulation,Information Systems,Signal Processing,Theoretical Computer Science,Control and Systems Engineering

Link

https://link.springer.com/content/pdf/10.1007/s11265-023-01883-7.pdf

Reference34 articles.

1. Voulodimos, A., Doulamis, N., Doulamis, A., & Protopapadakis, E. (2018). Deep Learning For Computer Vision: A Brief Review. Computational Intelligence and Neuroscience, 2018, 1–13. https://doi.org/10.1155/2018/7068349

2. Nassif, A. B., Shahin, I., Attili, I., Azzeh, M., & Shaalan, K. (2019). Speech recognition using deep neural networks: A systematic review. IEEE Access, 7, 19143–19165. https://doi.org/10.1109/ACCESS.2019.2896880

3. Wang, Z., & Majewicz Fey, A. (2018). Deep learning with convolutional neural network for objective skill evaluation in robot-assisted surgery. International Journal of Computer Assisted Radiology and Surgery, 13(12), 1959–1970. https://doi.org/10.1007/s11548-018-1860-1

4. Creswell, A., White, T., Dumoulin, V., Arulkumaran, K., Sengupta, B., & Bharath, A. A. (2018). Generative adversarial networks: An overview. IEEE Signal Processing Magazine, 35(1), 53–65. https://doi.org/10.1109/MSP.2017.2765202

5. Kumar, M., & Sharma, H. K. (2023). A GAN-Based Model of Deepfake Detection in Social Media. Procedia Computer Science, 218, 2153–2162. https://doi.org/10.1016/j.procs.2023.01.191

Cited by 1 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Hardware Implementation of Calibration Data Loading in Device Driver for an SPI Peripheral;Proceedings of the 2024 6th International Electronics Communication Conference;2024-07-19