Designing Deep Learning Models on FPGA with Multiple Heterogeneous Engines

Published: 2024-01-27 Issue: 1 Volume: 17 Page: 1-30
ISSN: 1936-7406
Container-title: ACM Transactions on Reconfigurable Technology and Systems
Language: en
Short-container-title: ACM Trans. Reconfigurable Technol. Syst.

Author:

Reis Miguel¹, Véstias Mário², Neto Horácio¹

Affiliation:

1. INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Portugal

2. INESC-ID, ISEL, Instituto Politécnico de Lisboa, Portugal

Abstract

Deep learning models are becoming more complex and heterogeneous, with new layer types introduced to improve their accuracy. This poses a considerable challenge to designers of deep neural network accelerators. Several architectures and design flows have been proposed to map deep learning models onto hardware, but they are limited to particular models and/or layer types. Moreover, the architectures generated by these tools generally target high-performance devices that are not appropriate for embedded computing. This article proposes a multi-engine architecture and a design flow to implement deep learning models on FPGA. The hardware design uses high-level synthesis to allow design space exploration. The architecture is scalable and therefore applicable to FPGAs of any density. The architecture and design flow were applied to the development of a hardware/software system for image classification with ResNet50, object detection with YOLOv3-Tiny, and image segmentation with DeepLabV3+. The system was tested on a low-density Zynq UltraScale+ ZU3EG FPGA to show its scalability. The results show that the proposed multi-engine architecture generates efficient accelerators: the ResNet50 accelerator with 4-bit quantization achieves 67 FPS, the YOLOv3-Tiny object detector achieves 36 FPS, and the image segmentation application achieves 1.4 FPS.

Funder

Fundação para a Ciência e a Tecnologia

Instituto Politécnico de Lisboa

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3615870
