Affiliation
1. INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Portugal
2. INESC-ID, ISEL, Instituto Politécnico de Lisboa, Portugal
Abstract
Deep learning models are becoming more complex and heterogeneous, with new layer types introduced to improve their accuracy. This poses a considerable challenge to designers of deep neural network accelerators. Several architectures and design flows have been proposed to map deep learning models onto hardware, but they are limited to particular models and/or layer types. In addition, the architectures generated by these tools generally target high-performance devices, which are not appropriate for embedded computing. This article proposes a multi-engine architecture and a design flow to implement deep learning models on FPGAs. The hardware design uses high-level synthesis to allow design space exploration. The architecture is scalable and therefore applicable to FPGAs of any density. The architecture and design flow were applied to the development of a hardware/software system for image classification with ResNet50, object detection with YOLOv3-Tiny, and image segmentation with DeepLabV3+. The system was tested on a low-density Zynq UltraScale+ ZU3EG FPGA to show its scalability. The results show that the proposed multi-engine architecture generates efficient accelerators. A ResNet50 accelerator with 4-bit quantization achieves 67 FPS, the YOLOv3-Tiny object detector achieves 36 FPS, and the image segmentation application achieves 1.4 FPS.
Funder
Fundação para a Ciência e a Tecnologia
Instituto Politécnico de Lisboa
Publisher
Association for Computing Machinery (ACM)