Elastic-DF: Scaling Performance of DNN Inference in FPGA Clouds through Automatic Partitioning

Author:

Tobias Alonso¹, Lucian Petrica², Mario Ruiz³, Jakoba Petri-Koenig⁴, Yaman Umuroglu², Ioannis Stamelos⁵, Elias Koromilas⁵, Michaela Blott², Kees Vissers⁶

Affiliation:

1. Universidad Autónoma de Madrid, Madrid, Spain

2. Xilinx Research, Dublin, Ireland

3. Xilinx University Program, Dublin, Ireland

4. Delft University of Technology, Delft, Netherlands

5. InAccel, US

6. Xilinx Research, San José, US

Abstract

Customized compute acceleration in the datacenter is key to the wider roll-out of applications based on deep neural network (DNN) inference. In this article, we investigate how to maximize the performance and scalability of field-programmable gate array (FPGA)-based pipeline dataflow DNN inference accelerators (DFAs) automatically on computing infrastructures consisting of multi-die, network-connected FPGAs. We present Elastic-DF, a novel resource partitioning tool and associated FPGA runtime infrastructure that integrates with the DNN compiler FINN. Elastic-DF allocates FPGA resources to DNN layers and layers to individual FPGA dies to maximize the total performance of the multi-FPGA system. In the resulting Elastic-DF mapping, the accelerator may be instantiated multiple times, and each instance may be segmented across multiple FPGAs transparently, whereby the segments communicate peer-to-peer through 100 Gbps Ethernet FPGA infrastructure, without host involvement. When applied to ResNet-50, Elastic-DF provides a 44% latency decrease on Alveo U280. For MobileNetV1 on Alveo U200 and U280, Elastic-DF enables a 78% throughput increase, eliminating the performance difference between these cards and the larger Alveo U250. Elastic-DF also increases operating frequency in all our experiments, on average by over 20%. Elastic-DF therefore increases performance portability between different sizes of FPGA and increases the critical throughput per cost metric of datacenter inference.
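The abstract describes allocating DNN layers to FPGA dies so that no single segment of the dataflow pipeline becomes the throughput bottleneck. The sketch below is an illustrative toy, not the actual Elastic-DF algorithm: it exhaustively splits a chain of layers (with hypothetical per-layer resource/work costs) into contiguous segments, one per die, and picks the split that minimizes the cost of the heaviest segment, since pipeline throughput is limited by the slowest stage.

```python
# Illustrative sketch only (not the Elastic-DF implementation): contiguous
# partitioning of a layer pipeline across dies, minimizing the maximum
# per-segment cost. Layer costs and die count below are hypothetical.
from itertools import combinations

def partition_layers(costs, num_dies):
    """Return (best_max_cost, segments) over all contiguous splits
    of `costs` into `num_dies` non-empty segments."""
    n = len(costs)
    best_cost, best_segs = float("inf"), None
    # Choose num_dies - 1 cut points between adjacent layers.
    for cuts in combinations(range(1, n), num_dies - 1):
        bounds = (0, *cuts, n)
        segs = [costs[bounds[i]:bounds[i + 1]] for i in range(num_dies)]
        worst = max(sum(s) for s in segs)  # slowest segment bounds throughput
        if worst < best_cost:
            best_cost, best_segs = worst, segs
    return best_cost, best_segs

# Hypothetical costs for an 8-layer network mapped onto a 3-die FPGA.
layer_costs = [4, 2, 6, 3, 5, 2, 4, 1]
max_cost, segments = partition_layers(layer_costs, 3)
print(max_cost, segments)  # → 12 [[4], [2, 6, 3], [5, 2, 4, 1]]
```

Exhaustive search is fine at this toy scale; a real tool would also have to respect per-die resource limits and inter-die link bandwidth, which this sketch ignores.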

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science


Cited by 17 articles.

1. Accelerating Distributed Training With Collaborative In-Network Aggregation. IEEE/ACM Transactions on Networking, 2024-08.

2. SMOF: Streaming Modern CNNs on FPGAs with Smart Off-Chip Eviction. 2024 IEEE 32nd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2024-05-05.

3. PipeFuser: Building Flexible Pipeline Architecture for DNN Accelerators via Layer Fusion. 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), 2024-01-22.

4. Extending Data Flow Architectures for Convolutional Neural Networks to Multiple FPGAs. 2023 International Conference on Field Programmable Technology (ICFPT), 2023-12-12.

5. Machine Learning Across Network-Connected FPGAs. 2023 IEEE High Performance Extreme Computing Conference (HPEC), 2023-09-25.
