Elastic-DF: Scaling Performance of DNN Inference in FPGA Clouds through Automatic Partitioning

Author:

Tobias Alonso¹, Lucian Petrica², Mario Ruiz³, Jakoba Petri-Koenig⁴, Yaman Umuroglu², Ioannis Stamelos⁵, Elias Koromilas⁵, Michaela Blott², Kees Vissers⁶

Affiliation:

1. Universidad Autónoma de Madrid, Madrid, Spain

2. Xilinx Research, Dublin, Ireland

3. Xilinx University Program, Dublin, Ireland

4. Delft University of Technology, Delft, Netherlands

5. InAccel, US

6. Xilinx Research, San José, US

Abstract

Customized compute acceleration in the datacenter is key to the wider roll-out of applications based on deep neural network (DNN) inference. In this article, we investigate how to maximize the performance and scalability of field-programmable gate array (FPGA)-based pipeline dataflow DNN inference accelerators (DFAs) automatically on computing infrastructures consisting of multi-die, network-connected FPGAs. We present Elastic-DF, a novel resource partitioning tool and associated FPGA runtime infrastructure that integrates with the DNN compiler FINN. Elastic-DF allocates FPGA resources to DNN layers and layers to individual FPGA dies to maximize the total performance of the multi-FPGA system. In the resulting Elastic-DF mapping, the accelerator may be instantiated multiple times, and each instance may be segmented across multiple FPGAs transparently, whereby the segments communicate peer-to-peer through 100 Gbps Ethernet FPGA infrastructure, without host involvement. When applied to ResNet-50, Elastic-DF provides a 44% latency decrease on Alveo U280. For MobileNetV1 on Alveo U200 and U280, Elastic-DF enables a 78% throughput increase, eliminating the performance difference between these cards and the larger Alveo U250. Elastic-DF also increases operating frequency in all our experiments, on average by over 20%. Elastic-DF therefore increases performance portability between different sizes of FPGA and increases the critical throughput per cost metric of datacenter inference.
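The abstract describes allocating DNN layers to FPGA dies so that no single segment of the dataflow pipeline becomes the throughput bottleneck. The sketch below is an illustrative toy, not the actual Elastic-DF algorithm: it exhaustively splits a chain of layers (with hypothetical per-layer resource/work costs) into contiguous segments, one per die, and picks the split that minimizes the cost of the heaviest segment, since pipeline throughput is limited by the slowest stage.

```python
# Illustrative sketch only (not the Elastic-DF implementation): contiguous
# partitioning of a layer pipeline across dies, minimizing the maximum
# per-segment cost. Layer costs and die count below are hypothetical.
from itertools import combinations

def partition_layers(costs, num_dies):
    """Return (best_max_cost, segments) over all contiguous splits
    of `costs` into `num_dies` non-empty segments."""
    n = len(costs)
    best_cost, best_segs = float("inf"), None
    # Choose num_dies - 1 cut points between adjacent layers.
    for cuts in combinations(range(1, n), num_dies - 1):
        bounds = (0, *cuts, n)
        segs = [costs[bounds[i]:bounds[i + 1]] for i in range(num_dies)]
        worst = max(sum(s) for s in segs)  # slowest segment bounds throughput
        if worst < best_cost:
            best_cost, best_segs = worst, segs
    return best_cost, best_segs

# Hypothetical costs for an 8-layer network mapped onto a 3-die FPGA.
layer_costs = [4, 2, 6, 3, 5, 2, 4, 1]
max_cost, segments = partition_layers(layer_costs, 3)
print(max_cost, segments)  # → 12 [[4], [2, 6, 3], [5, 2, 4, 1]]
```

Exhaustive search is fine at this toy scale; a real tool would also have to respect per-die resource limits and inter-die link bandwidth, which this sketch ignores.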

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science


Cited by 17 articles.

1. Accelerating Distributed Training With Collaborative In-Network Aggregation. IEEE/ACM Transactions on Networking, 2024-08.

2. SMOF: Streaming Modern CNNs on FPGAs with Smart Off-Chip Eviction. 2024 IEEE 32nd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2024-05-05.

3. PipeFuser: Building Flexible Pipeline Architecture for DNN Accelerators via Layer Fusion. 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), 2024-01-22.

4. Extending Data Flow Architectures for Convolutional Neural Networks to Multiple FPGAs. 2023 International Conference on Field Programmable Technology (ICFPT), 2023-12-12.

5. Machine Learning Across Network-Connected FPGAs. 2023 IEEE High Performance Extreme Computing Conference (HPEC), 2023-09-25.
