Synergistically Exploiting CNN Pruning and HLS Versioning for Adaptive Inference on Multi-FPGAs at the Edge

Authors:

Guilherme Korol¹, Michael Guilherme Jordan¹, Mateus Beck Rutzig², Antonio Carlos Schneider Beck¹

Affiliation:

1. Institute of Informatics - Federal University of Rio Grande do Sul, Porto Alegre, Brazil

2. Electronics and Computing Department - Federal University of Santa Maria, Santa Maria, Brazil

Abstract

FPGAs, because of their energy efficiency, reconfigurability, and easily tunable HLS designs, have been used to accelerate a growing number of machine learning applications, especially CNN-based ones. As a representative example, IoT Edge applications, which require low-latency processing of resource-hungry CNNs, offload inferences from resource-limited IoT end nodes to Edge servers featuring FPGAs. However, the ever-increasing number of end nodes pressures these FPGA-based servers with new performance and adaptability challenges. While some works have exploited CNN optimizations to alleviate the computation and memory burdens of inference, others have exploited HLS to tune accelerators for statically defined optimization goals. However, these works have not tackled CNN and HLS optimizations together, nor have they provided any adaptability at runtime, where the workload's characteristics are unpredictable. In this context, we propose a hybrid two-step approach that, first, creates new optimization opportunities at design time through the automatic training of CNN model variants (obtained via pruning) and the automatic generation of versions of convolutional accelerators (obtained during HLS synthesis); and, second, synergistically exploits these created CNN and HLS optimization opportunities to deliver a fully dynamic Multi-FPGA system that adapts its resources in a fully automatic or user-configurable manner. We implement this two-step approach as the AdaServ Framework and show, through a smart video surveillance Edge application as a case study, that it adapts to the always-changing Edge conditions: AdaServ processes at least 3.37× more inferences (using the automatic approach) and is at least 6.68× more energy-efficient (user-configurable approach) than the original convolutional accelerators and CNN models (VGG-16 and AlexNet).
We also show that AdaServ achieves better results than solutions dynamically changing only the CNN model or HLS version, highlighting the importance of exploring both; and that it is always better than the best statically chosen CNN model and HLS version, showing the need for dynamic adaptability.
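To make the adaptation idea from the abstract concrete, the sketch below models the runtime decision AdaServ-style systems face: choosing a (pruned CNN variant, HLS accelerator version) pair that fits the current Edge load and the selected goal (automatic throughput-driven vs. user-configured energy efficiency). All configuration names, numbers, and thresholds are purely illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch of runtime (model, HLS-version) selection. Each
# deployable configuration pairs a pruned CNN variant with an accelerator
# version generated during HLS synthesis; the scheduler picks the pair
# that best fits the current workload. All figures are made up.
from dataclasses import dataclass

@dataclass
class Config:
    model: str          # pruned CNN variant (e.g., pruning ratio)
    hls_version: str    # accelerator version from HLS synthesis
    accuracy: float     # relative accuracy of the pruned model
    throughput: float   # inferences/s on this accelerator version
    power: float        # watts drawn by this configuration

CONFIGS = [
    Config("vgg16-dense", "hls-latency", 1.00,  40.0, 12.0),
    Config("vgg16-p50",   "hls-latency", 0.97,  95.0, 11.0),
    Config("vgg16-p50",   "hls-area",    0.97,  60.0,  6.0),
    Config("vgg16-p80",   "hls-area",    0.92, 130.0,  5.5),
]

def select_config(queue_depth, min_accuracy=0.9, energy_mode=False):
    """Pick a configuration for the current Edge workload.

    Under light load, prefer the most accurate configuration; under
    heavy load (or in energy mode), trade accuracy for throughput or
    inferences per joule.
    """
    feasible = [c for c in CONFIGS if c.accuracy >= min_accuracy]
    if energy_mode:
        # user-configurable goal: maximize inferences per joule
        return max(feasible, key=lambda c: c.throughput / c.power)
    if queue_depth > 50:
        # automatic goal under pressure: maximize raw throughput
        return max(feasible, key=lambda c: c.throughput)
    return max(feasible, key=lambda c: c.accuracy)
```

In this toy setting, a shallow request queue keeps the dense model on the latency-tuned accelerator, while a deep queue or an energy-efficiency goal switches to an aggressively pruned variant on the area-tuned version, mirroring the CNN-plus-HLS co-exploration the abstract argues for.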

Funder

CAPES - Brasil - Finance Code

FAPERGS and CNPq

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Software

References: 48 articles.

Cited by 9 articles.

1. Design Space Exploration for CNN Offloading to FPGAs at the Edge. 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2023-06-20.

2. Dynamic Offloading for Improved Performance and Energy Efficiency in Heterogeneous IoT-Edge-Cloud Continuum. 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2023-06-20.

3. Pruning and Early-Exit Co-Optimization for CNN Acceleration on FPGAs. 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2023-04.

4. A Comprehensive Evaluation of Convolutional Hardware Accelerators. IEEE Transactions on Circuits and Systems II: Express Briefs, 2023-03.

5. Adaptive Inference for FPGA-Based 5G Automatic Modulation Classification. Design and Architecture for Signal and Image Processing, 2023.
