Synergistically Exploiting CNN Pruning and HLS Versioning for Adaptive Inference on Multi-FPGAs at the Edge

Authors:

Guilherme Korol¹, Michael Guilherme Jordan¹, Mateus Beck Rutzig², Antonio Carlos Schneider Beck¹

Affiliation:

1. Institute of Informatics - Federal University of Rio Grande do Sul, Porto Alegre, Brazil

2. Electronics and Computing Department - Federal University of Santa Maria, Santa Maria, Brazil

Abstract

FPGAs, because of their energy efficiency, reconfigurability, and easily tunable HLS designs, have been used to accelerate a growing number of machine learning applications, especially CNN-based ones. As a representative example, IoT Edge applications, which require low-latency processing of resource-hungry CNNs, offload inferences from resource-limited IoT end nodes to Edge servers featuring FPGAs. However, the ever-increasing number of end nodes pressures these FPGA-based servers with new performance and adaptability challenges. While some works have exploited CNN optimizations to alleviate the computation and memory burdens of inference, others have exploited HLS to tune accelerators for statically defined optimization goals. However, these works have not tackled CNN and HLS optimizations together, nor have they provided any adaptability at runtime, where the workload's characteristics are unpredictable. In this context, we propose a hybrid two-step approach that, first, creates new optimization opportunities at design time through the automatic training of CNN model variants (obtained via pruning) and the automatic generation of versions of convolutional accelerators (obtained during HLS synthesis); and, second, synergistically exploits these created CNN and HLS optimization opportunities to deliver a fully dynamic Multi-FPGA system that adapts its resources in a fully automatic or user-configurable manner. We implement this two-step approach as the AdaServ Framework and show, through a smart video surveillance Edge application as a case study, that it adapts to the always-changing Edge conditions: AdaServ processes at least 3.37× more inferences (using the automatic approach) and is at least 6.68× more energy-efficient (user-configurable approach) than the original convolutional accelerators and CNN models (VGG-16 and AlexNet).
We also show that AdaServ achieves better results than solutions dynamically changing only the CNN model or HLS version, highlighting the importance of exploring both; and that it is always better than the best statically chosen CNN model and HLS version, showing the need for dynamic adaptability.
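To make the adaptation idea from the abstract concrete, the sketch below models the runtime decision AdaServ-style systems face: choosing a (pruned CNN variant, HLS accelerator version) pair that fits the current Edge load and the selected goal (automatic throughput-driven vs. user-configured energy efficiency). All configuration names, numbers, and thresholds are purely illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch of runtime (model, HLS-version) selection. Each
# deployable configuration pairs a pruned CNN variant with an accelerator
# version generated during HLS synthesis; the scheduler picks the pair
# that best fits the current workload. All figures are made up.
from dataclasses import dataclass

@dataclass
class Config:
    model: str          # pruned CNN variant (e.g., pruning ratio)
    hls_version: str    # accelerator version from HLS synthesis
    accuracy: float     # relative accuracy of the pruned model
    throughput: float   # inferences/s on this accelerator version
    power: float        # watts drawn by this configuration

CONFIGS = [
    Config("vgg16-dense", "hls-latency", 1.00,  40.0, 12.0),
    Config("vgg16-p50",   "hls-latency", 0.97,  95.0, 11.0),
    Config("vgg16-p50",   "hls-area",    0.97,  60.0,  6.0),
    Config("vgg16-p80",   "hls-area",    0.92, 130.0,  5.5),
]

def select_config(queue_depth, min_accuracy=0.9, energy_mode=False):
    """Pick a configuration for the current Edge workload.

    Under light load, prefer the most accurate configuration; under
    heavy load (or in energy mode), trade accuracy for throughput or
    inferences per joule.
    """
    feasible = [c for c in CONFIGS if c.accuracy >= min_accuracy]
    if energy_mode:
        # user-configurable goal: maximize inferences per joule
        return max(feasible, key=lambda c: c.throughput / c.power)
    if queue_depth > 50:
        # automatic goal under pressure: maximize raw throughput
        return max(feasible, key=lambda c: c.throughput)
    return max(feasible, key=lambda c: c.accuracy)
```

In this toy setting, a shallow request queue keeps the dense model on the latency-tuned accelerator, while a deep queue or an energy-efficiency goal switches to an aggressively pruned variant on the area-tuned version, mirroring the CNN-plus-HLS co-exploration the abstract argues for.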

Funder

CAPES - Brasil - Finance Code

FAPERGS and CNPq

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Software

References: 48 articles.

Cited by 9 articles.

1. Design Space Exploration for CNN Offloading to FPGAs at the Edge. 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2023-06-20.

2. Dynamic Offloading for Improved Performance and Energy Efficiency in Heterogeneous IoT-Edge-Cloud Continuum. 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2023-06-20.

3. Pruning and Early-Exit Co-Optimization for CNN Acceleration on FPGAs. 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2023-04.

4. A Comprehensive Evaluation of Convolutional Hardware Accelerators. IEEE Transactions on Circuits and Systems II: Express Briefs, 2023-03.

5. Adaptive Inference for FPGA-Based 5G Automatic Modulation Classification. Design and Architecture for Signal and Image Processing, 2023.
