HW-FlowQ: A Multi-Abstraction Level HW-CNN Co-design Quantization Methodology

Author:

Nael Fasfous¹, Manoj Rohit Vemparala², Alexander Frickenstein², Emanuele Valpreda³, Driton Salihu¹, Nguyen Anh Vu Doan¹, Christian Unger², Naveen Shankar Nagaraja², Maurizio Martina³, Walter Stechele¹

Affiliation:

1. Technical University of Munich, Munich, Germany

2. BMW Autonomous Driving, Munich, Germany

3. Politecnico di Torino, Turin, Italy

Abstract

Model compression through quantization is commonly applied to convolutional neural networks (CNNs) deployed on compute and memory-constrained embedded platforms. Different layers of the CNN can have varying degrees of numerical precision for both weights and activations, resulting in a large search space. Together with the hardware (HW) design space, the challenge of finding the globally optimal HW-CNN combination for a given application becomes daunting. To this end, we propose HW-FlowQ, a systematic approach that enables the co-design of the target hardware platform and the compressed CNN model through quantization. The search space is viewed at three levels of abstraction, allowing for an iterative approach for narrowing down the solution space before reaching a high-fidelity CNN hardware modeling tool, capable of capturing the effects of mixed-precision quantization strategies on different hardware architectures (processing unit counts, memory levels, cost models, dataflows) and two types of computation engines (bit-parallel vectorized, bit-serial). To combine both worlds, a multi-objective non-dominated sorting genetic algorithm (NSGA-II) is leveraged to establish a Pareto-optimal set of quantization strategies for the target HW-metrics at each abstraction level. HW-FlowQ detects optima in a discrete search space and maximizes the task-related accuracy of the underlying CNN while minimizing hardware-related costs. The Pareto-front approach keeps the design space open to a range of non-dominated solutions before refining the design to a more detailed level of abstraction. With equivalent prediction accuracy, we improve the energy and latency by 20% and 45% respectively for ResNet56 compared to existing mixed-precision search methods.
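The abstract describes using NSGA-II to find a Pareto-optimal set of mixed-precision quantization strategies, trading task accuracy against hardware cost. The core of that idea can be sketched as non-dominated sorting over candidate strategies. The following is an illustrative sketch only (not the authors' code); the candidates and their (accuracy, energy) scores are hypothetical:

```python
# Illustrative sketch of Pareto-front selection, the core of NSGA-II's
# non-dominated sorting. Each candidate stands for a per-layer bitwidth
# strategy, scored by two objectives: accuracy (maximize), energy (minimize).
# All values below are hypothetical, for illustration only.

def dominates(a, b):
    """True if candidate a Pareto-dominates b: accuracy no worse AND
    energy no worse, with at least one objective strictly better."""
    acc_a, en_a = a
    acc_b, en_b = b
    return (acc_a >= acc_b and en_a <= en_b) and (acc_a > acc_b or en_a < en_b)

def pareto_front(candidates):
    """Return the non-dominated subset (the first NSGA-II front)."""
    return [c for i, c in enumerate(candidates)
            if not any(dominates(o, c)
                       for j, o in enumerate(candidates) if j != i)]

# Hypothetical (accuracy %, energy mJ) scores for four bitwidth strategies.
scores = [(93.1, 4.2), (92.8, 3.1), (91.0, 3.0), (92.5, 3.5)]
print(pareto_front(scores))  # (92.5, 3.5) is dominated by (92.8, 3.1)
```

In the paper's setting, the dominated strategy would be discarded while the remaining front is carried forward to the next, more detailed abstraction level.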

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Software

References: 38 articles.

Cited by 8 articles.

1. Toward Efficient Co-Design of CNN Quantization and HW Architecture on FPGA Hybrid-Accelerator;2024 2nd International Symposium of Electronics Design Automation (ISEDA);2024-05-10

2. MARLIN: A Co-Design Methodology for Approximate ReconfigurabLe Inference of Neural Networks at the Edge;IEEE Transactions on Circuits and Systems I: Regular Papers;2024-05

3. TEMET: Truncated REconfigurable Multiplier with Error Tuning;Lecture Notes in Electrical Engineering;2024

4. The ZuSE-KI-Mobil AI Accelerator SoC: Overview and a Functional Safety Perspective;2023 Design, Automation & Test in Europe Conference & Exhibition (DATE);2023-04

5. A 28-nm 50.1-TOPS/W P-8T SRAM Compute-In-Memory Macro Design With BL Charge-Sharing-Based In-SRAM DAC/ADC Operations;IEEE Journal of Solid-State Circuits;2023

