AKGF: Automatic Kernel Generation for DNN on CPU-FPGA

Author:

Dong Dong (1), Jiang Hongxu (1), Diao Boyu (2)

Affiliation:

1. Beijing Key Laboratory of Digital Media, State Key Lab Virtual Real Technology and Systems, Beihang University, Beijing, 100191, China

2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100086, China

Abstract

While tensor-accelerated compilers have proven effective for deploying deep neural networks (DNNs) on general-purpose hardware, optimizing for FPGAs remains challenging because of complex DNN architectures and heterogeneous, semi-open compute units. This paper introduces Automatic Kernel Generation for DNN on CPU-FPGA (AKGF), a framework for efficient deployment of DNNs on heterogeneous CPU-FPGA platforms. AKGF generates an intermediate representation (IR) of the DNN using TVM’s Halide IR, annotates the operators of the model layers in the IR so that each is computed on the corresponding hardware core, and further optimizes the operator code for the CPU and the FPGA using ARM’s function library and the polyhedral model to improve inference speed and power consumption. Experiments on a CPU-FPGA board validate the effectiveness of AKGF, showing speedups of up to 6.7x over state-of-the-art accelerators while achieving a 2x power optimization. AKGF thus leverages the computational capabilities of both the CPU and the FPGA for high-performance deployment of DNNs on CPU-FPGA platforms.
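
The annotate-then-partition idea described in the abstract can be illustrated with a toy sketch. This is not AKGF's implementation and does not use TVM: the Op class, the FPGA_OPS rule, and the annotate and partition helpers are all hypothetical names introduced here, and the assumption that convolution and dense layers go to the FPGA is made up for the example.

```python
from dataclasses import dataclass
from itertools import groupby

# Hypothetical placement rule (an assumption, not taken from the paper):
# compute-heavy operators go to the FPGA, everything else stays on the CPU.
FPGA_OPS = {"conv2d", "dense"}


@dataclass
class Op:
    name: str         # layer name in the toy model
    kind: str         # operator type, e.g. "conv2d"
    target: str = ""  # filled in by annotate()


def annotate(ops):
    """Tag each operator with the device assumed to execute it."""
    for op in ops:
        op.target = "fpga" if op.kind in FPGA_OPS else "cpu"
    return ops


def partition(ops):
    """Group consecutive operators with the same target into one kernel."""
    return [
        (target, [op.name for op in group])
        for target, group in groupby(ops, key=lambda op: op.target)
    ]


# A made-up five-layer model standing in for a real IR.
model = [
    Op("conv1", "conv2d"),
    Op("relu1", "relu"),
    Op("pool1", "max_pool"),
    Op("fc1", "dense"),
    Op("prob", "softmax"),
]
kernels = partition(annotate(model))
```

Under these assumptions, each entry of kernels is a (target, layer names) pair that a later stage could hand to a device-specific code generator, which is where per-device optimization would apply.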

Funder

National Key Research and Development Program of China

Publisher

Oxford University Press (OUP)

Subject

General Computer Science

