AKGF: Automatic Kernel Generation for DNN on CPU-FPGA

Published: 2023-08-11
ISSN: 0010-4620
Container-title: The Computer Journal
Language: en

Author:

Dong Dong¹, Jiang Hongxu¹, Diao Boyu²

Affiliation:

1. Beijing Key Laboratory of Digital Media, State Key Lab Virtual Real Technology and Systems, Beihang University, Beijing, 100191, China

2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100086, China

Abstract

While tensor-accelerated compilers have proven effective at deploying deep neural networks (DNNs) on general-purpose hardware, optimizing for FPGAs remains challenging because of complex DNN architectures and heterogeneous, semi-open compute units. This paper introduces the Automatic Kernel Generation for DNN on CPU-FPGA (AKGF) framework for efficient deployment of DNNs on heterogeneous CPU-FPGA platforms. AKGF generates an intermediate representation (IR) of the DNN using TVM’s Halide IR, annotates the operators of the model layers in the IR so that they are computed on the corresponding hardware cores, and further optimizes the operator code for the CPU and the FPGA using ARM’s function library and the polyhedral model, improving inference speed and reducing power consumption. Experiments on a CPU-FPGA board validate the effectiveness of AKGF, showing speedups of up to 6.7x over state-of-the-art accelerators together with a 2x power optimization. AKGF thus leverages the computational capabilities of both the CPU and the FPGA for high-performance deployment of DNNs on CPU-FPGA platforms.
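
The operator-annotation step described in the abstract can be pictured with a small toy sketch. This is not AKGF's implementation and does not use TVM: the `Op` class, the `FPGA_OPS` rule, and the `annotate` and `partition` helpers are hypothetical names introduced only for illustration, and the assumption that convolution and dense layers go to the FPGA while everything else stays on the CPU is ours, not the paper's.

```python
from dataclasses import dataclass

# Hypothetical placement rule (an assumption, not taken from the paper):
# compute-heavy operator kinds are offloaded to the FPGA.
FPGA_OPS = {"conv2d", "dense"}


@dataclass
class Op:
    """One operator in a toy, linear IR."""
    name: str
    kind: str
    device: str = "unassigned"


def annotate(ops, fpga_ops=FPGA_OPS):
    """Return new ops, each tagged with the device that should run it."""
    return [
        Op(op.name, op.kind, "fpga" if op.kind in fpga_ops else "cpu")
        for op in ops
    ]


def partition(ops):
    """Group consecutive ops that share a device into (device, names) segments."""
    segments = []
    for op in ops:
        if segments and segments[-1][0] == op.device:
            segments[-1][1].append(op.name)
        else:
            segments.append((op.device, [op.name]))
    return segments


# A made-up six-layer model used only to exercise the sketch.
model = [
    Op("conv1", "conv2d"),
    Op("relu1", "relu"),
    Op("pool1", "max_pool2d"),
    Op("conv2", "conv2d"),
    Op("fc", "dense"),
    Op("prob", "softmax"),
]

annotated = annotate(model)
for device, names in partition(annotated):
    print(device, names)
```

Each segment boundary in the output corresponds to a point where data would have to move between the CPU and the FPGA, which is why a real framework would weigh transfer cost as well as per-operator speed when choosing placements.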

Funder

National Key Research and Development Program of China

Publisher

Oxford University Press (OUP)

Subject

General Computer Science

Link

https://academic.oup.com/comjnl/advance-article-pdf/doi/10.1093/comjnl/bxad086/51176542/bxad086.pdf

References (25 articles)

1. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines; Ragan-Kelley; ACM Sigplan Notices, 2013

2. Simplified high level parallelism expression on heterogeneous systems through data partition pattern description; Wu; Comput. J., 2023

3. A fast precision tuning solution for always-on DNN accelerators; Wang; IEEE Trans. Comput. Aided Des. Integr. Circuits Syst, 2022

4. Towards Intelligent Compiler Optimization; Kovac, 2022

5. Warping cache simulation of polyhedral programs; Morelli