xDNN: Inference for Deep Convolutional Neural Networks

Published:2022-01-11 Issue:2 Volume:15 Page:1-29
ISSN:1936-7406
Container-title:ACM Transactions on Reconfigurable Technology and Systems
language:en
Short-container-title:ACM Trans. Reconfigurable Technol. Syst.

Author:

D'Alberto Paolo¹, Wu Victor¹, Ng Aaron¹, Nimaiyar Rahul¹, Delaye Elliott¹, Sirasao Ashish²

Affiliation:

1. Xilinx, Logic Drive, San Jose, CA

2. Facebook, Menlo Park, CA

Abstract

We present xDNN, an end-to-end system for deep-learning inference based on a family of specialized hardware processors synthesized on Field-Programmable Gate Arrays (FPGAs) and Convolutional Neural Networks (CNNs). We present a design optimized for low latency, high throughput, and high compute efficiency with no batching. The design is scalable and is a parametric function of the number of multiply-accumulate units, the on-chip memory hierarchy, and the numerical precision. The design can be scaled down to produce a processor for embedded devices, replicated to produce more cores for larger devices, or resized to optimize efficiency. On a Xilinx Virtex UltraScale+ VU13P FPGA, we achieve 800 MHz, which is close to the Digital Signal Processing maximum frequency, and above 80% efficiency of on-chip compute resources. On top of our processor family, we present a runtime system enabling the execution of different networks for different input sizes (i.e., from 224×224 to 2048×1024). We present a compiler that reads CNNs from native frameworks (i.e., MXNet, Caffe, Keras, and TensorFlow), optimizes them, generates code, and provides performance estimates. The compiler combines quantization information from the native environment with optimizations to feed the runtime with code as efficient as any hardware expert could write. We present tools that partition a CNN into subgraphs to divide the work between CPU cores and FPGAs. Notice that the software will not change when or if the FPGA design becomes an ASIC, making our work vertical and not just a proof-of-concept FPGA project. We show experimental results for accuracy, latency, and power for several networks. In summary, we can achieve up to 4 times higher throughput and 3 times better power efficiency than GPUs, and up to 20 times higher throughput than the latest CPUs. To our knowledge, we provide solutions faster than any previous FPGA-based solutions and comparable to any other top-of-the-line solutions.

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3473334

References: 37 articles.

1. ML Commons, Inference Data Center;[n.d.];https://mlcommons.org/en/inference-datacenter-10/

2. DLA: Compiler and FPGA Overlay for Neural Network Inference Acceleration

3. Eyeriss

4. Javier Duarte et al. 2019. FPGAs as a service to accelerate machine learning inference. https://people.ece.uw.edu/hauck/publications/AcceleratedMachineLearning.pdf

Cited by 6 articles.

1. Research on technologies of accelerated processing module based on VU13P;2023 3rd International Conference on Electronic Information Engineering and Computer (EIECT);2023-11-17

2. Investigating the Impact of Non-Volatile Memories on Energy-Efficiency of Coarse-Grained Reconfigurable Architectures;2023 26th Euromicro Conference on Digital System Design (DSD);2023-09-06

3. Modular and Lean Architecture with Elasticity for Sparse Matrix Vector Multiplication on FPGAs;2023 IEEE 31st Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM);2023-05

4. Energy Efficient Design of Coarse-Grained Reconfigurable Architectures: Insights, Trends and Challenges;2022 International Conference on Field-Programmable Technology (ICFPT);2022-12-05

5. A Resource Efficient CNN Accelerator for Sensor Signal Processing Based on FPGA;Journal of Circuits, Systems and Computers;2022-10-05