Affiliation:
1. Xilinx Research Labs, Dublin, Ireland
2. Xilinx Research Labs, Ireland
3. Northeastern University, U.S.
4. Xilinx Research, U.S.
Abstract
Convolutional Neural Networks have rapidly become the most successful machine-learning algorithm, enabling ubiquitous machine vision and intelligent decisions even on embedded computing systems. While the underlying arithmetic is structurally simple, the compute and memory requirements are challenging. One promising opportunity is to leverage reduced-precision representations for inputs, activations, and model parameters. The resulting gains in performance, power efficiency, and storage footprint offer attractive design trade-offs in exchange for a small reduction in accuracy. FPGAs are ideal for implementing low-precision inference engines, because they can use custom precisions tailored to the numerical accuracy a given application requires. In this article, we describe the second generation of the FINN framework, an end-to-end tool that enables design-space exploration and automates the creation of fully customized inference engines on FPGAs. Given a neural network description, the tool optimizes for a given platform, a set of design targets, and a specific precision. We introduce formalizations of resource cost functions and performance predictions and elaborate on the optimization algorithms. Finally, we evaluate a selection of reduced-precision neural networks, ranging from CIFAR-10 classifiers to YOLO-based object detection, on a range of platforms including PYNQ and AWS F1, demonstrating unprecedented measured throughput of 50 TOp/s on AWS F1 and 5 TOp/s on embedded devices.
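The storage and arithmetic benefits of reduced precision mentioned above can be illustrated with a small sketch. This is not code from FINN. It assumes weights are stored densely at a fixed bit width, and, for the 1-bit case, it assumes the common encoding in which bit 1 stands for +1 and bit 0 stands for -1. Under that encoding a dot product reduces to an XNOR followed by a popcount. The function names `weight_storage_bytes` and `binary_dot` are hypothetical.

```python
def weight_storage_bytes(num_params: int, bits: int) -> float:
    """Storage footprint in bytes of num_params weights packed densely at `bits` bits each."""
    return num_params * bits / 8


def binary_dot(a_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two length-n {-1, +1} vectors packed as bit masks.

    Assumed encoding (hypothetical): bit i = 1 means element i is +1,
    bit i = 0 means element i is -1.
    """
    mask = (1 << n) - 1
    # XNOR marks positions where the two vectors agree; popcount counts them.
    matches = bin(~(a_bits ^ w_bits) & mask).count("1")
    # Each agreement contributes +1 and each disagreement contributes -1.
    return 2 * matches - n


if __name__ == "__main__":
    # One million weights: 32-bit floats versus 1-bit binarized weights.
    print(weight_storage_bytes(1_000_000, 32))  # 4000000.0
    print(weight_storage_bytes(1_000_000, 1))   # 125000.0
    # a = [+1, -1, +1, +1] and w = [+1, +1, -1, +1], bit 0 first.
    print(binary_dot(0b1101, 0b1011, 4))        # 0
```

In this sketch, moving from 32-bit to 1-bit weights shrinks storage by a factor of 32. Each multiply-accumulate becomes a single bitwise operation, which is the kind of workload that custom FPGA datapaths can exploit.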
Funder
National Science Foundation
Publisher
Association for Computing Machinery (ACM)
Cited by
256 articles.
1. Machine learning for anomaly detection in particle physics;Reviews in Physics;2024-12
2. Fusing Depthwise and Pointwise Convolutions for Efficient Inference on GPUs;The 53rd International Conference on Parallel Processing Workshops;2024-08-12
3. A User-Friendly Ecosystem for AI FPGA-Based Accelerators;2024 IEEE International Conference on Omni-layer Intelligent Systems (COINS);2024-07-29
4. VANDOR: Mitigating SEUs into Quantized Neural Networks;2024 IEEE 30th International Symposium on On-Line Testing and Robust System Design (IOLTS);2024-07-03
5. Fast prototyping of Quantized neural networks on an FPGA edge computing device with Brevitas and FINN;2024 Fifteenth International Conference on Ubiquitous and Future Networks (ICUFN);2024-07-02