You Cannot Improve What You Do not Measure

Published:2018-09-30 Issue:3 Volume:11 Page:1-23
ISSN:1936-7406
Container-title:ACM Transactions on Reconfigurable Technology and Systems
language:en
Short-container-title:ACM Trans. Reconfigurable Technol. Syst.

Author:

Boutros Andrew¹, Yazdanshenas Sadegh², Betz Vaughn¹

Affiliation:

1. Department of Electrical and Computer Engineering, University of Toronto, ON, Canada

2. Department of Electrical and Computer Engineering, University of Toronto, Ontario, Canada

Abstract

Recently, deep learning (DL) has become best-in-class for numerous applications but at a high computational cost that necessitates high-performance energy-efficient acceleration. The reconfigurability of FPGAs is appealing due to the rapid change in DL models but also causes lower performance and area-efficiency compared to ASICs. In this article, we implement three state-of-the-art computing architectures (CAs) for convolutional neural network (CNN) inference on FPGAs and ASICs. By comparing the FPGA and ASIC implementations, we highlight the area and performance costs of programmability to pinpoint the inefficiencies in current FPGA architectures. We perform our experiments using three variations of these CAs for AlexNet, VGG-16 and ResNet-50 to allow extensive comparisons. We find that the performance gap varies significantly from 2.8× to 6.3×, while the area gap is consistent across CAs with an 8.7 average FPGA-to-ASIC area ratio. Among different blocks of the CAs, the convolution engine, constituting up to 60% of the total area, has a high area ratio ranging from 13 to 31. Motivated by our FPGA vs. ASIC comparisons, we suggest FPGA architectural changes such as increasing DSP block count, enhancing low-precision support in DSP blocks and rethinking the on-chip memories to reduce the programmability gap for DL applications.
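The gaps quoted in the abstract are ratios between an FPGA implementation and an ASIC implementation of the same computing architecture. The sketch below shows one way such ratios could be computed. All names and numbers in it are hypothetical placeholders, not data from the article. The abstract does not say how the 8.7 average is taken, so the geometric mean used here is an assumption.

```python
from statistics import geometric_mean


def area_ratio(fpga_area: float, asic_area: float) -> float:
    """FPGA-to-ASIC area ratio; values above 1 mean the FPGA version is larger."""
    if asic_area <= 0:
        raise ValueError("asic_area must be positive")
    return fpga_area / asic_area


def performance_gap(asic_throughput: float, fpga_throughput: float) -> float:
    """ASIC-to-FPGA throughput ratio; values above 1 mean the ASIC is faster."""
    if fpga_throughput <= 0:
        raise ValueError("fpga_throughput must be positive")
    return asic_throughput / fpga_throughput


# Hypothetical (FPGA area, ASIC area) pairs in arbitrary units,
# one per computing architecture. These are NOT the article's measurements.
hypothetical_areas = {
    "CA-1": (100.0, 12.5),
    "CA-2": (90.0, 10.0),
    "CA-3": (80.0, 10.0),
}

area_ratios = {
    name: area_ratio(fpga, asic) for name, (fpga, asic) in hypothetical_areas.items()
}

# Assumption: average the ratios with a geometric mean, a common choice for ratios.
mean_area_ratio = geometric_mean(list(area_ratios.values()))

# Hypothetical throughputs in arbitrary units (ASIC first, FPGA second).
example_gap = performance_gap(asic_throughput=560.0, fpga_throughput=200.0)

if __name__ == "__main__":
    for name, ratio in area_ratios.items():
        print(f"{name}: area ratio {ratio:.1f}")
    print(f"mean area ratio: {mean_area_ratio:.2f}")
    print(f"performance gap: {example_gap:.1f}x")
```

Keeping the two ratios as separate functions makes the direction of each comparison explicit: area is reported as FPGA over ASIC, while performance is reported as ASIC over FPGA, so both numbers grow as the FPGA falls further behind.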

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3242898

References: 57 articles.

1. An OpenCL™ Deep Learning Accelerator on Arria 10

2. DaDianNao: A Machine-Learning Supercomputer

3. Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks;Chen Y.;Proceedings of the JSSC,2017

4. S. Chetlur et al. 2014. CuDNN: Efficient primitives for deep learning. arXiv:1410.0759.

Cited by 41 articles.

1. Integrating Operations Research into Very Large-Scale Integrated Circuits Placement Design: A Review;Asia-Pacific Journal of Operational Research;2024-07-06

2. Dataflow optimization with layer-wise design variables estimation method for enflame CNN accelerators;Journal of Parallel and Distributed Computing;2024-07

3. Hardware Acceleration for Object Detection using YOLOv5 Deep Learning Algorithm on Xilinx Zynq FPGA Platform;Engineering, Technology & Applied Science Research;2024-02-08

4. High Throughput FPGA-Based Object Detection via Algorithm-Hardware Co-Design;ACM Transactions on Reconfigurable Technology and Systems;2024-01-15

5. Bridging the Gap in ECG Classification: Integrating Self-supervised Learning with Human-in-the-Loop Amid Medical Equipment Hardware Constraints;Lecture Notes in Computer Science;2024