Author:
Dolz, Manuel F.; Barrachina, Sergio; Martínez, Héctor; Castelló, Adrián; Maciá, Antonio; Fabregat, Germán; Tomás, Andrés E.
Abstract
In this work, we assess the performance and energy efficiency of high-performance codes for the convolution operator, based on the direct, explicit/implicit lowering and Winograd algorithms used for deep learning (DL) inference on a series of ARM-based processor architectures. Specifically, we evaluate the NVIDIA Denver2 and Carmel processors, as well as the ARM Cortex-A57 and Cortex-A78AE CPUs as part of a recent set of NVIDIA Jetson platforms. The performance–energy evaluation is carried out using the ResNet-50 v1.5 convolutional neural network (CNN) on varying configurations of convolution algorithms, number of threads/cores, and operating frequencies on the tested processor cores. The results demonstrate that the best throughput is obtained on all platforms with the Winograd convolution operator running on all the cores at their highest frequency. However, if the goal is to reduce the energy footprint, there is no rule of thumb for the optimal configuration.
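The explicit-lowering approach mentioned in the abstract transforms the convolution into a single matrix multiplication (im2col + GEMM), so that a tuned BLAS kernel does the heavy lifting. The following is a minimal NumPy sketch of that idea, not the authors' implementation; the function name, layouts (CHW input, KCRS filters), unit stride, and zero padding are illustrative assumptions.

```python
import numpy as np

def im2col_conv2d(x, w):
    """Convolution via explicit lowering (im2col + GEMM).

    Assumes stride 1 and no padding, for brevity.
    x: input of shape (C, H, W); w: filters of shape (K, C, R, S).
    Returns output of shape (K, H - R + 1, W - S + 1).
    """
    C, H, W = x.shape
    K, _, R, S = w.shape
    Ho, Wo = H - R + 1, W - S + 1
    # Lower the input: each output position contributes one column
    # holding the flattened C x R x S patch it reads.
    cols = np.empty((C * R * S, Ho * Wo))
    for i in range(Ho):
        for j in range(Wo):
            cols[:, i * Wo + j] = x[:, i:i + R, j:j + S].ravel()
    # The convolution collapses to one GEMM over the lowered matrix.
    out = w.reshape(K, C * R * S) @ cols
    return out.reshape(K, Ho, Wo)
```

The memory cost of the lowered matrix (a factor of roughly R*S over the input) is what motivates the implicit-lowering and direct variants the paper also evaluates, which avoid materializing `cols`.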
Funder
Agencia Estatal de Investigación, Spain
Conselleria de Innovación, Universidades, Ciencia y Sociedad Digital, Generalitat Valenciana
Consejería de Economía, Innovación, Ciencia y Empleo, Junta de Andalucía
Agencia Estatal de Investigación
Universitat Jaume I
Publisher
Springer Science and Business Media LLC
Subject
Hardware and Architecture; Information Systems; Theoretical Computer Science; Software
Cited by 2 articles.