Author:
Dolz, Manuel F.; Barrachina, Sergio; Martínez, Héctor; Castelló, Adrián; Maciá, Antonio; Fabregat, Germán; Tomás, Andrés E.
Abstract
In this work, we assess the performance and energy efficiency of high-performance codes for the convolution operator, based on the direct, explicit/implicit lowering and Winograd algorithms used for deep learning (DL) inference on a series of ARM-based processor architectures. Specifically, we evaluate the NVIDIA Denver2 and Carmel processors, as well as the ARM Cortex-A57 and Cortex-A78AE CPUs as part of a recent set of NVIDIA Jetson platforms. The performance–energy evaluation is carried out using the ResNet-50 v1.5 convolutional neural network (CNN) on varying configurations of convolution algorithms, number of threads/cores, and operating frequencies on the tested processor cores. The results demonstrate that the best throughput is obtained on all platforms with the Winograd convolution operator running on all the cores at their highest frequency. However, if the goal is to reduce the energy footprint, there is no rule of thumb for the optimal configuration.
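The explicit-lowering approach mentioned in the abstract transforms the convolution into a single matrix multiplication (im2col + GEMM), so that a tuned BLAS kernel does the heavy lifting. The following is a minimal NumPy sketch of that idea, not the authors' implementation; the function name, layouts (CHW input, KCRS filters), unit stride, and zero padding are illustrative assumptions.

```python
import numpy as np

def im2col_conv2d(x, w):
    """Convolution via explicit lowering (im2col + GEMM).

    Assumes stride 1 and no padding, for brevity.
    x: input of shape (C, H, W); w: filters of shape (K, C, R, S).
    Returns output of shape (K, H - R + 1, W - S + 1).
    """
    C, H, W = x.shape
    K, _, R, S = w.shape
    Ho, Wo = H - R + 1, W - S + 1
    # Lower the input: each output position contributes one column
    # holding the flattened C x R x S patch it reads.
    cols = np.empty((C * R * S, Ho * Wo))
    for i in range(Ho):
        for j in range(Wo):
            cols[:, i * Wo + j] = x[:, i:i + R, j:j + S].ravel()
    # The convolution collapses to one GEMM over the lowered matrix.
    out = w.reshape(K, C * R * S) @ cols
    return out.reshape(K, Ho, Wo)
```

The memory cost of the lowered matrix (a factor of roughly R*S over the input) is what motivates the implicit-lowering and direct variants the paper also evaluates, which avoid materializing `cols`.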
Funder
Agencia Estatal de Investigación, Spain
Conselleria de Innovación, Universidades, Ciencia y Sociedad Digital, Generalitat Valenciana
Consejería de Economía, Innovación, Ciencia y Empleo, Junta de Andalucía
Agencia Estatal de Investigación
Universitat Jaume I
Publisher
Springer Science and Business Media LLC
Subject
Hardware and Architecture; Information Systems; Theoretical Computer Science; Software
Cited by 2 articles.