Efficient and portable Winograd convolutions for multi-core processors

Published:2023-02-12 Issue:10 Volume:79 Page:10589-10610
ISSN:0920-8542
Container-title:The Journal of Supercomputing
language:en
Short-container-title:J Supercomput

Author:

Dolz Manuel F., Martínez Héctor, Castelló Adrián, Alonso-Jordá Pedro, Quintana-Ortí Enrique S.

Abstract

We take a step forward towards developing high-performance codes for the convolution operator, based on the Winograd algorithm, that are easy to customise for general-purpose processor architectures. In our approach, augmenting the portability of the solution is achieved via the introduction of vector instructions from Intel SSE/AVX2/AVX512 and ARM NEON/SVE to exploit the single-instruction multiple-data capabilities of current processors as well as OpenMP pragmas to exploit multi-threaded parallelism. While this comes at the cost of sacrificing a fraction of the computational performance, our experimental results on three distinct processors, with Intel Xeon Skylake, ARM Cortex A57 and Fujitsu A64FX processors, show that the impact is affordable and still renders a Winograd-based solution that is competitive when compared with the lowering gemm-based convolution.
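For readers unfamiliar with the Winograd algorithm named in the abstract, the following is a minimal sketch of its simplest one-dimensional case, F(2,3), which produces two outputs of a 3-tap filter with four multiplications instead of six. It is an illustration only and not the authors' implementation: the paper targets 2D convolutions with SIMD instructions and OpenMP, whereas this sketch is scalar, single-threaded Python, and the function names (`direct_conv_1d`, `winograd_f23`, `winograd_conv_1d`) are hypothetical.

```python
def direct_conv_1d(d, g):
    """Reference result: slide a 3-tap filter g over d (no padding)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(len(d) - 2)]


def winograd_f23(d, g):
    """Winograd F(2,3): 2 outputs from a 4-element tile with 4 multiplications."""
    # Filter transform (in practice computed once per filter and reused).
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform: additions and subtractions only.
    v0 = d[0] - d[2]
    v1 = d[1] + d[2]
    v2 = d[2] - d[1]
    v3 = d[1] - d[3]
    # Element-wise products: the only multiplications on the input data.
    m0, m1, m2, m3 = u0 * v0, u1 * v1, u2 * v2, u3 * v3
    # Output transform.
    return [m0 + m1 + m2, m1 - m2 - m3]


def winograd_conv_1d(d, g):
    """Tile the input in overlapping 4-element windows with stride 2.

    Assumption for this sketch: len(d) - 2 is even, so the tiles cover
    the input exactly and no edge handling is needed.
    """
    out = []
    for i in range(0, len(d) - 2, 2):
        out.extend(winograd_f23(d[i:i + 4], g))
    return out
```

Each tile is independent of the others, which is the property that makes this kind of scheme amenable to the vectorisation and multi-threading the abstract describes.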

Funder

Agencia Estatal de Investigación, Spain

Conselleria d'Educació, Investigació, Cultura i Esport

Junta de Andalucía

Agencia Estatal de Investigación

Universitat Jaume I

Publisher

Springer Science and Business Media LLC

Subject

Hardware and Architecture, Information Systems, Theoretical Computer Science, Software

Link

https://link.springer.com/content/pdf/10.1007/s11227-023-05088-4.pdf

References (18 articles)

1. Pouyanfar S, Sadiq S, Yan Y, Tian H, Tao Y, Reyes MP, Shyu M-L, Chen S-C, Iyengar SS (2018) A survey on deep learning: Algorithms, techniques, and applications. ACM Comput Surv 51(5):92:1-92:36. https://doi.org/10.1145/3234150

2. Sze V, Chen Y-H, Yang T-J, Emer JS (2017) Efficient processing of deep neural networks: a tutorial and survey. Proc IEEE 105(12):2295–2329

3. Zhang J, Franchetti F, Low TM (2018) High performance zero-memory overhead direct convolutions. In: Proceedings of the 35th International Conference on Machine Learning—ICML, vol. 80, pp. 5776–5785

4. Chellapilla K, Puri S, Simard P (2006) High performance convolutional neural networks for document processing. In: International workshop on frontiers in handwriting recognition

5. Georganas E, Avancha S, Banerjee K, Kalamkar D, Henry G, Pabst H, Heinecke A (2018) Anatomy of high-performance deep learning convolutions on SIMD architectures. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, ser. SC ’18. IEEE Press

Cited by 4 articles.

1. Parallel GEMM-based convolutions for deep learning on multicore ARM and RISC-V architectures;Journal of Systems Architecture;2024-08

2. SIMD-Constrained Lookup Table for Accelerating Variable-Weighted Convolution on x86/64 CPUs;IEEE Access;2024

3. Acceleration of Convolutional Neural Networks;2023 IEEE 23rd International Conference on Bioinformatics and Bioengineering (BIBE);2023-12-04

4. GEMM-Like Convolution for Deep Learning Inference on the Xilinx Versal;Lecture Notes in Computer Science;2023