Improving Direct Convolution through Tensor Slicing, Vectorized Packing and ISA Extensions

Published: 2024-07-21 Pages: 148-157
Container-title: Anais do XXXVII Concurso de Teses e Dissertações (CTD 2024)

Author:

Ferrari Victor, Araujo Guido

Abstract

Convolution is one of the most computationally intensive operations in machine learning models, and it is usually computed with the traditional Im2Col + BLAS method. This work describes SConv, a novel direct-convolution algorithm that improves upon Im2Col + BLAS by introducing compile-time and execution-time components to tile, vectorize, and optimize the computation. For end-to-end machine-learning model inference, SConv's speedup over an Im2Col + BLAS method based on current BLAS implementations ranges from 11% to 27% on Intel x86 and from 11% to 34% on IBM POWER10. The total convolution speedup for model inference is 13% to 28% on Intel x86 and 23% to 39% on IBM POWER10. SConv also outperforms oneDNN in 6 out of 7 models.
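To make the two approaches named in the abstract concrete, the following is a minimal pure-Python sketch contrasting a generic Im2Col + matrix-multiply convolution with a plain direct convolution. It is not SConv: the tiling, vectorized packing, and ISA-specific parts of the paper are not shown. All function names (`direct_conv2d`, `im2col`, `matmul`, `im2col_conv2d`) are hypothetical, the naive `matmul` only stands in for a BLAS GEMM call, and the sketch assumes stride 1, no padding, and the cross-correlation convention common in machine learning frameworks.

```python
def direct_conv2d(x, w):
    """Direct convolution: loop over outputs and accumulate in place.

    x: input as nested lists [C][H][W]; w: filters as [M][C][KH][KW].
    Returns the output as [M][OH][OW] (stride 1, no padding).
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    M, KH, KW = len(w), len(w[0][0]), len(w[0][0][0])
    OH, OW = H - KH + 1, W - KW + 1
    out = [[[0.0] * OW for _ in range(OH)] for _ in range(M)]
    for m in range(M):
        for i in range(OH):
            for j in range(OW):
                acc = 0.0
                for c in range(C):
                    for di in range(KH):
                        for dj in range(KW):
                            acc += x[c][i + di][j + dj] * w[m][c][di][dj]
                out[m][i][j] = acc
    return out


def im2col(x, KH, KW):
    """Unfold the input into a (C*KH*KW) x (OH*OW) patch matrix.

    Each matrix row corresponds to one (c, di, dj) filter position and
    each column to one output pixel, so overlapping input values are
    copied several times.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    OH, OW = H - KH + 1, W - KW + 1
    rows = []
    for c in range(C):
        for di in range(KH):
            for dj in range(KW):
                rows.append([x[c][i + di][j + dj]
                             for i in range(OH) for j in range(OW)])
    return rows


def matmul(a, b):
    """Naive matrix product; a stand-in for a BLAS GEMM call."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)]
            for row in a]


def im2col_conv2d(x, w):
    """Im2Col + matrix multiply: unfold the input, then do one big product."""
    M, C, KH, KW = len(w), len(w[0]), len(w[0][0]), len(w[0][0][0])
    H, W = len(x[0]), len(x[0][0])
    OH, OW = H - KH + 1, W - KW + 1
    patches = im2col(x, KH, KW)                      # (C*KH*KW) x (OH*OW)
    flat_w = [[w[m][c][di][dj]
               for c in range(C) for di in range(KH) for dj in range(KW)]
              for m in range(M)]                     # M x (C*KH*KW)
    flat_out = matmul(flat_w, patches)               # M x (OH*OW)
    # Fold each flat output row back into an OH x OW image.
    return [[row[i * OW:(i + 1) * OW] for i in range(OH)]
            for row in flat_out]


# Small check: both formulations agree on a 2-channel 3x3 input.
x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
     [[9, 8, 7], [6, 5, 4], [3, 2, 1]]]
w = [[[[1, 0], [0, 1]], [[0, 1], [1, 0]]]]
assert direct_conv2d(x, w) == im2col_conv2d(x, w)
```

The sketch illustrates the trade-off the abstract alludes to: the Im2Col path materializes an extra patch matrix before the multiply, whereas a direct convolution reads the input tensor in place, which is the setting in which a direct algorithm can apply its own tiling and vectorization.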

Publisher

Sociedade Brasileira de Computação - SBC
