Algorithm XXX: Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM

Published: 2023-12-26
ISSN: 0098-3500
Container-title: ACM Transactions on Mathematical Software
Language: en
Short-container-title: ACM Trans. Math. Softw.

Author:

Alaejos Guillermo¹, Castelló Adrián¹, Alonso-Jordá Pedro¹, Igual Francisco D.², Martínez Héctor³, Quintana-Ortí Enrique S.¹

Affiliation:

1. Universitat Politècnica de València, Spain

2. Universidad Complutense de Madrid, Spain

3. Universidad de Córdoba, Spain

Abstract

We explore the use of the open-source Apache TVM framework to automatically generate a family of algorithms that follow the approach taken by popular linear algebra libraries, such as GotoBLAS2, BLIS, and OpenBLAS, to obtain high-performance blocked formulations of the general matrix multiplication (gemm). In addition, we fully automate the generation process by also leveraging Apache TVM to derive a complete variety of processor-specific micro-kernels for gemm. This contrasts with the convention in high-performance libraries, which hand-encode a single micro-kernel per architecture in assembly code. Overall, the combination of our TVM-generated blocked algorithms and micro-kernels for gemm 1) improves portability and maintainability and streamlines the software life cycle; 2) provides high flexibility to easily tailor and optimize the solution to different data types, processor architectures, and matrix operand shapes, yielding performance on a par with (or, for specific matrix shapes, even superior to) that of hand-tuned libraries; and 3) features a small memory footprint.
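To make the phrase "blocked formulation with a micro-kernel" concrete, the following is a minimal pure-Python sketch of the five-loop structure commonly associated with GotoBLAS2 and BLIS. It is an illustration only, not the code produced by the authors' TVM generators: the function names `blocked_gemm` and `micro_kernel` and the block sizes `mc`, `nc`, `kc`, `mr`, `nr` are assumptions chosen for readability, and the packing of A and B into contiguous buffers that real libraries perform is omitted.

```python
def micro_kernel(a, b, c, i0, j0, p0, mr, nr, kc):
    """Update the mr x nr tile of C starting at (i0, j0) with kc rank-1 updates."""
    for p in range(p0, p0 + kc):
        for i in range(i0, i0 + mr):
            a_ip = a[i][p]
            for j in range(j0, j0 + nr):
                c[i][j] += a_ip * b[p][j]


def blocked_gemm(a, b, c, mc=8, nc=8, kc=4, mr=4, nr=2):
    """Compute C += A * B in place; matrices are lists of rows.

    Block sizes are hypothetical defaults, not tuned values.
    """
    m, k, n = len(a), len(b), len(b[0])
    for jc in range(0, n, nc):                   # column panels of B and C
        nb = min(nc, n - jc)
        for pc in range(0, k, kc):               # slices of the k dimension
            kb = min(kc, k - pc)
            for ic in range(0, m, mc):           # row panels of A and C
                mb = min(mc, m - ic)
                for jr in range(0, nb, nr):      # micro-panels of B
                    for ir in range(0, mb, mr):  # micro-panels of A
                        micro_kernel(a, b, c,
                                     ic + ir, jc + jr, pc,
                                     min(mr, mb - ir),  # clip at matrix edges
                                     min(nr, nb - jr),
                                     kb)
    return c
```

In a tuned library the three outer block sizes are typically chosen to match the cache hierarchy and the two inner ones to match the register file; here they only control the loop tiling, and the result is the same for any positive values.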

Publisher

Association for Computing Machinery (ACM)

Subject

Applied Mathematics, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3638532

References (37 articles)

1. Irwan Bello, Barret Zoph, Vijay Vasudevan, and Quoc V. Le. 2017. Neural Optimizer Search with Reinforcement Learning. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, Doina Precup and Yee Whye Teh (Eds.), Vol. 70. PMLR, 459–468.

2. Demystifying Parallel and Distributed Deep Learning

3. Random Search for Hyper-Parameter Optimization; Bergstra James; J. Mach. Learn. Res., 2012

4. Uday Bondhugula. 2020. High Performance Code Generation in MLIR: An Early Case Study with GEMM. CoRR abs/2003.00532 (2020). arXiv:2003.00532 https://arxiv.org/abs/2003.00532

5. Anatomy of the BLIS Family of Algorithms for Matrix Multiplication

Cited by 1 article.

1. Inference with Transformer Encoders on ARM and RISC-V Multicore Processors; Lecture Notes in Computer Science; 2024