Anatomy of high-performance matrix multiplication

Published: 2008-05 Volume: 34 Issue: 3 Pages: 1-25
ISSN: 0098-3500
Container-title: ACM Transactions on Mathematical Software
Language: en
Short-container-title: ACM Trans. Math. Softw.

Authors:

Kazushige Goto¹, Robert A. van de Geijn¹

Affiliation:

1. The University of Texas at Austin, Austin, TX

Abstract

We present the basic principles that underlie the high-performance implementation of the matrix-matrix multiplication that is part of the widely used GotoBLAS library. Design decisions are justified by successively refining a model of architectures with multilevel memories. A simple but effective algorithm for executing this operation results. Implementations on a broad selection of architectures are shown to achieve near-peak performance.
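The layered, cache-aware structure the abstract alludes to can be illustrated with a small sketch. The block below is a minimal pure-Python illustration of cache blocking with packing for C += A·B, loosely modeled on splitting the product into rank-kb panel updates and block-times-panel updates. It is a teaching sketch under stated assumptions, not GotoBLAS code: the block sizes `MC`, `KC`, `MR`, `NR` and the functions `pack_a`, `pack_b`, `gemm_blocked` are hypothetical, packing is a plain row-major copy, and pure Python will show none of the performance benefit. Real libraries tune block sizes to cache and register capacities and typically hand-write the innermost kernel.

```python
# Illustrative sketch only: blocked C += A @ B with packing, in pure Python.
# MC, KC, MR, NR are hypothetical block sizes, kept tiny so every loop is exercised.
MC, KC, MR, NR = 4, 3, 2, 2


def pack_a(A, i0, p0, mb, kb):
    """Copy the mb x kb block of A at (i0, p0) into a contiguous row-major list."""
    return [A[i0 + i][p0 + p] for i in range(mb) for p in range(kb)]


def pack_b(B, p0, kb, n):
    """Copy the kb x n row panel of B starting at row p0 into a contiguous row-major list."""
    return [B[p0 + p][j] for p in range(kb) for j in range(n)]


def gemm_blocked(A, B, C):
    """Update C += A @ B in place (A is m x k, B is k x n, C is m x n) and return C."""
    m, k, n = len(A), len(B), len(B[0])
    for p0 in range(0, k, KC):               # split k: each pass is a rank-kb panel update
        kb = min(KC, k - p0)
        Bp = pack_b(B, p0, kb, n)            # packed kb x n panel of B, reused for every A block
        for i0 in range(0, m, MC):           # split m: block-times-panel updates
            mb = min(MC, m - i0)
            Ap = pack_a(A, i0, p0, mb, kb)   # packed mb x kb block of A (meant to stay cache-resident)
            for j0 in range(0, n, NR):       # NR-wide column slivers of the packed panel
                nr = min(NR, n - j0)
                for ii in range(0, mb, MR):  # MR-tall row slivers of the packed block
                    mr = min(MR, mb - ii)
                    # "Micro-kernel": accumulate kb rank-1 updates into an mr x nr tile of C.
                    for p in range(kb):
                        for i in range(mr):
                            a = Ap[(ii + i) * kb + p]
                            row = C[i0 + ii + i]
                            for j in range(nr):
                                row[j0 + j] += a * Bp[p * n + j0 + j]
    return C


if __name__ == "__main__":
    print(gemm_blocked([[1, 2], [3, 4]], [[5, 6], [7, 8]], [[0, 0], [0, 0]]))  # [[19, 22], [43, 50]]
```

The point of packing in this sketch is that each operand block is copied once into contiguous storage and then reused by many inner-loop iterations, so the copy cost is amortized over the arithmetic performed on it. Fringe blocks (when m, k or n is not a multiple of the block size) are handled by the `min(...)` bounds.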

Funder

Advanced Cyberinfrastructure

Lawrence Livermore National Laboratory, Office of Science

Division of Computing and Communication Foundations

Publisher

Association for Computing Machinery (ACM)

Subject

Applied Mathematics, Software

Link

https://dl.acm.org/doi/pdf/10.1145/1356052.1356053

References: 19 articles

1. Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms

2. Anderson E. Bai Z. Bischof C. Blackford S. Demmel J. Dongarra J. Du Croz J. Greenbaum A. Hammarling S. McKenney A. and Sorensen D. 1999. LAPACK Users' Guide 3rd Ed. SIAM Press.

3. The science of deriving dense linear algebra algorithms

4. Bientinesi P. Gunter B. and van de Geijn R. Families of algorithms related to the inversion of a symmetric positive definite matrix. ACM Trans. Math. Softw. To appear. 10.1145/1377603.1377606

Cited by 446 articles.

1. IrGEMM: An Input-Aware Tuning Framework for Irregular GEMM on ARM and X86 CPUs; IEEE Transactions on Parallel and Distributed Systems; 2024-09

2. Co-Design of Convolutional Algorithms and Long Vector RISC-V Processors for Efficient CNN Model Serving; Proceedings of the 53rd International Conference on Parallel Processing; 2024-08-12

3. Parallel GEMM-based convolutions for deep learning on multicore ARM and RISC-V architectures; Journal of Systems Architecture; 2024-08

4. Improving Direct Convolution through Tensor Slicing, Vectorized Packing and ISA Extensions; Anais do XXXVII Concurso de Teses e Dissertações (CTD 2024); 2024-07-21

5. Turbo-CF: Matrix Decomposition-Free Graph Filtering for Fast Recommendation; Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval; 2024-07-10