Affiliation:
1. The University of Texas at Austin, Austin, TX
Abstract
We present the basic principles that underlie the high-performance implementation of the matrix-matrix multiplication that is part of the widely used GotoBLAS library. Design decisions are justified by successively refining a model of architectures with multilevel memories. A simple but effective algorithm for executing this operation results. Implementations on a broad selection of architectures are shown to achieve near-peak performance.
Funder
Advanced Cyberinfrastructure
Lawrence Livermore National Laboratory, Office of Science
Division of Computing and Communication Foundations
Publisher
Association for Computing Machinery (ACM)
Subject
Applied Mathematics, Software
Cited by
446 articles.