Affiliation:
1. The University of Texas at Austin, Austin, TX
Abstract
A simple but highly effective approach is presented for transforming high-performance implementations of matrix-matrix multiplication on cache-based architectures into implementations of other commonly used matrix-matrix computations (the level-3 BLAS). Exceptional performance is demonstrated on various architectures.
Funder
Division of Computing and Communication Foundations
Publisher
Association for Computing Machinery (ACM)
Subject
Applied Mathematics, Software
Cited by
172 articles.