Affiliation:
1. University of Illinois at Urbana-Champaign, Urbana, Illinois 61801-2932, USA
Abstract
Linear algebra algorithms based on the BLAS or extended BLAS do not achieve high performance on multivector processors with a hierarchical memory system because of a lack of data locality. For such machines, block linear algebra algorithms must be implemented in terms of matrix-matrix primitives (BLAS3). Designing efficient linear algebra algorithms for these architectures requires analysis of the behavior of the matrix-matrix primitives and the resulting block algorithms as a function of certain system parameters. The analysis must identify the limits of performance improvement possible via blocking and any contradictory trends that require trade-off consideration. We propose a methodology that facilitates such an analysis and use it to analyze the performance of the BLAS3 primitives used in block methods. A similar analysis of the block size-performance relationship is also performed at the algorithm level for block versions of the LU decomposition and the Gram-Schmidt orthogonalization procedures.
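To make the idea of blocking concrete, here is a minimal sketch of a tiled matrix-matrix product, the kind of operation a BLAS3 primitive performs. It is not the paper's implementation: the function name `blocked_matmul` and the parameter `block` are hypothetical, and pure Python is used only for illustration, so the code shows the loop structure rather than any actual performance gain.

```python
def blocked_matmul(A, B, block):
    """Compute C = A * B tile by tile (illustrative sketch only).

    A is n x k and B is k x m, both given as lists of rows.
    `block` is the tile edge length, standing in for the block-size
    parameter whose effect on performance the abstract discusses.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    # The three outer loops walk over tiles of C, A and B.
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for p0 in range(0, k, block):
                # Tile update: C[i0:, j0:] += A[i0:, p0:] * B[p0:, j0:],
                # clipped to the matrix edges. On a machine with a
                # memory hierarchy, the three tiles would ideally stay
                # in fast memory for the duration of this update.
                for i in range(i0, min(i0 + block, n)):
                    row_c, row_a = C[i], A[i]
                    for p in range(p0, min(p0 + block, k)):
                        a_ip, row_b = row_a[p], B[p]
                        for j in range(j0, min(j0 + block, m)):
                            row_c[j] += a_ip * row_b[j]
    return C
```

The result does not depend on `block`; only the order in which elements are visited changes. That is why the block size can be tuned against system parameters such as cache capacity without altering the algorithm's output.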
Cited by
55 articles.