A parallel block implementation of Level-3 BLAS for MIMD vector processors

Authors:

Daydé, Michel J. (1); Duff, Iain S. (2); Petitet, Antoine (2)

Affiliations:

1. ENSEEIHT-IRIT, Toulouse, France

2. CERFACS, Toulouse, France

Abstract

We describe an implementation of the Level-3 BLAS (Basic Linear Algebra Subprograms) based on the matrix-matrix multiplication kernel GEMM. Blocking techniques are used to express the Level-3 BLAS routines in terms of operations on triangular blocks and calls to GEMM. A principal advantage of this approach is that most manufacturers provide at least an efficient serial version of GEMM, so our implementation can achieve a significant fraction of the machine's performance. A parameter that controls the block size allows efficient exploitation of the memory hierarchy of each target computer. Furthermore, this blocked version of the Level-3 BLAS is naturally parallel. We present results on the ALLIANT FX/80, the CONVEX C220, the CRAY-2, and the IBM 3090/VF. For GEMM, we always use the manufacturer-supplied versions. For the operations on triangular blocks, we use either assembler or tuned Fortran codes (with loop unrolling), depending on the efficiency of the available libraries.
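The core idea, recasting a Level-3 BLAS routine as small triangular-block kernels plus GEMM calls governed by a block-size parameter, can be illustrated with the triangular matrix multiply TRMM (B := L·B with L lower triangular). The following is a minimal Python sketch under stated assumptions, not the authors' Fortran or assembler code: the names `gemm`, `trmm_kernel`, `blocked_trmm` and the block-size parameter `nb` are hypothetical, matrices are plain lists of lists, and the naive `gemm` merely stands in for a manufacturer-supplied GEMM.

```python
def gemm(alpha, A, B, beta, C):
    """C := alpha*A*B + beta*C (naive stand-in for a vendor GEMM), in place."""
    n, k, m = len(A), len(B), len(B[0])
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += A[i][p] * B[p][j]
            C[i][j] = alpha * s + beta * C[i][j]
    return C


def block(M, r0, r1, c0, c1):
    """Return a copy of the submatrix M[r0:r1, c0:c1]."""
    return [row[c0:c1] for row in M[r0:r1]]


def trmm_kernel(L, B):
    """B := L*B for a small lower-triangular diagonal block L, in place.

    Rows are updated bottom-up so that rows p < i still hold their
    original values when row i is computed.
    """
    n, m = len(L), len(B[0])
    for i in range(n - 1, -1, -1):
        for j in range(m):
            B[i][j] = sum(L[i][p] * B[p][j] for p in range(i + 1))
    return B


def blocked_trmm(L, B, nb):
    """B := L*B (L lower triangular, n x n; B is n x m), blocked with size nb."""
    n, m = len(L), len(B[0])
    # Walk the block rows bottom-up so that block rows above the current
    # one are still unmodified when they are read by the GEMM call.
    for i0 in reversed(range(0, n, nb)):
        i1 = min(i0 + nb, n)
        Bi = block(B, i0, i1, 0, m)
        # Diagonal (triangular) block: handled by the small kernel.
        trmm_kernel(block(L, i0, i1, i0, i1), Bi)
        if i0 > 0:
            # Everything left of the diagonal block: one GEMM call,
            # Bi += L[i0:i1, 0:i0] * B[0:i0, :].
            gemm(1.0, block(L, i0, i1, 0, i0), block(B, 0, i0, 0, m), 1.0, Bi)
        for r in range(i0, i1):
            B[r] = Bi[r - i0]
    return B


if __name__ == "__main__":
    L = [[1, 0, 0], [2, 3, 0], [4, 5, 6]]
    print(blocked_trmm(L, [[1], [1], [1]], 2))
```

In this sketch the work per block row splits into one small triangular operation and one GEMM whose inner dimension grows with the row index, so for large matrices most of the arithmetic flows through GEMM. In a real setting `nb` would be tuned to the cache or local memory of the target, and since the columns of B are independent, column blocks could be processed concurrently.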

Publisher

Association for Computing Machinery (ACM)

Subject

Applied Mathematics, Software

