Novel HPC techniques to batch execution of many variable size BLAS computations on GPUs

Published: 2017-06-14
Container-title: Proceedings of the International Conference on Supercomputing

Author:

Abdelfattah Ahmad¹, Haidar Azzam¹, Tomov Stanimire¹, Dongarra Jack¹

Affiliation:

1. University of Tennessee

Funder

National Science Foundation

U.S. Department of Energy

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3079079.3079103

References: 25 articles.

1. A Predictive Model for Solving Small Linear Algebra Problems in GPU Registers

2. Experiences in autotuning matrix multiplication for energy minimization on GPUs

3. Automatic code generation for many-body electronic structure methods: the tensor contraction engine

4. Ali Charara, Hatem Ltaief, and David E. Keyes. 2016. Redesigning Triangular Dense Matrix Computations on GPUs. In Euro-Par 2016: Parallel Processing - 22nd International Conference on Parallel and Distributed Computing, Grenoble, France, August 24–26, 2016, Proceedings. 477–489. 10.1007/978-3-319-43659-3_35

Cited by 11 articles.

1. MAGMA: Enabling exascale performance with accelerated BLAS and LAPACK for diverse GPU architectures;The International Journal of High Performance Computing Applications;2024-06-20

2. Using Additive Modifications in LU Factorization Instead of Pivoting;Proceedings of the 37th International Conference on Supercomputing;2023-06-21

3. Optimization Techniques for GPU Programming;ACM Computing Surveys;2023-03-16

4. Accelerating small matrix multiplications by adaptive batching strategy on GPU;2022 IEEE 24th Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys);2022-12

5. Batched matrix operations on distributed GPUs with application in theoretical physics;2022 45th Jubilee International Convention on Information, Communication and Electronic Technology (MIPRO);2022-05-23