PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications

Published: 2023-06-21
Container-title: Proceedings of the 37th International Conference on Supercomputing

Author:

Zhang Lingqi¹², Wahib Mohamed³, Chen Peng²³, Meng Jintao⁴, Wang Xiao⁵, Endo Toshio¹, Matsuoka Satoshi⁶¹

Affiliation:

1. Tokyo Institute of Technology, Tokyo, Japan

2. National Institute of Advanced Industrial Science and Technology, Tokyo, Japan

3. RIKEN Center for Computational Science, Tokyo, Japan

4. Shenzhen Institutes of Advanced Technology, Shenzhen, China

5. Oak Ridge National Laboratory, Knoxville, United States of America

6. RIKEN Center for Computational Science, Kobe, Japan

Funder

JST, PRESTO

JSPS KAKENHI

New Energy and Industrial Technology Development Organization (NEDO)

Publisher

ACM

Reference: 84 articles.

1. 2022. TOP500. https://www.top500.org/lists/top500/2022/06/highs/ [Online; accessed 27-Mar-2021].

2. Efficient implementation of Jacobi iterative method for large sparse linear systems on graphic processing units

3. Understanding the efficiency of ray traversal on GPUs

4. José I. Aliaga, Joaquín Pérez, and Enrique S. Quintana-Ortí. 2015. Systematic Fusion of CUDA Kernels for Iterative Sparse Linear System Solvers. In Euro-Par 2015: Parallel Processing, Jesper Larsson Träff, Sascha Hunold, and Francesco Versaci (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 675--686.

Cited by 2 articles.

1. Retargeting and Respecializing GPU Workloads for Performance Portability;2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO);2024-03-02

2. ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores;Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming;2024-02-20