Affiliation:
1. University of Edinburgh, Edinburgh, United Kingdom
Abstract
GPGPUs are a powerful and energy-efficient solution for many problems. For higher performance or larger problems, it is necessary to distribute the problem across multiple GPUs, increasing the already high programming complexity.
In this article, we focus on abstracting the complexity of multi-GPU programming for stencil computation. We show that the best strategy depends not only on the stencil operator, the problem size, and the GPU, but also on the PCI Express layout. This layout adds nonuniform characteristics to a seemingly homogeneous setup, causing a performance loss of up to 23%. We address this issue with an autotuner that optimizes the distribution across multiple GPUs.
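The following is a minimal, hypothetical sketch of why PCI Express layout can matter and how an exhaustive autotuner might choose a distribution. It is not the authors' autotuner. It assumes a 1-D slab decomposition of a 2-D grid, four identical GPUs where GPUs 0 and 1 share one PCIe switch and GPUs 2 and 3 share another (`SWITCH`), no overlap of computation and communication, and invented throughput and bandwidth numbers. Cross-switch halo exchanges are assumed to share one upstream link, so they serialize. All names (`split_rows`, `model_time`, `autotune`) are illustrative.

```python
from itertools import permutations

# Hypothetical PCIe topology: GPU id -> PCIe switch id.
SWITCH = {0: 0, 1: 0, 2: 1, 3: 1}


def split_rows(n_rows, n_parts):
    """Block-partition n_rows into n_parts slabs as evenly as possible."""
    base, extra = divmod(n_rows, n_parts)
    return [base + (1 if i < extra else 0) for i in range(n_parts)]


def model_time(order, n_rows, n_cols, halo=1, bytes_per_cell=8,
               cell_rate=1e9, bw_same=10e9, bw_cross=5e9):
    """Modelled time (s) of one stencil sweep; slab i runs on GPU order[i].

    All rates are assumed values, not measurements.
    """
    rows = split_rows(n_rows, len(order))
    # The slowest (largest) slab bounds the compute phase.
    compute = max(rows) * n_cols / cell_rate
    # Bytes sent in one direction across a single slab boundary.
    msg = halo * n_cols * bytes_per_cell
    intra, n_cross = 0.0, 0
    for a, b in zip(order, order[1:]):
        if SWITCH[a] == SWITCH[b]:
            # Same-switch boundaries use independent links and run concurrently.
            intra = max(intra, 2 * msg / bw_same)
        else:
            n_cross += 1
    # Cross-switch boundaries share one upstream link, so they serialize.
    cross = n_cross * 2 * msg / bw_cross
    return compute + max(intra, cross)


def autotune(n_rows, n_cols, gpus=(0, 1, 2, 3), measure=model_time):
    """Try every GPU count and placement order; return (order, time) of the best."""
    best = None
    for k in range(1, len(gpus) + 1):
        for order in permutations(gpus, k):
            t = measure(order, n_rows, n_cols)
            if best is None or t < best[1]:
                best = (order, t)
    return best


if __name__ == "__main__":
    order, t = autotune(4096, 4096)
    print("best placement:", order, f"{t * 1e3:.4f} ms")
    print("interleaved (0, 2, 1, 3):",
          f"{model_time((0, 2, 1, 3), 4096, 4096) * 1e3:.4f} ms")
```

Under this model, the four GPUs look interchangeable, yet the placement (0, 2, 1, 3) crosses the switch boundary three times, whereas (0, 1, 2, 3) crosses it once. In a real autotuner, `measure` would presumably launch the distributed stencil and time a few iterations rather than evaluate a cost model.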
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Information Systems, Software
Cited by 50 articles.
1. Optimizing Three-Dimensional Stencil-Operations on Heterogeneous Computing Environments;International Journal of Parallel Programming;2024-06-21
2. Stencil Computation with Vector Outer Product;Proceedings of the 38th ACM International Conference on Supercomputing;2024-05-30
3. Fingerprinting and Mapping Cloud FPGA Infrastructures;Security of FPGA-Accelerated Cloud Computing Environments;2023-09-18
4. EPSILOD: efficient parallel skeleton for generic iterative stencil computations in distributed GPUs;The Journal of Supercomputing;2023-01-14
5. AlphaSparse: Generating High Performance SpMV Codes Directly from Sparse Matrices;SC22: International Conference for High Performance Computing, Networking, Storage and Analysis;2022-11