Groute

Published: 2020-08-05 Issue: 3 Volume: 7 Page: 1-27
ISSN: 2329-4949
Container-title: ACM Transactions on Parallel Computing
Language: en
Short-container-title: ACM Trans. Parallel Comput.

Author:

Ben-Nun Tal¹, Sutton Michael², Pai Sreepathi³, Pingali Keshav⁴

Affiliation:

1. ETH Zurich, Zürich, Switzerland

2. The Hebrew University of Jerusalem, Jerusalem, Israel

3. University of Rochester, Rochester, NY, USA

4. The University of Texas at Austin, Austin, TX, USA

Abstract

Nodes with multiple GPUs are becoming the platform of choice for high-performance computing. However, most applications are written using bulk-synchronous programming models, which may not be optimal for irregular algorithms that benefit from low-latency, asynchronous communication. This article proposes constructs for asynchronous multi-GPU programming and describes their implementation in a thin runtime environment called Groute. Groute also implements common collective operations and distributed work-lists, enabling the development of irregular applications without substantial programming effort. We demonstrate that this approach achieves state-of-the-art performance and exhibits strong scaling for a suite of irregular applications on eight-GPU and heterogeneous systems, yielding over 7× speedup for some algorithms.

Funder

Deutsche Forschungsgemeinschaft

Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung

National Science Foundation

Defense Advanced Research Projects Agency

Publisher

Association for Computing Machinery (ACM)

Subject

Computational Theory and Mathematics, Computer Science Applications, Hardware and Architecture, Modeling and Simulation, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3399730

Reference: 45 articles.

1. Groute Authors. 2017. Groute Runtime Environment Source Code. Retrieved from https://www.github.com/groute/groute.

2. Karlsruhe Institute of Technology. 2014. OSM Europe Graph. Retrieved from http://i11www.iti.uni-karlsruhe.de/resources/roadgraphs.php.

3. Andrew Adinetz. 2014. Optimized filtering with warp-aggregated atomics. Retrieved from http://devblogs.nvidia.com/parallelforall/cuda-pro-tip-optimized-filtering-warp-aggregated-atomics/.

4. Graph Partitioning and Graph Clustering

Cited by 11 articles.

1. ABSS: An Adaptive Batch-Stream Scheduling Module for Dynamic Task Parallelism on Chiplet-based Multi-Chip Systems;ACM Transactions on Parallel Computing;2024-01-29

2. Single‐ and multi‐GPU computing on NVIDIA‐ and AMD‐based server platforms for solidification modeling application;Concurrency and Computation: Practice and Experience;2023-12-27

3. Optimizing GPU-Based Graph Sampling and Random Walk for Efficiency and Scalability;IEEE Transactions on Computers;2023-09-01

4. SpMV and BiCG-Stab sparse solver on Multi-GPUs for reservoir simulation;Multimedia Tools and Applications;2023-08-17

5. FinePack: Transparently Improving the Efficiency of Fine-Grained Transfers in Multi-GPU Systems;2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2023-02