Characterizing Multi-Chip GPU Data Sharing

Authors:

Shiqing Zhang¹, Mahmood Naderan-Tahan¹, Magnus Jahre², Lieven Eeckhout¹

Affiliations:

1. Ghent University, Belgium

2. Norwegian University of Science and Technology (NTNU), Norway

Abstract

Multi-chip Graphics Processing Unit (GPU) systems are critical to scale performance beyond a single GPU chip for a wide variety of important emerging applications. A key challenge for multi-chip GPUs, though, is how to overcome the bandwidth gap between inter-chip and intra-chip communication. Accesses to shared data, i.e., data accessed by multiple chips, pose a major performance challenge as they incur remote memory accesses, possibly congesting the inter-chip links and degrading overall system performance. This article characterizes the shared dataset in multi-chip GPUs in terms of (1) truly versus falsely shared data, (2) how the shared dataset scales with input size, (3) along which dimensions the shared dataset scales, and (4) how sensitive the shared dataset is to the input's characteristics, i.e., node degree and connectivity in graph workloads. We observe significant variety in scaling behavior across workloads: some workloads feature a shared dataset that scales linearly with input size, whereas others feature sublinear scaling (following a √2 or ∛2 relationship). We further demonstrate how the shared dataset affects the optimum last-level cache organization (memory-side versus SM-side) in multi-chip GPUs, as well as the optimum memory page allocation and thread scheduling policy. Sensitivity analyses demonstrate the insights across the broad design space.
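The scaling relationships in the abstract can be illustrated with a small sketch. The function name and the example numbers below are illustrative assumptions, not taken from the paper; the only grounded claim is that, per input-size doubling, the shared dataset grows by a workload-dependent factor of 2 (linear), √2, or ∛2 (sublinear).

```python
import math

def shared_dataset_size(base_size_mb: float, doublings: int,
                        growth_factor: float) -> float:
    """Shared-dataset footprint (in MB) after `doublings` input-size
    doublings, assuming each doubling multiplies the footprint by
    `growth_factor` (2, sqrt(2), or cbrt(2) per the abstract)."""
    return base_size_mb * growth_factor ** doublings

# Illustrative example: a 64 MB shared dataset after 3 doublings (8x input).
linear = shared_dataset_size(64, 3, 2.0)           # 512 MB
sqrt2  = shared_dataset_size(64, 3, math.sqrt(2))  # ~181 MB
cbrt2  = shared_dataset_size(64, 3, 2 ** (1 / 3))  # ~128 MB
```

Under ∛2 scaling, an 8x larger input grows the shared dataset by only 2x, which is why workload-dependent scaling matters for sizing inter-chip bandwidth and last-level caches.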

Funder

CSC scholarship

Magnus Jahre is supported by the Research Council of Norway

UGent-BOF-GOA

European Research Council

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Information Systems, Software

