Characterizing Multi-Chip GPU Data Sharing

Authors:

Shiqing Zhang¹, Mahmood Naderan-Tahan¹, Magnus Jahre², Lieven Eeckhout¹

Affiliations:

1. Ghent University, Belgium

2. Norwegian University of Science and Technology (NTNU), Norway

Abstract

Multi-chip Graphics Processing Unit (GPU) systems are critical to scale performance beyond a single GPU chip for a wide variety of important emerging applications. A key challenge for multi-chip GPUs, though, is how to overcome the bandwidth gap between inter-chip and intra-chip communication. Accesses to shared data, i.e., data accessed by multiple chips, pose a major performance challenge as they incur remote memory accesses, possibly congesting the inter-chip links and degrading overall system performance. This article characterizes the shared dataset in multi-chip GPUs in terms of (1) truly versus falsely shared data, (2) how the shared dataset scales with input size, (3) along which dimensions the shared dataset scales, and (4) how sensitive the shared dataset is to the input's characteristics, i.e., node degree and connectivity in graph workloads. We observe significant variety in scaling behavior across workloads: some workloads feature a shared dataset that scales linearly with input size, whereas others feature sublinear scaling (following a √2 or ∛2 relationship). We further demonstrate how the shared dataset affects the optimum last-level cache organization (memory-side versus SM-side) in multi-chip GPUs, as well as the optimum memory page allocation and thread scheduling policy. Sensitivity analyses demonstrate the insights across the broad design space.
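The scaling relationships in the abstract can be illustrated with a small sketch. The function name and the example numbers below are illustrative assumptions, not taken from the paper; the only grounded claim is that, per input-size doubling, the shared dataset grows by a workload-dependent factor of 2 (linear), √2, or ∛2 (sublinear).

```python
import math

def shared_dataset_size(base_size_mb: float, doublings: int,
                        growth_factor: float) -> float:
    """Shared-dataset footprint (in MB) after `doublings` input-size
    doublings, assuming each doubling multiplies the footprint by
    `growth_factor` (2, sqrt(2), or cbrt(2) per the abstract)."""
    return base_size_mb * growth_factor ** doublings

# Illustrative example: a 64 MB shared dataset after 3 doublings (8x input).
linear = shared_dataset_size(64, 3, 2.0)           # 512 MB
sqrt2  = shared_dataset_size(64, 3, math.sqrt(2))  # ~181 MB
cbrt2  = shared_dataset_size(64, 3, 2 ** (1 / 3))  # ~128 MB
```

Under ∛2 scaling, an 8x larger input grows the shared dataset by only 2x, which is why workload-dependent scaling matters for sizing inter-chip bandwidth and last-level caches.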

Funder

CSC scholarship

Magnus Jahre is supported by the Research Council of Norway

UGent-BOF-GOA

European Research Council

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Information Systems, Software

