Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes

Authors:

Rachata Ausavarungnirun¹, Joshua Landgraf², Vance Miller², Saugata Ghose³, Jayneel Gandhi⁴, Christopher J. Rossbach⁵, Onur Mutlu⁶

Affiliation:

1. Carnegie Mellon University & King Mongkut's University of Technology North Bangkok

2. University of Texas at Austin

3. Carnegie Mellon University

4. VMware Research

5. University of Texas at Austin & VMware Research

6. Carnegie Mellon University & ETH Zürich

Abstract

Contemporary discrete GPUs support rich memory management features such as virtual memory and demand paging. These features simplify GPU programming by providing a virtual address space abstraction similar to CPUs and eliminating manual memory management, but they introduce high performance overheads during (1) address translation and (2) page faults. A GPU relies on high degrees of thread-level parallelism (TLP) to hide memory latency. Address translation can undermine TLP, as a single miss in the translation lookaside buffer (TLB) invokes an expensive serialized page table walk that often stalls multiple threads. Demand paging can also undermine TLP, as multiple threads often stall while they wait for an expensive data transfer over the system I/O (e.g., PCIe) bus when the GPU demands a page. In modern GPUs, we face a trade-off on how the page size used for memory management affects address translation and demand paging. The address translation overhead is lower when we employ a larger page size (e.g., 2MB large pages, compared with conventional 4KB base pages), which increases TLB coverage and thus reduces TLB misses. Conversely, the demand paging overhead is lower when we employ a smaller page size, which decreases the system I/O bus transfer latency. Support for multiple page sizes can help relax the page size trade-off so that address translation and demand paging optimizations work together synergistically. However, existing page coalescing (i.e., merging base pages into a large page) and splintering (i.e., splitting a large page into base pages) policies require costly base page migrations that undermine the benefits multiple page sizes provide. 
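To make the trade-off concrete, here is a back-of-the-envelope illustration. The TLB size and PCIe bandwidth below are assumed values for illustration only, not measurements from the paper:

```python
# Illustration of the page-size trade-off: a larger page increases TLB
# coverage (fewer TLB misses) but lengthens the I/O bus transfer needed
# to demand-page a single page in. Numbers below are assumptions.

TLB_ENTRIES = 64      # assumed TLB size (entries)
PCIE_BYTES_PER_S = 12e9  # assumed effective PCIe 3.0 x16 bandwidth

for name, page_bytes in (("4KB base page", 4 * 1024),
                         ("2MB large page", 2 * 1024 * 1024)):
    coverage = TLB_ENTRIES * page_bytes            # reachable without a TLB miss
    transfer_us = page_bytes / PCIE_BYTES_PER_S * 1e6  # time to fetch one page
    print(f"{name}: TLB coverage = {coverage // 1024} KB, "
          f"demand-paging transfer = {transfer_us:.1f} us")
```

With these assumed numbers, 2MB pages give 512x the TLB coverage of 4KB pages, but each demand-paging fault must wait for a transfer roughly 512x as long.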
In this paper, we observe that GPGPU applications present an opportunity to support multiple page sizes without costly data migration, as these applications perform most of their memory allocation en masse (i.e., they allocate a large number of base pages at once). We show that this en masse allocation allows us to create intelligent memory allocation policies that ensure base pages that are contiguous in virtual memory are allocated to contiguous physical memory pages. As a result, coalescing and splintering operations no longer need to migrate base pages.
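The key idea above can be sketched as follows. This is a minimal illustrative model, not Mosaic's actual implementation: a contiguity-conserving allocator backs each large-page-sized virtual region with a single large-page-sized physical frame, so a later coalesce is pure page-table bookkeeping with no data copying. All class and method names are hypothetical:

```python
# Sketch of contiguity-conserving allocation: base pages that are contiguous
# in virtual memory land in contiguous physical memory, so promoting 512
# base pages to one 2MB large page never requires migrating data.

BASE_PAGE = 4 * 1024                        # 4 KB base page
LARGE_PAGE = 2 * 1024 * 1024                # 2 MB large page
PAGES_PER_LARGE = LARGE_PAGE // BASE_PAGE   # 512

class ContiguityConservingAllocator:
    def __init__(self, num_large_frames):
        self.free_large_frames = list(range(num_large_frames))
        self.page_table = {}       # virtual base-page number -> physical frame number
        self.large_mappings = set()  # virtual large-page numbers mapped as 2MB

    def allocate_en_masse(self, virt_large_page):
        """Back one 2MB virtual region with one 2MB physical frame,
        split into 512 contiguous base-page mappings."""
        phys_large = self.free_large_frames.pop()
        base_virt = virt_large_page * PAGES_PER_LARGE
        base_phys = phys_large * PAGES_PER_LARGE
        for i in range(PAGES_PER_LARGE):
            self.page_table[base_virt + i] = base_phys + i

    def coalesce(self, virt_large_page):
        """Promote 512 base pages to one large page. Because allocation
        preserved contiguity, no base page ever needs to move."""
        base_virt = virt_large_page * PAGES_PER_LARGE
        phys = [self.page_table.get(base_virt + i)
                for i in range((PAGES_PER_LARGE))]
        if None in phys:
            return False  # region not fully populated
        start = phys[0]
        if start % PAGES_PER_LARGE != 0 or \
                phys != list(range(start, start + PAGES_PER_LARGE)):
            return False  # non-contiguous: would require migration
        self.large_mappings.add(virt_large_page)  # update mapping only
        return True

alloc = ContiguityConservingAllocator(num_large_frames=4)
alloc.allocate_en_masse(virt_large_page=7)
assert alloc.coalesce(7)  # succeeds with zero data movement
```

Under a conventional allocator, the contiguity check in `coalesce` would routinely fail, forcing the expensive base-page migrations the paper seeks to avoid; en masse allocation makes the check succeed by construction.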

Publisher

Association for Computing Machinery (ACM)


Cited by 7 articles.

1. Accelerating Extra Dimensional Page Walks for Confidential Computing;56th Annual IEEE/ACM International Symposium on Microarchitecture;2023-10-28

2. Grus;ACM Transactions on Architecture and Code Optimization;2021-03

3. Modeling and Analysis of the Page Sizing Problem for NVM Storage in Virtualized Systems;IEEE Access;2021

4. Compacted CPU/GPU Data Compression via Modified Virtual Address Translation;Proceedings of the ACM on Computer Graphics and Interactive Techniques;2020-08-26

5. A Hypervisor for Shared-Memory FPGA Platforms;Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems;2020-03-09
