Affiliation:
1. University of Texas at Austin
2. Yale University
3. Technion
Abstract
As GPU hardware becomes increasingly general-purpose, it is quickly outgrowing the traditional, constrained GPU-as-coprocessor programming model. This article advocates for extending standard operating system services and abstractions to GPUs in order to facilitate program development and enable harmonious integration of GPUs in computing systems. As an example, we describe the design and implementation of GPUFs, a software layer which provides operating system support for accessing host files directly from GPU programs. GPUFs provides a POSIX-like API, exploits GPU parallelism for efficiency, and optimizes GPU file access by extending the host CPU's buffer cache into GPU memory. Our experiments, based on a set of real benchmarks adapted to use our file system, demonstrate the feasibility and benefits of the GPUFs approach. For example, a self-contained GPU program that searches for a set of strings throughout the Linux kernel source tree runs over seven times faster than on an eight-core CPU.
Funder
Division of Computer and Network Systems
Andrew and Erna Fince Viterbi Fellowship
Nvidia
Publisher
Association for Computing Machinery (ACM)
Cited by
20 articles.
1. GPU-Initiated Resource Allocation for Irregular Workloads;Proceedings of the 3rd International Workshop on Extreme Heterogeneity Solutions;2024-03-02
2. e-CLAS: Effective GPUDirect I/O Classification Scheme;Lecture Notes in Computer Science;2024
3. Application of Machine Learning and Parallel Computing to Search for Hypersurfaces Containing Data in Non-Linear Spaces;2023 Seminar on Information Computing and Processing (ICP);2023-11-27
4. GPU Graph Processing on CXL-Based Microsecond-Latency External Memory;Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis;2023-11-12
5. Wukong+G: Fast and Concurrent RDF Query Processing Using RDMA-Assisted GPU Graph Exploration;IEEE Transactions on Parallel and Distributed Systems;2022-07-01