Affiliation:
1. University of Erlangen-Nuremberg, Erlangen, Germany
Abstract
Automatic memory management makes programming easier. This also holds for general-purpose GPU computing, where currently no garbage collectors exist. In this paper we present a parallel mark-and-sweep collector that collects GPU memory on the GPU, and we tune its performance. Performance is increased by: (1) data-parallel marking and sweeping of regions of memory, (2) marking all elements of large arrays in parallel, and (3) trading parallelism for recursion to handle deeply linked data structures.
Optimization (1) is achieved by coarsely processing all potential objects in a region of memory in parallel. When a large array is detected during (1), it is set aside, and a parallel-for is later issued on the GPU to mark its elements. When a data structure is a large linked list, we dynamically switch to a marking version with less overhead that performs a few recursive steps sequentially (while processing multiple lists in parallel).
The collector achieves a speedup of up to a factor of 11 over a sequential collector on the same GPU.
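The three optimizations can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the collector described in the paper: the heap is modeled as a Python dictionary, each pass over the frontier stands in for one data-parallel step on the GPU, and the names `Obj`, `mark`, `sweep`, `build_demo_heap`, `LARGE_ARRAY_THRESHOLD`, and `CHAIN_STEPS` (with their values) are hypothetical.

```python
from dataclasses import dataclass, field

LARGE_ARRAY_THRESHOLD = 4  # hypothetical cutoff for "large array"
CHAIN_STEPS = 8            # hypothetical number of sequential list steps


@dataclass
class Obj:
    refs: list = field(default_factory=list)  # ids of referenced objects
    marked: bool = False


def mark(heap, roots):
    """Mark reachable objects; return the number of simulated parallel rounds."""
    frontier = list(dict.fromkeys(roots))
    for oid in frontier:
        heap[oid].marked = True
    rounds = 0
    while frontier:
        rounds += 1
        next_frontier, large_arrays = [], []
        # Each loop iteration stands in for one GPU thread of this round.
        for oid in frontier:
            cur, steps = oid, 0
            while True:
                refs = heap[cur].refs
                if len(refs) >= LARGE_ARRAY_THRESHOLD:
                    large_arrays.append(cur)  # set aside, handled below
                    break
                if len(refs) == 1 and steps < CHAIN_STEPS:
                    # Linked-list case: follow a few links sequentially.
                    nxt = refs[0]
                    if heap[nxt].marked:
                        break
                    heap[nxt].marked = True
                    cur, steps = nxt, steps + 1
                    continue
                for child in refs:
                    if not heap[child].marked:
                        heap[child].marked = True
                        next_frontier.append(child)
                break
        # Stand-in for the later parallel-for over each large array's elements.
        for aid in large_arrays:
            for child in heap[aid].refs:
                if not heap[child].marked:
                    heap[child].marked = True
                    next_frontier.append(child)
        frontier = next_frontier
    return rounds


def sweep(heap):
    """Free unmarked objects, clear marks, and return the freed ids."""
    dead = sorted(oid for oid, obj in heap.items() if not obj.marked)
    for oid in dead:
        del heap[oid]
    for obj in heap.values():
        obj.marked = False
    return dead


def build_demo_heap():
    """Root 0 points to a 5-element array (1) and a 20-node list (10..29)."""
    ids = list(range(0, 7)) + list(range(10, 30)) + [100, 101]
    heap = {i: Obj() for i in ids}
    heap[0].refs = [1, 10]
    heap[1].refs = [2, 3, 4, 5, 6]
    for i in range(10, 29):
        heap[i].refs = [i + 1]
    heap[100].refs = [101]  # unreachable garbage
    return heap


if __name__ == "__main__":
    heap = build_demo_heap()
    print("rounds:", mark(heap, [0]))
    print("freed:", sweep(heap))
```

In this toy model, following up to `CHAIN_STEPS` links per worker lets the 20-node list finish in a handful of rounds instead of roughly one round per node, which is the intuition behind performing a few recursive steps sequentially.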
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design, Software
Cited by
5 articles.
1. QoS4IVSaaS: a QoS management framework for intelligent video surveillance as a service;Personal and Ubiquitous Computing;2016-08-18
2. FastCollect;Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems - CASES '16;2016
3. Can C++ be made as safe as SPARK?;ACM SIGAda Ada Letters;2014-11-26
4. Object Support for OpenMP-style Programming of GPU Clusters in Java;2013 27th International Conference on Advanced Information Networking and Applications Workshops;2013-03
5. GPUs as an opportunity for offloading garbage collection;ACM SIGPLAN Notices;2013-01-08