Affiliation:
1. University of Maryland at College Park, College Park, MD
Abstract
Heterogeneous microprocessors integrate a CPU and GPU on the same chip, providing fast CPU-GPU communication and enabling cores to compute on data “in place.” This permits exploiting a finer granularity of parallelism on the integrated GPUs, and enables the use of GPUs for accelerating more complex and irregular codes. One challenge, however, is exposing enough parallelism such that both the CPU and GPU are effectively utilized to achieve maximum gain.
In this article, we propose exploiting nested parallelism for integrated CPU-GPU chips. We look for loop structures in which one or more regular data parallel loops are nested within a parallel outer loop that can contain irregular code (e.g., with control divergence). By scheduling the outer loop on multiple CPU cores, multiple dynamic instances of the inner regular loop(s) can be scheduled on the GPU cores. This boosts GPU utilization and parallelizes the outer loop. We find that such nested MIMD-SIMD parallelization provides greater levels of parallelism for integrated CPU-GPU chips, and additionally there is ample opportunity to perform such parallelization in OpenMP programs.
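The loop structure described above can be sketched in C with OpenMP directives. This is only an illustrative sketch, not the authors' implementation: the function name nested_mimd_simd, the row-major data layout, and the sign-based branch standing in for irregular code are all assumptions made for the example. It uses the standard OpenMP parallel for directive for the outer loop and the standard target offload directive for the inner loop; whether the inner loop actually runs on an integrated GPU depends on the compiler and runtime, and it falls back to the host otherwise.

```c
#include <assert.h>

/* Hypothetical workload: each row needs an irregular, data-dependent decision
   (outer loop, MIMD) followed by a regular element-wise loop (inner, SIMD). */
void nested_mimd_simd(const float *in, float *out, int rows, int cols)
{
    assert(cols > 0); /* each row must have a first element to inspect */

    /* Outer loop: independent iterations are spread across CPU threads. */
    #pragma omp parallel for schedule(dynamic)
    for (int r = 0; r < rows; r++) {
        const float *src = in + (long)r * cols;
        float *dst = out + (long)r * cols;

        /* Stand-in for irregular, divergent control flow kept on the CPU. */
        float scale = (src[0] < 0.0f) ? -1.0f : 2.0f;

        /* Inner loop: regular data-parallel work, one dynamic instance per
           outer iteration, offered to the GPU via OpenMP target offload. */
        #pragma omp target teams distribute parallel for \
            map(to: src[0:cols]) map(from: dst[0:cols])
        for (int c = 0; c < cols; c++) {
            dst[c] = scale * src[c];
        }
    }
}
```

Because each CPU thread launches its own instance of the inner loop, several inner-loop instances can be in flight at once, which is the source of the extra GPU utilization the abstract describes. Compiled without OpenMP support, the pragmas are ignored and the function runs sequentially with the same result.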
Our results show nested MIMD-SIMD parallelization provides a 16.1x and 8.67x speedup over sequential execution on a simulator and a physical machine, respectively. Our technique beats CPU-only parallelization by 4.13x and 2.40x, respectively, and GPU-only parallelization by 2.74x and 2.26x, respectively. Compared to the next-best scheme (either CPU- or GPU-only parallelization) per benchmark, our approach provides a 1.46x and 1.23x speedup for the simulator and physical machine, respectively.
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Information Systems, Software
Cited by
4 articles.