
CAPRI

Published:2012-09-05 Issue:3 Volume:40 Page:61-71
ISSN:0163-5964
Container-title:ACM SIGARCH Computer Architecture News
language:en
Short-container-title:SIGARCH Comput. Archit. News

Author:

Rhu Minsoo¹, Erez Mattan¹

Affiliation:

1. The University of Texas at Austin

Abstract

Wide SIMD-based GPUs have evolved into a promising platform for running general purpose workloads. Current programmable GPUs allow even code with irregular control to execute well on their SIMD pipelines. To do this, each SIMD lane is considered to execute a logical thread where hardware ensures that control flow is accurate by automatically applying masked execution. The masked execution, however, often degrades performance because the issue slots of masked lanes are wasted. This degradation can be mitigated by dynamically compacting multiple unmasked threads into a single SIMD unit. This paper proposes a fundamentally new approach to branch compaction that avoids the unnecessary synchronization required by previous techniques and that only stalls threads that are likely to benefit from compaction. Our technique is based on the compaction-adequacy predictor (CAPRI). CAPRI dynamically identifies the compaction-effectiveness of a branch and only stalls threads that are predicted to benefit from compaction. We utilize a simple single-level branch-predictor inspired structure and show that this simple configuration attains a prediction accuracy of 99.8% and 86.6% for non-divergent and divergent workloads, respectively. Our performance evaluation demonstrates that CAPRI consistently outperforms both the baseline design that never attempts compaction and prior work that stalls upon all divergent branches.
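The prediction idea in the abstract can be illustrated with a small sketch. This is not the authors' hardware design. It assumes a hypothetical 4-lane warp, assumes that a thread must stay in its home lane when warps are merged, and assumes that a branch seen for the first time is predicted to benefit from compaction. The names `warps_after_compaction`, `compaction_helps`, `AdequacyPredictor`, and `handle_divergent_branch` are invented for this illustration.

```python
WARP_WIDTH = 4  # hypothetical SIMD width, kept small for readability


def warps_after_compaction(masks):
    """Number of warps needed once the active threads in `masks` are merged.

    Assumption: a thread cannot change lanes, so the busiest lane
    determines how many compacted warps are required.
    """
    per_lane = [sum(mask[lane] for mask in masks) for lane in range(WARP_WIDTH)]
    return max(per_lane)


def compaction_helps(masks):
    """True if merging yields fewer warps than issuing each warp as-is."""
    active = [mask for mask in masks if any(mask)]
    return warps_after_compaction(active) < len(active)


class AdequacyPredictor:
    """Single-level table: one history bit per branch address (a sketch)."""

    def __init__(self):
        self.history = {}  # branch PC -> did compaction help last time?

    def predict(self, branch_pc):
        # Assumption: unseen branches default to "stall and try compaction".
        return self.history.get(branch_pc, True)

    def update(self, branch_pc, helped):
        self.history[branch_pc] = helped


def handle_divergent_branch(predictor, branch_pc, masks):
    """Return True if warps reaching this branch should stall for compaction.

    Simplification: the outcome is recorded on every visit, even when the
    predictor chose not to stall.
    """
    stall = predictor.predict(branch_pc)
    predictor.update(branch_pc, compaction_helps(masks))
    return stall


predictor = AdequacyPredictor()
# Two warps whose active lanes do not overlap: they fit into one warp.
complementary = [[True, True, False, False], [False, False, True, True]]
# Two warps whose active threads share lane 0: merging saves nothing.
conflicting = [[True, False, False, False], [True, False, False, False]]

for visit in range(2):
    print(visit, hex(0x40), handle_divergent_branch(predictor, 0x40, complementary))
    print(visit, hex(0x80), handle_divergent_branch(predictor, 0x80, conflicting))
```

In this sketch both branches stall on the first visit. On the second visit only the branch at the hypothetical address `0x40` stalls, because the recorded history for `0x80` indicates that compaction did not reduce the number of warps.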

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/2366231.2337167

Cited by 9 articles.

1. Reducing branch divergence to speed up parallel execution of unit testing on GPUs;The Journal of Supercomputing;2023-05-13

2. MIMD Programs Execution Support on SIMD Machines;2023

3. On-GPU thread-data remapping for nested branch divergence;Journal of Parallel and Distributed Computing;2020-05

4. A Lightweight Method for Handling Control Divergence in GPGPUs;Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region;2019-01-14

5. On-GPU Thread-Data Remapping for Branch Divergence Reduction;ACM Transactions on Architecture and Code Optimization;2018-09-30