Efficient execution of recursive programs on commodity vector hardware-Reference-Cited by-同舟云学术

Efficient execution of recursive programs on commodity vector hardware

Published:2015-08-07 Issue:6 Volume:50 Page:509-520
ISSN:0362-1340
Container-title:ACM SIGPLAN Notices
language:en
Short-container-title:SIGPLAN Not.

Author:

Ren Bin¹,Jo Youngjoon²,Krishnamoorthy Sriram¹,Agrawal Kunal³,Kulkarni Milind²

Affiliation:

1. Pacific Northwest National Laboratory, USA

2. Purdue University, USA

3. Washington University at St. Louis, USA

Abstract

The pursuit of computational efficiency has led to the proliferation of throughput-oriented hardware, from GPUs to increasingly wide vector units on commodity processors and accelerators. This hardware is designed to efficiently execute data-parallel computations in a vectorized manner. However, many algorithms are more naturally expressed as divide-and-conquer, recursive, task-parallel computations. In the absence of data parallelism, it seems that such algorithms are not well suited to throughput-oriented architectures. This paper presents a set of novel code transformations that expose the data parallelism latent in recursive, task-parallel programs. These transformations facilitate straightforward vectorization of task-parallel programs on commodity hardware. We also present scheduling policies that maintain high utilization of vector resources while limiting space usage. Across several task-parallel benchmarks, we demonstrate both efficient vector resource utilization and substantial speedup on chips using Intel’s SSE4.2 vector units, as well as accelerators using Intel’s AVX512 units.

Funder

National Science Foundation

U.S. Department of Energy

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design,Software

Link

https://dl.acm.org/doi/pdf/10.1145/2813885.2738004

Reference37 articles.

1. Understanding the efficiency of ray traversal on GPUs

2. Barcelona OpenMP Task Suite (BOTS). Barcelona OpenMP Task Suite (BOTS). https://pm.bsc.es/projects/bots. Barcelona OpenMP Task Suite (BOTS). Barcelona OpenMP Task Suite (BOTS). https://pm.bsc.es/projects/bots.

3. From relational verification to SIMD loop synthesis

4. Cilk

Cited by 2 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Scalability analysis of AVX-512 extensions;The Journal of Supercomputing;2019-04-23

2. Performance Comparison of NVIDIA accelerators with SIMD, Associative, and Multi-core Processors for Air Traffic Management;Proceedings of the 47th International Conference on Parallel Processing Companion;2018-08-13