Towards High-Performance Code Generation for Multi-GPU Clusters Based on a Domain-Specific Language for Algorithmic Skeletons

Published:2020-05-22 Issue:4 Volume:48 Page:713-728
ISSN:0885-7458
Container-title:International Journal of Parallel Programming
language:en
Short-container-title:Int J Parallel Prog

Author:

Wrede Fabian,Kuchen Herbert

Abstract

In earlier work, we defined a domain-specific language (DSL) with the aim of providing an easy-to-use approach for programming multi-core and multi-GPU clusters. The DSL builds on algorithmic skeletons, which are well-known patterns for parallel programming, such as map and reduce. Based on the chosen skeleton, a user-defined function can be applied to a data structure in parallel, with the main advantage that the user does not have to worry about implementation details. So far, we had only implemented a generator for multi-core clusters. In this paper, we present and evaluate two prototypes of generators for multi-GPU clusters, which are based on OpenACC and CUDA. We have evaluated the approach with four benchmark applications. The results show that the generation approach leads to execution times that are on par with an alternative library implementation.
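To illustrate the skeleton idea the abstract describes, the following is a minimal sequential sketch in C++ and not the paper's DSL or generated code. The names `map_skeleton`, `reduce_skeleton`, and `sum_of_squares` are hypothetical and chosen only for this example. The point is the division of labour: the user supplies only the function, while the skeleton owns the iteration, which a generator could in principle replace with a multi-core or multi-GPU implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical map skeleton: applies a user-defined function f to each element.
// Every iteration is independent, which is what makes the pattern parallelizable.
template <typename T, typename F>
std::vector<T> map_skeleton(const std::vector<T>& in, F f) {
    std::vector<T> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = f(in[i]);
    }
    return out;
}

// Hypothetical reduce skeleton: folds all elements with a binary operator op.
// A parallel version would additionally require op to be associative.
template <typename T, typename Op>
T reduce_skeleton(const std::vector<T>& in, T identity, Op op) {
    T acc = identity;
    for (const T& x : in) {
        acc = op(acc, x);
    }
    return acc;
}

// Example user program: only the lambdas are user code.
int sum_of_squares(const std::vector<int>& v) {
    std::vector<int> squared = map_skeleton(v, [](int x) { return x * x; });
    assert(squared.size() == v.size());  // map preserves the shape of the data
    return reduce_skeleton(squared, 0, [](int a, int b) { return a + b; });
}
```

In this sketch the loops run sequentially; swapping their bodies for a parallel backend would leave `sum_of_squares` unchanged, which is the separation of concerns the abstract refers to.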

Publisher

Springer Science and Business Media LLC

Subject

Information Systems,Theoretical Computer Science,Software

Link

https://link.springer.com/content/pdf/10.1007/s10766-020-00659-x.pdf

Reference 23 articles.

1. Bell, N., Hoberock, J.: Thrust: a productivity-oriented library for CUDA. In: GPU Computing Gems Jade edition, pp. 359–371. Elsevier, Amsterdam (2012)

2. OpenACC Organization: OpenACC (2019)

3. Cole, M.: Algorithmic Skeletons: Structured Management of Parallel Computation. MIT Press, Cambridge (1991)

4. Kuchen, H.: A skeleton library. In: Monien, B., Feldmann, R. (eds.) Proceedings of the 8th International Euro-Par Conference on Parallel Processing. Lecture Notes in Computer Science, vol. 2400, pp. 620–629. Berlin, Heidelberg (2002)

5. Ernsting, S., Kuchen, H.: Algorithmic skeletons for multi-core, multi-GPU systems and clusters. Int. J. High Perform. Comput. Netw. 7(2), 129–138 (2012)

Cited by 1 article.

1. Generating Custom Learned Cost Model for Query Optimizer of DBMS;Communications in Computer and Information Science;2024