Affiliation:
1. University of Edinburgh, UK / University of Münster, Germany
2. Heriot-Watt University, UK
3. University of Edinburgh, UK
Abstract
Computers have become increasingly complex with the emergence of heterogeneous hardware combining multicore CPUs and GPUs. These parallel systems exhibit tremendous computational power at the cost of increased programming effort, resulting in a tension between performance and code portability. Typically, code is either tuned in a low-level imperative language using hardware-specific optimizations to achieve maximum performance, or written in a high-level, possibly functional, language to achieve portability at the expense of performance. We propose a novel approach that aims to combine high-level programming, code portability, and high performance. Starting from a high-level functional expression, we apply a simple set of rewrite rules to transform it into a low-level functional representation, close to the OpenCL programming model, from which OpenCL code is generated. Our rewrite rules define a space of possible implementations, which we automatically explore to generate hardware-specific OpenCL implementations. We formalize our system with a core dependently-typed lambda-calculus along with a denotational semantics, which we use to prove the correctness of the rewrite rules. We test our design in practice by implementing a compiler which generates high-performance imperative OpenCL code. Our experiments show that we can automatically derive hardware-specific implementations from simple functional high-level algorithmic expressions, offering performance on a par with highly tuned code for multicore CPUs and GPUs written by experts.
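As a rough illustration of how a semantics-preserving rewrite rule can open up a space of implementations, the sketch below uses a toy model in Python. Everything in it is an assumption made for illustration: the abstract does not list the actual rules, so the function names (`split`, `join`, `split_join_rewrite`, `candidates`) and the specific identity used (a map over a list equals a join of maps over its chunks) are hypothetical and are not taken from the paper or its compiler.

```python
# Illustrative sketch only: a toy "rewrite rule" over list programs.
# All names here are hypothetical and not drawn from the paper.

def split(n, xs):
    """Cut xs into consecutive chunks of length n (the last may be shorter)."""
    return [xs[i:i + n] for i in range(0, len(xs), n)]


def join(xss):
    """Flatten one level of nesting; the inverse of split for any chunk size."""
    return [x for xs in xss for x in xs]


def high_level(f, xs):
    """The high-level expression: apply f to every element."""
    return [f(x) for x in xs]


def split_join_rewrite(f, n):
    """Rewrite `map f` into `join . map (map f) . split n`.

    The result computes the same values, but the chunking exposes a
    two-level structure that a code generator could assign to, say,
    groups of workers and the workers within each group.
    """
    def lowered(xs):
        return join([[f(x) for x in chunk] for chunk in split(n, xs)])
    return lowered


def candidates(f, chunk_sizes):
    """Enumerate one rewritten variant per chunk size: a tiny search space."""
    return {n: split_join_rewrite(f, n) for n in chunk_sizes}
```

Each entry returned by `candidates` is a different implementation of the same high-level expression. An exploration step in the spirit of the abstract would time each variant on the target hardware and keep the fastest one.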
Funder
Engineering and Physical Sciences Research Council
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design, Software
Cited by
23 articles.
1. SpEQ: Translation of Sparse Codes using Equivalences;Proceedings of the ACM on Programming Languages;2024-06-20
2. A shared compilation stack for distributed-memory parallelism in stencil DSLs;Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3;2024-04-27
3. Zero-Overhead Parallel Scans for Multi-Core CPUs;Proceedings of the 15th International Workshop on Programming Models and Applications for Multicores and Manycores;2024-03-03
4. BaCO: A Fast and Portable Bayesian Compiler Optimization Framework;Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4;2023-03-25
5. OptCL: A Middleware to Optimise Performance for High Performance Domain-Specific Languages on Heterogeneous Platforms;Algorithms and Architectures for Parallel Processing;2022