Affiliation:
1. Indian Institute of Science, Bangalore, India
Abstract
MATLAB is an array language that was initially popular for rapid prototyping but is now increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control-flow-dominated scalar regions that affect the program's execution time. Today's computer systems have tremendous computing power in the form of traditional CPU cores and throughput-oriented accelerators such as graphics processing units (GPUs). Thus, an approach that maps the control-flow-dominated regions to the CPU and the data-parallel regions to the GPU can significantly improve program performance.
In this paper, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and does not require programmer input to identify data-parallel regions. We propose a set of compiler optimizations tailored for MATLAB. Our compiler identifies data-parallel regions of the program and composes them into kernels. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map the identified kernels to either the CPU or the GPU so that kernel execution on the two processors happens synergistically and the amount of data transfer needed is minimized. To ensure the required data movement for dependencies across basic blocks, we propose a data flow analysis and an edge-splitting strategy. Thus, our compiler automatically handles the composition of kernels, the mapping of kernels to the CPU and GPU, scheduling, and the insertion of required data transfers. The proposed compiler was implemented, and experimental evaluation using a set of MATLAB benchmarks shows that our approach achieves a geometric mean speedup of 19.8X for data-parallel benchmarks over native execution of MATLAB.
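The pipeline described above (compose statements into kernels, map each kernel to a processor, then insert data transfers) can be illustrated with a toy sketch. This is not MEGHA's algorithm: the abstract formulates kernel composition as constrained graph clustering and uses mapping heuristics that are not detailed here. The sketch below substitutes a simple greedy grouping of consecutive statements, and the names `Stmt`, `compose_kernels`, `map_to_device`, `count_transfers`, and the `max_kernel_size` limit are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stmt:
    text: str
    data_parallel: bool  # assumed flag: True for array ops, False for scalar/control-flow code


def compose_kernels(stmts, max_kernel_size=3):
    """Greedily group consecutive statements of the same kind into kernels.

    max_kernel_size is a stand-in for the clustering constraints (an assumption).
    """
    kernels = []
    for s in stmts:
        last = kernels[-1] if kernels else None
        same_kind = last is not None and last[0].data_parallel == s.data_parallel
        if same_kind and len(last) < max_kernel_size:
            last.append(s)
        else:
            kernels.append([s])
    return kernels


def map_to_device(kernel):
    """Toy mapping rule: data-parallel kernels go to the GPU, the rest to the CPU."""
    return "GPU" if kernel[0].data_parallel else "CPU"


def count_transfers(kernels):
    """Crude proxy for data movement: count device changes between consecutive kernels."""
    devices = [map_to_device(k) for k in kernels]
    return sum(1 for a, b in zip(devices, devices[1:]) if a != b)


program = [
    Stmt("a = b + c", True),
    Stmt("d = a .* a", True),
    Stmt("if s > 0, s = s - 1; end", False),
    Stmt("e = d * 2", True),
]
kernels = compose_kernels(program)
devices = [map_to_device(k) for k in kernels]
transfers = count_transfers(kernels)
```

In this sketch the first two array statements are fused into one GPU kernel, the scalar `if` runs on the CPU, and the final array statement forms a second GPU kernel. A real compiler would reason about the dependence graph and the actual data each kernel reads and writes rather than statement order alone.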
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design, Software
Cited by
15 articles.
1. Compilation of MATLAB computations to CPU/GPU via C/OpenCL generation;Concurrency and Computation: Practice and Experience;2020-06
2. Design, implementation, and application of GPU-based Java bytecode interpreters;Proceedings of the ACM on Programming Languages;2019-10-10
3. Compiler Techniques for Efficient MATLAB to OpenCL Code Generation;Proceedings of the 5th International Workshop on OpenCL;2017-05-16
4. Dataflow in MATLAB: Algorithm Acceleration Through Concurrency;IEEE Access;2017
5. SSA-based MATLAB-to-C compilation and optimization;Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming;2016-06-02