Abstract
As the hardware landscape of high-performance computing (HPC) evolves quickly and becomes increasingly specialized, implementing efficient software applications becomes more challenging. This difficulty is especially pronounced for domain scientists and may hinder advances in large-scale simulation software. One way to overcome these challenges is software abstraction. We present a model of parallel algorithms that allows for global optimization of their synchronization and dataflow and for optimal mapping to complex and heterogeneous architectures. The presented model strictly separates the structure of an algorithm from its executed functions. It uses a hierarchical decomposition of parallel design patterns as well-established building blocks for algorithmic structures and captures them in an abstract pattern tree (APT). A data-centric flow graph is constructed from the APT and acts as an intermediate representation for rich, automated structural transformations. We demonstrate the applicability of this model to three representative algorithms and show runtime speedups between 1.83 and 2.45 on a typical heterogeneous CPU/GPU architecture.
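The separation of structure from executed functions can be illustrated with a minimal sketch. It is not the paper's implementation: the class `PatternNode`, the pattern names, the `reads`/`writes` annotations, and the helper `dataflow_edges` are all hypothetical names chosen for illustration. The sketch assumes only that each leaf of a pattern tree declares which data items it reads and writes, and it derives read-after-write edges from those declarations.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PatternNode:
    """One node of a hypothetical pattern tree: structure only, no computation."""
    pattern: str                    # illustrative pattern name, e.g. "serial", "map"
    reads: Tuple[str, ...] = ()     # data items the node consumes
    writes: Tuple[str, ...] = ()    # data items the node produces
    func: str = ""                  # key into a separate function registry
    children: List["PatternNode"] = field(default_factory=list)


def leaves(node):
    """Return the leaf nodes in program order (depth-first, left to right)."""
    if not node.children:
        return [node]
    ordered = []
    for child in node.children:
        ordered.extend(leaves(child))
    return ordered


def dataflow_edges(root):
    """Link each read to the most recent earlier leaf that wrote the same item."""
    last_writer = {}
    edges = []
    for index, leaf in enumerate(leaves(root)):
        for item in leaf.reads:
            if item in last_writer:
                edges.append((last_writer[item], index, item))
        for item in leaf.writes:
            last_writer[item] = index
    return edges


# Structure: a serial composition of a map followed by a reduction.
tree = PatternNode("serial", children=[
    PatternNode("map", reads=("x",), writes=("y",), func="square"),
    PatternNode("reduction", reads=("y",), writes=("total",), func="add"),
])

# Executed functions live outside the tree and are referenced by key only.
functions = {
    "square": lambda v: v * v,
    "add": lambda a, b: a + b,
}

edges = dataflow_edges(tree)
print(edges)
```

Because the tree stores only function keys, a transformation could in principle reorder or regroup nodes using the derived edges without inspecting the function bodies.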
Subject
General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)
Reference
59 articles.
1. A view of the parallel computing landscape
2. Compiler transformations for high-performance computing
3. OpenMP API for Parallel Programming, Version 5.0. https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf
4. MPI: A Message-Passing Interface Standard, Version 3.1. https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf
5. Patterns for Parallel Programming; Mattson, 2004
Cited by
5 articles.