Affiliations:
1. Imperial College London
2. University of Southampton
3. Louisiana State University
Abstract
Sparse tiling is a technique to fuse loops that access common data, thus increasing data locality. Unlike traditional loop fusion or blocking, the loops may have different iteration spaces and access shared datasets through indirect memory accesses, such as A[map[i]], hence the name "sparse." One notable example of such loops arises in discontinuous-Galerkin finite element methods, which compute numerical integrals over different domains (e.g., cells, facets). The major challenge with sparse tiling is implementation: not only is it cumbersome to understand and synthesize, but it is also onerous to maintain and generalize, as it requires a complete rewrite of the bulk of the numerical computation. In this article, we propose an approach to extend the applicability of sparse tiling based on raising the level of abstraction. Through a sequence of compiler passes, the mathematical specification of a problem is progressively lowered, and eventually sparse-tiled C for-loops are generated. Besides automation, we advance the state of the art by introducing a revisited, more efficient sparse tiling algorithm; support for distributed-memory parallelism; a range of fine-grained optimizations for increased runtime performance; implementation in a publicly available library, SLOPE; and an in-depth study of the performance impact in Seigen, a real-world elastic wave equation solver for seismological problems, which shows speed-ups of up to 1.28× on a platform consisting of 896 Intel Broadwell cores.
Funder
Engineering and Physical Sciences Research Council
Publisher
Association for Computing Machinery (ACM)
Subject
Applied Mathematics, Software
Cited by (3 articles)
1. Communication-Avoiding Optimizations for Large-Scale Unstructured-Mesh Applications with OP2; Proceedings of the 52nd International Conference on Parallel Processing; 2023-08-07
2. Inter-loop optimization in RAJA using loop chains; Proceedings of the ACM International Conference on Supercomputing; 2021-06-03
3. Temporal blocking of finite-difference stencil operators with sparse "off-the-grid" sources; 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS); 2021-05