Affiliation:
1. Reservoir Labs, USA
2. Massachusetts Institute of Technology, USA
3. Stanford University, USA
4. Adobe Research, USA
Abstract
We address the problem of optimizing sparse tensor algebra in a compiler and show how to define standard loop transformations (split, collapse, and reorder) on sparse iteration spaces. The key idea is to track the transformation functions that map the original iteration space to derived iteration spaces. These functions are needed by the code generator to emit code that maps coordinates between iteration spaces at runtime, since the coordinates in the sparse data structures remain in the original iteration space. We further demonstrate that derived iteration spaces can tile both the universe of coordinates and the subset of nonzero coordinates: the former is analogous to tiling dense iteration spaces, while the latter tiles sparse iteration spaces into statically load-balanced blocks of nonzeros. Tiling the space of nonzeros lets the generated code efficiently exploit heterogeneous compute resources such as threads, vector units, and GPUs.
We implement these concepts by extending the sparse iteration theory implementation in the TACO system. The associated scheduling API can be used directly by performance engineers or serve as the target of an automatic scheduling system. We outline one heuristic autoscheduling system, but other systems are possible. Using the scheduling API, we show how to optimize mixed sparse-dense tensor algebra expressions on CPUs and GPUs. Our results show that the sparse transformations are sufficient to generate code whose performance is competitive with hand-optimized implementations from the literature, while generalizing to all of tensor algebra.
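To make the distinction between tiling coordinates and tiling nonzeros concrete, the following is a minimal hand-written sketch in Python, not code generated by TACO and not its scheduling API. It assumes a matrix stored in CSR form as three lists (`pos`, `crd`, `vals`); the function names and the `block` parameter are hypothetical. The tiled version iterates over fixed-size blocks of nonzeros and, at the start of each block, maps the position back to a row coordinate in the original iteration space with a binary search over `pos`.

```python
from bisect import bisect_right


def spmv_csr(pos, crd, vals, x, n_rows):
    """Reference CSR matrix-vector product, iterating row by row."""
    y = [0.0] * n_rows
    for i in range(n_rows):
        for p in range(pos[i], pos[i + 1]):
            y[i] += vals[p] * x[crd[p]]
    return y


def spmv_csr_nnz_tiled(pos, crd, vals, x, n_rows, block):
    """Same product, iterating over equal-sized blocks of nonzeros.

    Every block holds at most `block` nonzeros regardless of how they are
    distributed across rows, so the outer loop is load-balanced by
    construction and each iteration could be assigned to one worker.
    """
    y = [0.0] * n_rows
    nnz = pos[n_rows]
    n_blocks = (nnz + block - 1) // block
    for b in range(n_blocks):
        start = b * block
        end = min(start + block, nnz)
        # Map the block's first position back to its row coordinate:
        # the last row i with pos[i] <= start (this skips empty rows).
        i = bisect_right(pos, start) - 1
        for p in range(start, end):
            # Advance the row coordinate when p crosses a row boundary.
            while p >= pos[i + 1]:
                i += 1
            y[i] += vals[p] * x[crd[p]]
    return y
```

In this sketch the binary search plays the role of a transformation function: the loops run over positions in a derived space, while `crd` and `pos` still describe the matrix in its original coordinates. A parallel version would additionally need atomic or privatized updates to `y`, because one row can span several blocks.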
Funder
National Science Foundation
Toyota Research Institute
Applications Driving Architectures (ADA) Research Center
U.S. Department of Energy
Defense Advanced Research Projects Agency
Publisher
Association for Computing Machinery (ACM)
Subject
Safety, Risk, Reliability and Quality, Software
Cited by
32 articles.
1. Compilation of Modular and General Sparse Workspaces;Proceedings of the ACM on Programming Languages;2024-06-20
2. Compiling Recurrences over Dense and Sparse Arrays;Proceedings of the ACM on Programming Languages;2024-04-29
3. A Tensor Algebra Compiler for Sparse Differentiation;2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO);2024-03-02
4. A Row Decomposition-based Approach for Sparse Matrix Multiplication on GPUs;Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming;2024-02-20
5. Automated Mapping of Task-Based Programs onto Distributed and Heterogeneous Machines;Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis;2023-11-11