A sparse iteration space transformation framework for sparse tensor algebra

Published:2020-11-13 Issue:OOPSLA Volume:4 Page:1-30
ISSN:2475-1421
Container-title:Proceedings of the ACM on Programming Languages
language:en
Short-container-title:Proc. ACM Program. Lang.

Author:

Senanayake Ryan¹, Hong Changwan², Wang Ziheng², Wilson Amalee³, Chou Stephen², Kamil Shoaib⁴, Amarasinghe Saman², Kjolstad Fredrik³

Affiliation:

1. Reservoir Labs, USA

2. Massachusetts Institute of Technology, USA

3. Stanford University, USA

4. Adobe Research, USA

Abstract

We address the problem of optimizing sparse tensor algebra in a compiler and show how to define standard loop transformations (split, collapse, and reorder) on sparse iteration spaces. The key idea is to track the transformation functions that map the original iteration space to derived iteration spaces. These functions are needed by the code generator to emit code that maps coordinates between iteration spaces at runtime, since the coordinates in the sparse data structures remain in the original iteration space. We further demonstrate that derived iteration spaces can tile both the universe of coordinates and the subset of nonzero coordinates: the former is analogous to tiling dense iteration spaces, while the latter tiles sparse iteration spaces into statically load-balanced blocks of nonzeros. Tiling the space of nonzeros lets the generated code efficiently exploit heterogeneous compute resources such as threads, vector units, and GPUs. We implement these concepts by extending the sparse iteration theory implementation in the TACO system. The associated scheduling API can be used by performance engineers or it can be the target of an automatic scheduling system. We outline one heuristic autoscheduling system, but other systems are possible. Using the scheduling API, we show how to optimize mixed sparse-dense tensor algebra expressions on CPUs and GPUs. Our results show that the sparse transformations are sufficient to generate code whose performance is competitive with hand-optimized implementations from the literature, while generalizing to all of the tensor algebra.
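
The difference between tiling the universe of coordinates and tiling the nonzeros can be illustrated with a small sparse matrix-vector product. The Python sketch below is not TACO's scheduling API or its generated code. The function names, the CSR array names (`pos`, `crd`, `vals`), and the binary-search recovery of the row coordinate are all assumptions chosen for illustration. The first function splits the row coordinates into fixed-size tiles, so tiles may hold very different numbers of nonzeros. The second splits the positions of the nonzeros into fixed-size tiles and maps each position back to its original row at runtime.

```python
from bisect import bisect_right


def spmv_coordinate_tiles(pos, crd, vals, x, tile_rows):
    """y = A @ x with the row coordinate space split into tiles of `tile_rows` rows.

    A is in CSR form: row i owns positions pos[i] .. pos[i+1]-1 of crd/vals.
    """
    n_rows = len(pos) - 1
    y = [0.0] * n_rows
    for row_start in range(0, n_rows, tile_rows):            # outer loop over tiles
        row_end = min(row_start + tile_rows, n_rows)
        for i in range(row_start, row_end):                  # inner loop within a tile
            for p in range(pos[i], pos[i + 1]):
                y[i] += vals[p] * x[crd[p]]
    return y


def spmv_position_tiles(pos, crd, vals, x, tile_nnz):
    """y = A @ x with the space of nonzero positions split into tiles of `tile_nnz`.

    Every tile except possibly the last holds exactly `tile_nnz` nonzeros.
    The stored coordinates stay in the original space, so the row of each
    position is recovered at runtime from the `pos` array.
    """
    n_rows = len(pos) - 1
    nnz = pos[-1]
    y = [0.0] * n_rows
    for p_start in range(0, nnz, tile_nnz):                  # outer loop over tiles
        # Map the first position of the tile back to its row (skips empty rows).
        i = bisect_right(pos, p_start) - 1
        for p in range(p_start, min(p_start + tile_nnz, nnz)):
            while p >= pos[i + 1]:                           # crossed into a later row
                i += 1
            y[i] += vals[p] * x[crd[p]]
    return y


# Hypothetical 3x4 example matrix:
#   [1 0 2 0]
#   [0 0 0 0]
#   [0 3 4 5]
pos = [0, 2, 2, 5]
crd = [0, 2, 1, 2, 3]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0, 4.0]

y_coord = spmv_coordinate_tiles(pos, crd, vals, x, tile_rows=2)
y_pos = spmv_position_tiles(pos, crd, vals, x, tile_nnz=2)
```

In this sketch each position tile could be handed to a separate thread with an equal share of work, although a parallel version would also need to handle rows that straddle a tile boundary (for example with atomic updates or a reduction), which is omitted here.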

Funder

National Science Foundation

Toyota Research Institute

Applications Driving Architectures (ADA) Research Center

U.S. Department of Energy

Defense Advanced Research Projects Agency

Publisher

Association for Computing Machinery (ACM)

Subject

Safety, Risk, Reliability and Quality, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3428226

Reference: 50 articles.

1. Learning to optimize halide with tree search and random programs

2. Scanning polyhedra with DO loops

3. Alexander A. Auer, Gerald Baumgartner, David E. Bernholdt, Alina Bibireata, Venkatesh Choppella, Daniel Cociorva, Xiaoyang Gao, Robert Harrison, Sriram Krishnamoorthy, Sandhya Krishnan, Chi-Chung Lam, Qingda Lu, Marcel Nooijen, Russell Pitzer, J. Ramanujam, P. Sadayappan, and Alexander Sibiryakov. 2006. Automatic code generation for many-body electronic structure methods: the tensor contraction engine. Molecular Physics 104, 2 (2006), 211-228. 10.1080/00268970500275780

Cited by 32 articles.

1. Compilation of Modular and General Sparse Workspaces;Proceedings of the ACM on Programming Languages;2024-06-20

2. Compiling Recurrences over Dense and Sparse Arrays;Proceedings of the ACM on Programming Languages;2024-04-29

3. A Tensor Algebra Compiler for Sparse Differentiation;2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO);2024-03-02

4. A Row Decomposition-based Approach for Sparse Matrix Multiplication on GPUs;Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming;2024-02-20

5. Automated Mapping of Task-Based Programs onto Distributed and Heterogeneous Machines;Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis;2023-11-11