Affiliation:
1. IBM T.J. Watson Research Center, Emeritus, and Umeå University
2. Umeå University
Abstract
Techniques and algorithms for efficient in-place conversion between standard and blocked matrix storage formats are described. Such functionality is required by numerical libraries that use different data layouts internally. Parallel algorithms and a software package for in-place matrix storage format conversion, based on in-place matrix transposition, are presented and evaluated. One of the key ingredients is a new in-place transposition algorithm that efficiently determines the structure of the transposition permutation a priori, which enables effective load balancing in a parallel environment.
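To illustrate what "the structure of the transposition permutation" means, the sketch below uses the classic cycle-following method for transposing an m-by-n row-major matrix stored in a flat array. It is not the authors' new algorithm. The element at flat index k (0 < k < mn - 1) moves to index k*m mod (mn - 1), and indices 0 and mn - 1 stay fixed. The transposition therefore decomposes into disjoint cycles, and each cycle is rotated exactly once starting from its smallest index (its "leader"). The function name `transpose_in_place` is hypothetical.

```python
def transpose_in_place(a, m, n):
    """Transpose an m-by-n row-major matrix held in the flat list `a`, in place.

    Classic cycle-following sketch (not the paper's algorithm): the element
    at flat index k, 0 < k < m*n - 1, moves to (k * m) % (m*n - 1);
    indices 0 and m*n - 1 are fixed points.
    """
    size = m * n
    if size <= 2:
        return a  # 1x1, 1x2 and 2x1 layouts are unchanged by transposition
    mod = size - 1

    def dest(k):
        # Destination of the element currently stored at flat index k.
        return (k * m) % mod

    for start in range(1, mod):
        # Walk the cycle through `start`; if it reaches a smaller index,
        # that cycle has already been rotated from its leader, so skip it.
        k = dest(start)
        while k > start:
            k = dest(k)
        if k != start:
            continue
        # `start` is the cycle leader: carry elements along the cycle once.
        carried = a[start]
        k = start
        while True:
            k = dest(k)
            a[k], carried = carried, a[k]
            if k == start:
                break
    return a


# Usage: [[0, 1, 2], [3, 4, 5]] becomes [[0, 3], [1, 4], [2, 5]].
print(transpose_in_place(list(range(6)), 2, 3))
```

The leader test in this sketch re-traverses cycles, and the cycle lengths are not known in advance, so it gives no basis for dividing work among threads. The abstract's point is that determining the cycle structure a priori removes this obstacle and makes load balancing possible.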
Publisher
Association for Computing Machinery (ACM)
Subject
Applied Mathematics, Software
Cited by
47 articles.
1. AMT: asynchronous in-place matrix transpose mechanism for Sunway many-core processor. The Journal of Supercomputing, 2022-01-17.
2. Efficient Out-of-Core and Out-of-Place Rectangular Matrix Transposition and Rotation. IEEE Transactions on Computers, 2021-11-01.
3. Highly efficient GPU eigensolver for three-dimensional photonic crystal band structures with any Bravais lattice. Computer Physics Communications, 2019-12.
4. SLATE. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2019-11-17.
5. PLASMA. ACM Transactions on Mathematical Software, 2019-06-30.