Author:
Ballard G., Carson E., Demmel J., Hoemmen M., Knight N., Schwartz O.
Abstract
The traditional metric for the efficiency of a numerical algorithm has been the number of arithmetic operations it performs. Technological trends have long been reducing the time to perform an arithmetic operation, so it is no longer the bottleneck in many algorithms; rather, communication, or moving data, is the bottleneck. This motivates us to seek algorithms that move as little data as possible, either between levels of a memory hierarchy or between parallel processors over a network. In this paper we summarize recent progress in three aspects of this problem. First we describe lower bounds on communication. Some of these generalize known lower bounds for dense classical (O(n^3)) matrix multiplication to all direct methods of linear algebra, to sequential and parallel algorithms, and to dense and sparse matrices. We also present lower bounds for Strassen-like algorithms, and for iterative methods, in particular Krylov subspace methods applied to sparse matrices. Second, we compare these lower bounds to widely used versions of these algorithms, and note that these widely used algorithms usually communicate asymptotically more than is necessary. Third, we identify or invent new algorithms for most linear algebra problems that do attain these lower bounds, and demonstrate large speed-ups in theory and practice.
Publisher
Cambridge University Press (CUP)
Subject
General Mathematics, Numerical Analysis
Cited by
67 articles.
1. Communication Lower Bounds and Optimal Algorithms for Multiple Tensor-Times-Matrix Computation;SIAM Journal on Matrix Analysis and Applications;2024-02-06
2. Optimizing Multi-grid Computation and Parallelization on Multi-cores;Proceedings of the 37th International Conference on Supercomputing;2023-06-21
3. Parallel Memory-Independent Communication Bounds for SYRK;Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures;2023-06-17
4. Error-bounded Scalable Parallel Tensor Train Decomposition;2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW);2023-05
5. GMRES algorithms over 35 years;Applied Mathematics and Computation;2023-05