Partitioning Models for Scaling Parallel Sparse Matrix-Matrix Multiplication

Published: 2018-04-27 Issue: 3 Volume: 4 Page: 1-34
ISSN: 2329-4949
Container-title: ACM Transactions on Parallel Computing
Language: en
Short-container-title: ACM Trans. Parallel Comput.

Author:

Akbudak Kadir¹, Selvitopi Oguz¹, Aykanat Cevdet¹

Affiliation:

1. Bilkent University, Ankara, Turkey

Abstract

We investigate outer-product-parallel, inner-product-parallel, and row-by-row-product-parallel formulations of sparse matrix-matrix multiplication (SpGEMM) on distributed memory architectures. For each of these three formulations, we propose a hypergraph model and a bipartite graph model for distributing SpGEMM computations based on one-dimensional (1D) partitioning of input matrices. We also propose a communication hypergraph model for each formulation for distributing communication operations. The computational graph and hypergraph models adopted in the first phase aim at minimizing the total message volume and balancing the computational loads of processors, whereas the communication hypergraph models adopted in the second phase aim at minimizing the total message count and balancing the message volume loads of processors. That is, the computational partitioning models reduce the bandwidth cost and the communication hypergraph models reduce the latency cost. Our extensive parallel experiments on up to 2048 processors for a wide range of realistic SpGEMM instances show that although the outer-product-parallel formulation scales better, the row-by-row-product-parallel formulation is more viable due to its significantly lower partitioning overhead and competitive scalability. For computational partitioning models, our experimental findings indicate that the proposed bipartite graph models are attractive alternatives to their hypergraph counterparts because of their lower partitioning overhead. Finally, we show that by reducing the latency cost in addition to the bandwidth cost through the communication hypergraph models, the parallel SpGEMM time can be further improved by up to 32%.
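The three formulations named in the abstract differ in what counts as one unit of work. The following is a minimal serial sketch in Python, not the authors' implementation: it assumes sparse matrices stored as `{(i, j): value}` dictionaries, and every function name here is hypothetical. The last function is a simplified illustration of the two communication costs the abstract distinguishes, message volume (bandwidth) and message count (latency), under an assumed row-wise ownership of A and B given by the illustrative maps `owner_a` and `owner_b`.

```python
from collections import defaultdict


def rows_of(M):
    """Group entries of a {(i, j): value} sparse matrix by row index."""
    rows = defaultdict(dict)
    for (i, j), v in M.items():
        rows[i][j] = v
    return rows


def cols_of(M):
    """Group entries of a {(i, j): value} sparse matrix by column index."""
    cols = defaultdict(dict)
    for (i, j), v in M.items():
        cols[j][i] = v
    return cols


def spgemm_inner(A, B):
    """C[i, j] = dot(A[i, :], B[:, j]); one task per output entry."""
    a_rows, b_cols = rows_of(A), cols_of(B)
    C = {}
    for i, a_row in a_rows.items():
        for j, b_col in b_cols.items():
            shared = a_row.keys() & b_col.keys()
            if shared:
                C[i, j] = sum(a_row[k] * b_col[k] for k in shared)
    return C


def spgemm_outer(A, B):
    """C = sum over k of outer(A[:, k], B[k, :]); one task per index k."""
    a_cols, b_rows = cols_of(A), rows_of(B)
    C = defaultdict(float)
    for k in a_cols.keys() & b_rows.keys():
        for i, a in a_cols[k].items():
            for j, b in b_rows[k].items():
                C[i, j] += a * b
    return dict(C)


def spgemm_row_by_row(A, B):
    """C[i, :] = sum over k of A[i, k] * B[k, :]; one task per row of A."""
    a_rows, b_rows = rows_of(A), rows_of(B)
    C = defaultdict(float)
    for i, a_row in a_rows.items():
        for k, a in a_row.items():
            for j, b in b_rows.get(k, {}).items():
                C[i, j] += a * b
    return dict(C)


def row_by_row_comm(A, B, owner_a, owner_b):
    """Simplified cost count: nonzeros of remote B rows fetched, and messages."""
    b_rows = rows_of(B)
    needed = set()  # (row k of B, destination processor)
    for (i, k) in A:
        if k in b_rows and owner_a[i] != owner_b[k]:
            needed.add((k, owner_a[i]))
    volume = sum(len(b_rows[k]) for k, _ in needed)
    messages = len({(owner_b[k], dst) for k, dst in needed})
    return volume, messages


A = {(0, 0): 1.0, (0, 2): 2.0, (1, 1): 3.0}
B = {(0, 1): 4.0, (1, 0): 5.0, (2, 1): 6.0}
C = spgemm_row_by_row(A, B)
cost = row_by_row_comm(A, B, owner_a={0: 0, 1: 1}, owner_b={0: 0, 1: 1, 2: 1})
```

All three functions produce the same product. In this toy instance, processor 0 owns row 0 of A but not row 2 of B, so it must receive that row: one nonzero of volume carried in one message. Fewer distinct sender-receiver pairs lowers the latency term, and fewer transferred nonzeros lowers the bandwidth term.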

Funder

the Scientific and Technological Research Council of Turkey

European Cooperation in Science and Technology

Publisher

Association for Computing Machinery (ACM)

Subject

Computational Theory and Mathematics, Computer Science Applications, Hardware and Architecture, Modelling and Simulation, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3155292

Cited by 18 articles.

1. Machine Learning-Based Kernel Selector for SpMV Optimization in Graph Analysis; ACM Transactions on Parallel Computing; 2024-06-08

2. SPMSD: An Partitioning-Strategy for Parallel General Sparse Matrix-Matrix Multiplication on GPU; Parallel Processing Letters; 2024-05-27

3. HARP: Hardware-Based Pseudo-Tiling for Sparse Matrix Multiplication Accelerator; 56th Annual IEEE/ACM International Symposium on Microarchitecture; 2023-10-28

4. HASpGEMM: Heterogeneity-Aware Sparse General Matrix-Matrix Multiplication on Modern Asymmetric Multicore Processors; Proceedings of the 52nd International Conference on Parallel Processing; 2023-08-07

5. Efficient Execution of SpGEMM on Long Vector Architectures; Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing; 2023-08-07