fgSpMSpV: A Fine-grained Parallel SpMSpV Framework on HPC Platforms

Published: 2022-04-11 Issue: 2 Volume: 9 Page: 1-29
ISSN: 2329-4949
Container-title: ACM Transactions on Parallel Computing
language: en
Short-container-title: ACM Trans. Parallel Comput.

Author:

Chen Yuedan¹, Xiao Guoqing¹, Li Kenli¹, Piccialli Francesco², Zomaya Albert Y.³

Affiliation:

1. College of Computer Science and Electronic Engineering, Hunan University, and National Supercomputing Center in Changsha, Changsha, Hunan, China

2. Department of Electrical Engineering and Information Technologies, University of Naples Federico II, Naples, Italy

3. School of Information Technologies, University of Sydney, Sydney, NSW, Australia

Abstract

Sparse matrix-sparse vector (SpMSpV) multiplication is a fundamental operation in many high-performance scientific and engineering applications. Its inherent irregularity and poor data locality pose two main challenges to scaling SpMSpV on high-performance computing (HPC) systems: (i) a large amount of redundant data limits the utilization of bandwidth and parallel resources; (ii) the irregular access pattern limits the exploitation of computing resources. This paper proposes a fine-grained parallel SpMSpV (fgSpMSpV) framework on the Sunway TaihuLight supercomputer to alleviate these challenges for large-scale real-world applications. First, fgSpMSpV adopts an MPI+OpenMP+X parallelization model to exploit the multi-stage and hybrid parallelism of heterogeneous HPC architectures and accelerate both pre-/post-processing and the main SpMSpV computation. Second, fgSpMSpV uses adaptive parallel execution to reduce pre-processing and adapt to the parallelism and memory hierarchy of the Sunway system while still taming redundant and random memory accesses in SpMSpV, through a set of techniques including the fine-grained partitioner, the re-collection method, and the Compressed Sparse Column Vector (CSCV) matrix format. Third, fgSpMSpV applies several optimization techniques to further utilize the computing resources. fgSpMSpV on the Sunway TaihuLight gains a noticeable performance improvement from the key optimization techniques across inputs of various sparsity. Additionally, fgSpMSpV is implemented on an NVIDIA Tesla P100 GPU and applied to the breadth-first search (BFS) application. fgSpMSpV on a P100 GPU obtains a speedup of up to \(134.38\times\) over state-of-the-art SpMSpV algorithms, and the BFS application using fgSpMSpV achieves a speedup of up to \(21.68\times\) over the state of the art.
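For readers unfamiliar with the operation, the following is a minimal serial sketch of column-wise SpMSpV and of a BFS built on top of it. It is not the fgSpMSpV framework and does not use the paper's CSCV format: it assumes the standard Compressed Sparse Column (CSC) layout, represents the sparse vector as a Python dict, and the names `spmspv_csc` and `bfs_levels` are hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def spmspv_csc(col_ptr: List[int], row_idx: List[int], vals: List[float],
               x: Dict[int, float]) -> Dict[int, float]:
    """Compute y = A @ x for a CSC matrix A and a sparse vector x.

    Only the columns selected by the nonzeros of x are visited, which is
    what distinguishes SpMSpV from a dense-vector SpMV.
    """
    y: Dict[int, float] = {}
    for j, xj in x.items():
        # Entries of column j live in the half-open range col_ptr[j]:col_ptr[j+1].
        for k in range(col_ptr[j], col_ptr[j + 1]):
            i = row_idx[k]
            y[i] = y.get(i, 0.0) + vals[k] * xj
    return y


def bfs_levels(col_ptr: List[int], row_idx: List[int], source: int) -> Dict[int, int]:
    """BFS where column j of the adjacency matrix lists the out-neighbors of j.

    Each step is one SpMSpV with the current frontier as the sparse vector.
    """
    ones = [1.0] * len(row_idx)          # pattern-only matrix, all weights 1
    level = {source: 0}
    frontier = {source: 1.0}
    depth = 0
    while frontier:
        depth += 1
        reached = spmspv_csc(col_ptr, row_idx, ones, frontier)
        # Keep only vertices that have not been assigned a level yet.
        frontier = {v: 1.0 for v in reached if v not in level}
        for v in frontier:
            level[v] = depth
    return level


# Illustrative 4x4 matrix (made-up data):
# [[1, 0, 0, 2],
#  [0, 0, 3, 0],
#  [4, 0, 0, 0],
#  [0, 5, 0, 6]]
col_ptr = [0, 2, 3, 4, 6]
row_idx = [0, 2, 3, 1, 0, 3]
vals = [1.0, 4.0, 5.0, 3.0, 2.0, 6.0]

y = spmspv_csc(col_ptr, row_idx, vals, {0: 2.0, 3: 1.0})
levels = bfs_levels(col_ptr, row_idx, source=0)
```

The scattered updates to `y` in the inner loop are an instance of the irregular access pattern the abstract refers to; in this sketch they are simply absorbed by a hash map rather than optimized.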

Funder

National Key R&D Programs of China

Programs of National Natural Science Foundation of China

Programs of Hunan Province, China

Programs of China Postdoctoral Council

Program of Zhejiang Lab

General Program of Fundamental Research of Shen Zhen

Publisher

Association for Computing Machinery (ACM)

Subject

Computational Theory and Mathematics, Computer Science Applications, Hardware and Architecture, Modeling and Simulation, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3512770

References: 42 articles.

1. Data-driven Mixed Precision Sparse Matrix Vector Multiplication for GPUs

2. Exploiting Locality in Sparse Matrix-Matrix Multiplication on Many-Core Architectures

3. Partitioning Models for Scaling Parallel Sparse Matrix-Matrix Multiplication

4. Michael J. Anderson, Narayanan Sundaram, Nadathur Satish, Md. Mostofa Ali Patwary, Theodore L. Willke, and Pradeep Dubey. 2016. GraphPad: Optimized graph primitives for parallel and distributed platforms. In Proceedings of the International Parallel and Distributed Processing Symposium. 313–322.

5. Distributed-Memory Algorithms for Maximum Cardinality Matching in Bipartite Graphs

Cited by 8 articles.

1. Machine Learning-Based Kernel Selector for SpMV Optimization in Graph Analysis;ACM Transactions on Parallel Computing;2024-06-08

2. Redesign and Accelerate the AIREBO Bond-Order Potential on the New Sunway Supercomputer;IEEE Transactions on Parallel and Distributed Systems;2023-12

3. ESA: An efficient sequence alignment algorithm for biological database search on Sunway TaihuLight;Parallel Computing;2023-09

4. A Survey of Accelerating Parallel Sparse Linear Algebra;ACM Computing Surveys;2023-08-28

5. A Heterogeneous Parallel Computing Approach Optimizing SpTTM on CPU-GPU via GCN;ACM Transactions on Parallel Computing;2023-06-20