SparseP

Author:

Giannoula Christina (1), Fernandez Ivan (2), Luna Juan Gómez (3), Koziris Nectarios (4), Goumas Georgios (4), Mutlu Onur (3)

Affiliation:

1. ETH Zürich & National Technical University of Athens, Athens, Greece

2. ETH Zürich & University of Malaga, Malaga, Spain

3. ETH Zürich, Zürich, Switzerland

4. National Technical University of Athens, Athens, Greece

Abstract

Several manufacturers have already started to commercialize near-bank Processing-In-Memory (PIM) architectures, after decades of research efforts. Near-bank PIM architectures place simple cores close to DRAM banks. Recent research demonstrates that they can yield significant performance and energy improvements in parallel applications by alleviating data access costs. Real PIM systems can provide high levels of parallelism, large aggregate memory bandwidth and low memory access latency, making them a good fit for accelerating the Sparse Matrix Vector Multiplication (SpMV) kernel. SpMV has been characterized as one of the most significant and thoroughly studied scientific computation kernels. It is primarily a memory-bound kernel with intensive memory accesses due to its algorithmic nature, the compressed matrix format used, and the sparsity patterns of the input matrices. This paper provides the first comprehensive analysis of SpMV on a real-world PIM architecture, and presents SparseP, the first SpMV library for real PIM architectures. We make three key contributions. First, we implement a wide variety of software strategies for SpMV on a multithreaded PIM core, including (1) various compressed matrix formats, (2) load balancing schemes across parallel threads and (3) synchronization approaches, and characterize the computational limits of a single multithreaded PIM core. Second, we design various load balancing schemes across multiple PIM cores, and two types of data partitioning techniques to execute SpMV on thousands of PIM cores: (1) 1D-partitioned kernels to perform the complete SpMV computation only using PIM cores, and (2) 2D-partitioned kernels to strike a balance between computation and data transfer costs to PIM-enabled memory. Third, we compare SpMV execution on a real-world PIM system with 2528 PIM cores to an Intel Xeon CPU and an NVIDIA Tesla V100 GPU to study the performance and energy efficiency of various devices, i.e., both memory-centric PIM systems and conventional processor-centric CPU/GPU systems, for the SpMV kernel. The SparseP software package provides 25 SpMV kernels for real PIM systems, supporting the four most widely used compressed matrix formats, i.e., CSR, COO, BCSR and BCOO, and a wide range of data types. SparseP is publicly and freely available at https://github.com/CMU-SAFARI/SparseP. Our extensive evaluation using 26 matrices with various sparsity patterns provides new insights and recommendations for software designers and hardware architects to efficiently accelerate the SpMV kernel on real PIM systems.
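As a rough illustration of the ideas named in the abstract, the sketch below shows SpMV over the CSR format together with a 1D row partitioning that balances the number of nonzeros per partition. It is not taken from the SparseP code base: the function names (`spmv_csr`, `partition_rows_by_nnz`, `spmv_1d`) are hypothetical, the partitions run one after another on the host as a stand-in for separate PIM cores, and nonzero balancing is only one plausible load balancing scheme.

```python
from bisect import bisect_left


def spmv_csr(row_ptr, col_idx, values, x, row_start=0, row_end=None):
    """Compute y[i] = sum_j A[i, j] * x[j] for rows in [row_start, row_end).

    The matrix A is stored in CSR: the nonzeros of row i are
    values[row_ptr[i]:row_ptr[i + 1]], with columns in col_idx.
    """
    if row_end is None:
        row_end = len(row_ptr) - 1
    y = []
    for i in range(row_start, row_end):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def partition_rows_by_nnz(row_ptr, num_parts):
    """Split rows into contiguous chunks with roughly equal nonzero counts.

    Hypothetical helper: each chunk stands in for the rows assigned to
    one core in a 1D (row-wise) partitioning.
    """
    n_rows = len(row_ptr) - 1
    nnz = row_ptr[-1]
    bounds = [0]
    for p in range(1, num_parts):
        target = p * nnz / num_parts
        # First row boundary whose cumulative nonzero count reaches the target.
        cut = bisect_left(row_ptr, target)
        cut = max(bounds[-1], min(cut, n_rows))
        bounds.append(cut)
    bounds.append(n_rows)
    return list(zip(bounds[:-1], bounds[1:]))


def spmv_1d(row_ptr, col_idx, values, x, num_parts):
    """Run SpMV chunk by chunk; every chunk reads the whole input vector x."""
    y = []
    for start, end in partition_rows_by_nnz(row_ptr, num_parts):
        y.extend(spmv_csr(row_ptr, col_idx, values, x, start, end))
    return y


# 4x3 example matrix:
# [[1, 0, 2],
#  [0, 0, 0],
#  [0, 3, 0],
#  [4, 5, 6]]
row_ptr = [0, 2, 2, 3, 6]
col_idx = [0, 2, 1, 0, 1, 2]
values = [1, 2, 3, 4, 5, 6]
x = [1, 2, 3]
result = spmv_1d(row_ptr, col_idx, values, x, num_parts=2)
```

In this sketch each partition needs the entire input vector, which hints at why a 1D scheme can be dominated by the cost of moving data to the cores and why a 2D scheme, which also splits columns, trades that transfer cost against extra work to merge partial results.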

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications; Hardware and Architecture; Safety, Risk, Reliability and Quality; Computer Science (miscellaneous)

Cited by 21 articles.

1. Design principles for lifelong learning AI accelerators;Nature Electronics;2023-11-16

2. HARP: Hardware-Based Pseudo-Tiling for Sparse Matrix Multiplication Accelerator;56th Annual IEEE/ACM International Symposium on Microarchitecture;2023-10-28

3. MVC: Enabling Fully Coherent Multi-Data-Views through the Memory Hierarchy with Processing in Memory;56th Annual IEEE/ACM International Symposium on Microarchitecture;2023-10-28

4. SimplePIM: A Software Framework for Productive and Efficient Processing-in-Memory;2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT);2023-10-21

5. Dynamic Partitioning Method for Near-Memory Parallel Processing of Sparse Matrix-Vector Multiplication;IECON 2023- 49th Annual Conference of the IEEE Industrial Electronics Society;2023-10-16
