A scalable processing-in-memory accelerator for parallel graph processing

Author:

Ahn Junwhan (1), Hong Sungpack (2), Yoo Sungjoo (1), Mutlu Onur (3), Choi Kiyoung (1)

Affiliation:

1. Seoul National University

2. Oracle Labs

3. Carnegie Mellon University

Abstract

The explosion of digital data and the ever-growing need for fast data analysis have made in-memory big-data processing in computer systems increasingly important. In particular, large-scale graph processing is gaining attention due to its broad applicability from social science to machine learning. However, scalable hardware design that can efficiently process large graphs in main memory is still an open problem. Ideally, cost-effective and scalable graph processing systems can be realized by building a system whose performance increases proportionally with the sizes of graphs that can be stored in the system, which is extremely challenging in conventional systems due to severe memory bandwidth limitations. In this work, we argue that the conventional concept of processing-in-memory (PIM) can be a viable solution to achieve such an objective. The key modern enabler for PIM is the recent advancement of the 3D integration technology that facilitates stacking logic and memory dies in a single package, which was not available when the PIM concept was originally examined. In order to take advantage of such a new technology to enable memory-capacity-proportional performance, we design a programmable PIM accelerator for large-scale graph processing called Tesseract. Tesseract is composed of (1) a new hardware architecture that fully utilizes the available memory bandwidth, (2) an efficient method of communication between different memory partitions, and (3) a programming interface that reflects and exploits the unique hardware design. It also includes two hardware prefetchers specialized for memory access patterns of graph processing, which operate based on the hints provided by our programming model. Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems.

Funder

National Science Foundation

National Research Foundation of Korea

Ministry of Knowledge Economy

Publisher

Association for Computing Machinery (ACM)

Reference: 63 articles.

1. ARM Cortex-A5 Processor. Available: http://www.arm.com/products/processors/cortex-a/cortex-a5.php

2. Near-Data Processing: Insights from a MICRO-46 Workshop

3. Efficient virtual memory for big memory servers

4. Implementing remote procedure calls

Cited by 67 articles.

1. Kernel Shape Control for Row-Efficient Convolution on Processing-In-Memory Arrays;2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD);2023-10-28

2. OpenFAM: Programming disaggregated memory;Concurrency and Computation: Practice and Experience;2023-09-13

3. GraphScale: Scalable Processing on FPGAs for HBM and Large Graphs;ACM Transactions on Reconfigurable Technology and Systems;2023-09-13

4. Analogue Artificial Synaptic Performance of Self‐Rectifying Resistive Switching Device;Advanced Electronic Materials;2023-06-21

5. CommonGraph: Graph Analytics on Evolving Data;Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2;2023-01-27
