A scalable processing-in-memory accelerator for parallel graph processing

Published:2016-01-04 Issue:3S Volume:43 Page:105-117
ISSN:0163-5964
Container-title:ACM SIGARCH Computer Architecture News
language:en
Short-container-title:SIGARCH Comput. Archit. News

Author:

Ahn Junwhan¹, Hong Sungpack², Yoo Sungjoo¹, Mutlu Onur³, Choi Kiyoung¹

Affiliation:

1. Seoul National University

2. Oracle Labs

3. Carnegie Mellon University

Abstract

The explosion of digital data and the ever-growing need for fast data analysis have made in-memory big-data processing in computer systems increasingly important. In particular, large-scale graph processing is gaining attention due to its broad applicability from social science to machine learning. However, scalable hardware design that can efficiently process large graphs in main memory is still an open problem. Ideally, cost-effective and scalable graph processing systems can be realized by building a system whose performance increases proportionally with the sizes of graphs that can be stored in the system, which is extremely challenging in conventional systems due to severe memory bandwidth limitations. In this work, we argue that the conventional concept of processing-in-memory (PIM) can be a viable solution to achieve such an objective. The key modern enabler for PIM is the recent advancement of the 3D integration technology that facilitates stacking logic and memory dies in a single package, which was not available when the PIM concept was originally examined. In order to take advantage of such a new technology to enable memory-capacity-proportional performance, we design a programmable PIM accelerator for large-scale graph processing called Tesseract. Tesseract is composed of (1) a new hardware architecture that fully utilizes the available memory bandwidth, (2) an efficient method of communication between different memory partitions, and (3) a programming interface that reflects and exploits the unique hardware design. It also includes two hardware prefetchers specialized for memory access patterns of graph processing, which operate based on the hints provided by our programming model. Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems.

Funder

National Science Foundation

National Research Foundation of Korea

Ministry of Knowledge Economy

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/2872887.2750386

References: 63 articles.

1. ARM Cortex-A5 Processor. Available: http://www.arm.com/products/processors/cortex-a/cortex-a5.php

2. Near-Data Processing: Insights from a MICRO-46 Workshop

3. Efficient virtual memory for big memory servers

4. Implementing remote procedure calls

Cited by 75 articles.

1. PID-Comm: A Fast and Flexible Collective Communication Framework for Commodity Processing-in-DIMM Devices;2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA);2024-06-29

2. Simultaneous Many-Row Activation in Off-the-Shelf DRAM Chips: Experimental Characterization and Analysis;2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN);2024-06-24

3. A³PIM: An Automated, Analytic and Accurate Processing-in-Memory Offloader;2024 Design, Automation & Test in Europe Conference & Exhibition (DATE);2024-03-25

4. Hardware for Deep Learning Acceleration;Advanced Intelligent Systems;2024-03-21

5. Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System;2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2024-03-02