High-Performance and Energy-Efficient 3D Manycore GPU Architecture for Accelerating Graph Analytics

Published:2022-01-31 Issue:1 Volume:18 Page:1-19
ISSN:1550-4832
Container-title:ACM Journal on Emerging Technologies in Computing Systems
language:en
Short-container-title:J. Emerg. Technol. Comput. Syst.

Author:

Choudhury Dwaipayan¹, Rajam Aravind Sukumaran¹, Kalyanaraman Ananth¹, Pande Partha Pratim¹

Affiliation:

1. Washington State University, Pullman, WA

Abstract

Recent advances in GPU-based manycore accelerators provide the opportunity to efficiently process large-scale graphs on chip. However, real-world graphs have a diverse range of topology and connectivity patterns (e.g., degree distributions) that make the design of input-agnostic hardware architectures a challenge. Network-on-Chip (NoC)-based architectures provide a way to overcome this challenge, as the architectural topology can be used to approximately model the expected traffic patterns that emerge from graph application workloads. In this paper, we first study the mix of long- and short-range traffic patterns generated on-chip using graph workloads, and subsequently use the findings to adapt the design of an optimal NoC-based architecture. In particular, by leveraging emerging three-dimensional (3D) integration technology, we propose the design of a small-world NoC (SWNoC)-enabled manycore GPU architecture, where the placement of the links connecting the streaming multiprocessors (SMs) and the memory controllers (MCs) follows a power-law distribution. The proposed 3D manycore GPU architecture outperforms its traditional planar (2D) counterparts in both performance and energy consumption. Moreover, by adopting a joint performance-thermal optimization strategy, we address the thermal concerns in a 3D design without noticeably compromising the achievable performance. The 3D integration technology is also leveraged to incorporate Near Data Processing (NDP) to complement the performance benefits introduced by the SWNoC architecture. As graph applications are inherently memory intensive, off-chip data movement gives rise to latency and energy overheads in the presence of external DRAM. In conventional GPU architectures, as the main memory layer is not integrated with the logic, off-chip data movement negatively impacts overall performance and energy consumption. We demonstrate that NDP significantly reduces the overheads associated with such frequent and irregular memory accesses in graph-based applications. The proposed SWNoC-enabled NDP framework, which integrates 3D memory (like Micron's HMC) with a massive number of GPU cores, achieves a 29.5% performance improvement and 30.03% less energy consumption on average compared to a conventional planar Mesh-based design with external DRAM.
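The abstract states that link placement in the SWNoC follows a power-law distribution but does not give the procedure. The following is only a minimal illustrative sketch of one common way to realize such a placement, not the authors' method: each candidate link between two routers on a 3D grid is sampled with probability proportional to distance^(-alpha). The function names, the grid dimensions, the exponent `alpha`, and `links_per_node` are all hypothetical, and the sketch does not distinguish SMs from MCs or model the thermal optimization.

```python
import itertools
import math
import random


def power_law_links(dims, alpha, links_per_node, seed=0):
    """Sample undirected links on a 3D grid with P(link) ~ distance**(-alpha).

    dims, alpha and links_per_node are illustrative assumptions,
    not values taken from the paper.
    """
    rng = random.Random(seed)
    nodes = list(itertools.product(*(range(d) for d in dims)))
    links = set()
    for src in nodes:
        others = [n for n in nodes if n != src]
        # Weight each candidate endpoint by an inverse power of its
        # Euclidean distance from src, so short links dominate.
        weights = [math.dist(src, dst) ** (-alpha) for dst in others]
        # Sampling is with replacement; the set removes duplicate links.
        for dst in rng.choices(others, weights=weights, k=links_per_node):
            links.add((min(src, dst), max(src, dst)))
    return links


def mean_link_length(links):
    """Average Euclidean length of the sampled links."""
    return sum(math.dist(a, b) for a, b in links) / len(links)


if __name__ == "__main__":
    for alpha in (0.0, 2.0, 8.0):
        links = power_law_links((4, 4, 4), alpha, links_per_node=2)
        print(alpha, len(links), round(mean_link_length(links), 3))
```

A larger `alpha` concentrates links between nearby routers, while a smaller one admits more long-range shortcuts; a small-world topology sits between these extremes, with mostly short links and a few long ones.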

Funder

US National Science Foundation

Publisher

Association for Computing Machinery (ACM)

Subject

Electrical and Electronic Engineering, Hardware and Architecture, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3482880

Reference: 48 articles.

1. Scale-Free Networks

2. K. Duraisamy, H. Lu, P. Pande, and A. Kalyanaraman. 2016. High-performance and energy-efficient network-on-chip architectures for graph analytics. ACM Transactions on Embedded Computing Systems 66. DOI:https://doi.org/10.1145/2961027

3. Near-Data Processing: Insights from a MICRO-46 Workshop

Cited by 1 article.

1. Accelerating Graph Computations on 3D NoC-Enabled PIM Architectures; ACM Transactions on Design Automation of Electronic Systems; 2023-03-19