NeuroTAP: Thermal and Memory Access Pattern-Aware Data Mapping on 3D DRAM for Maximizing DNN Performance

Published:2024-09-11 Issue:6 Volume:23 Page:1-30
ISSN:1539-9087
Container-title:ACM Transactions on Embedded Computing Systems
language:en
Short-container-title:ACM Trans. Embed. Comput. Syst.

Author:

Pandey Shailja¹, Panda Preeti Ranjan²

Affiliation:

1. Computer Science & Engineering, Indian Institute of Technology Delhi, New Delhi, India

2. Computer Science and Engineering, Indian Institute of Technology Delhi, New Delhi, India

Abstract

Deep neural networks (DNNs) have been widely adopted, owing to breakthrough performance and high accuracy. DNNs exhibit varying memory behavior, with specific and recognizable memory access patterns and access intensity that depend on the data reuse selected in different layers. Such applications have high memory bandwidth demands due to aggressive computations, performing several billion floating-point operations per second (BFLOPs). 3D DRAMs, which provide very high memory access bandwidth, are extensively employed to break the memory wall, bridging the gap between compute and memory while running DNNs. However, the vertical integration in 3D DRAM introduces serious thermal issues, resulting from high power density and close proximity of memory cells, and requires dynamic thermal management (DTM). To unleash the true potential of 3D DRAM and exploit the enormous bandwidth under thermal constraints, there is a need to intelligently map the DNN application’s data across memory channels, pseudo-channels, and banks, minimizing the effective memory latency and reducing the thermal-induced application slowdown. The specific memory access patterns exhibited by a DNN layer execution are crucial to determine a favorable data mapping method for 3D DRAM dies that potentially causes minimal thermal impact and also maximizes DRAM bandwidth utilization. In this work, we propose an application-aware and thermal-sensitive data mapping that intelligently assigns portions of the 3D DRAM to DNN layers, leveraging knowledge of each layer’s memory access patterns and minimizing DTM-induced performance overheads. We also deploy a DTM mechanism based on DRAM low-power states to keep the 3D DRAM within safe thermal limits. Using our proposal, we observe a performance improvement of 1% to 61% and memory energy savings of 1% to 55% for popular DNNs over state-of-the-art DTM strategies while running DNN inference.

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3677178
