NeuroTAP: Thermal and Memory Access Pattern-Aware Data Mapping on 3D DRAM for Maximizing DNN Performance

Author:

Pandey Shailja (1), Panda Preeti Ranjan (2)

Affiliation:

1. Computer Science & Engineering, Indian Institute of Technology Delhi, New Delhi, India

2. Computer Science and Engineering, Indian Institute of Technology Delhi, New Delhi, India

Abstract

Deep neural networks (DNNs) have been widely adopted owing to their breakthrough performance and high accuracy. DNNs exhibit varying memory behavior, with specific and recognizable memory access patterns and access intensities that depend on the data reuse selected in different layers. Such applications have high memory bandwidth demands because of their aggressive computation, performing several billion floating-point operations per second (BFLOPs). 3D DRAMs, which provide very high memory access bandwidth, are extensively employed to break the memory wall, bridging the gap between compute and memory while running DNNs. However, the vertical integration in 3D DRAM introduces serious thermal issues, resulting from high power density and the close proximity of memory cells, and requires dynamic thermal management (DTM). To unleash the true potential of 3D DRAM and exploit its enormous bandwidth under thermal constraints, the DNN application's data must be mapped intelligently across memory channels, pseudo-channels, and banks, minimizing the effective memory latency and reducing thermal-induced application slowdown. The specific memory access patterns exhibited during a DNN layer's execution are crucial in determining a favorable data mapping for the 3D DRAM dies, one that causes minimal thermal impact while maximizing DRAM bandwidth utilization. In this work, we propose an application-aware and thermal-sensitive data mapping that intelligently assigns portions of the 3D DRAM to DNN layers, leveraging knowledge of each layer's memory access patterns and minimizing DTM-induced performance overheads. We also deploy a DTM mechanism based on DRAM low-power states to keep the 3D DRAM within safe thermal limits. With our proposal, we observe a performance improvement of 1% to 61% and memory energy savings of 1% to 55% for popular DNNs over state-of-the-art DTM strategies while running DNN inference.
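To make the idea of assigning portions of the 3D DRAM to DNN layers more concrete, the following is a minimal sketch of one possible heuristic, not the mapping algorithm proposed in the paper. It assumes each layer carries a single hypothetical `access_intensity` score and that a per-channel temperature reading is available; the names `Layer` and `assign_layers_to_channels` are illustrative only. The heuristic pairs the most memory-intensive layers with the coolest channels.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Layer:
    name: str
    # Hypothetical score: relative memory access intensity of the layer.
    access_intensity: float


def assign_layers_to_channels(
    layers: List[Layer], channel_temps_c: List[float]
) -> Dict[str, int]:
    """Greedy illustration (an assumption, not the paper's method):
    the most memory-intensive layers go to the coolest channels."""
    if not channel_temps_c:
        raise ValueError("at least one channel temperature is required")

    # Layers ordered from most to least memory-intensive.
    hot_first = sorted(layers, key=lambda l: l.access_intensity, reverse=True)
    # Channel indices ordered from coolest to hottest.
    cool_first = sorted(
        range(len(channel_temps_c)), key=lambda c: channel_temps_c[c]
    )

    mapping: Dict[str, int] = {}
    for i, layer in enumerate(hot_first):
        # Wrap around if there are more layers than channels.
        mapping[layer.name] = cool_first[i % len(cool_first)]
    return mapping


if __name__ == "__main__":
    layers = [Layer("conv1", 0.9), Layer("fc1", 0.4), Layer("conv2", 0.7)]
    temps = [70.0, 62.5, 81.0, 66.0]  # made-up per-channel temperatures (deg C)
    print(assign_layers_to_channels(layers, temps))
```

A real mapping would also have to account for pseudo-channels, banks, the access pattern of each layer, and the slowdown caused by DTM, all of which this sketch leaves out.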

Publisher

Association for Computing Machinery (ACM)

