Affiliation:
1. Department of CSE, Indian Institute of Technology Delhi, India
Abstract
Deep neural network (DNN) implementations are typically characterized by huge datasets and concurrent computation, resulting in a demand for high memory bandwidth due to intensive data movement between processors and off-chip memory. Performing DNN inference on general-purpose cores at the edge is gaining traction as a way to enhance user experience and reduce latency. The speed mismatch between CPUs and conventional DRAM leads to under-utilization of the compute capabilities, increasing inference time. 3D DRAM is a promising solution to effectively fulfill the bandwidth requirement of high-throughput DNNs. However, due to the high power density of stacked architectures, 3D DRAMs need dynamic thermal management (DTM), which incurs performance overhead due to memory-induced CPU throttling.
We study the thermal impact of DNN applications running on a 3D DRAM system, and make a case for a memory temperature-aware customized prefetch mechanism to reduce DTM overheads and significantly improve performance. In our proposed NeuroCool DTM policy, we intelligently place either DRAM ranks or tiers in a low-power state, using the DNN layer characteristics and access rate. We establish the generalization of our approach through training and test datasets comprising diverse data points from widely used DNN applications. Experimental results on popular DNNs show that NeuroCool yields an average performance gain of 44% (as high as 52%) and a memory energy improvement of 43% (as high as 69%) over general-purpose DTM policies.
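To make the idea of a temperature- and access-rate-driven low-power decision concrete, the sketch below shows one possible selection rule. It is a minimal illustration only, not the paper's actual NeuroCool implementation: the threshold values, the MemoryUnit structure, and the select_low_power_units function are hypothetical, chosen to mirror the abstract's description of choosing DRAM ranks or tiers for a low-power state based on the running DNN layer's access behavior.

```python
# Illustrative sketch (not the paper's implementation): pick DRAM ranks/tiers
# for a low-power state when they are near the thermal limit but lightly
# accessed by the current DNN layer. All names and thresholds are hypothetical.

from dataclasses import dataclass

@dataclass
class MemoryUnit:
    name: str             # e.g. "rank0" or "tier2"
    temperature_c: float  # current thermal sensor reading (Celsius)
    access_rate: float    # accesses per microsecond for the running layer

def select_low_power_units(units, temp_limit_c=85.0, access_threshold=0.5):
    """Return names of units to place in a low-power state.

    A unit is a candidate when it is close to the thermal limit and the
    running DNN layer touches it infrequently, so memory can be cooled
    without resorting to memory-induced CPU throttling.
    """
    candidates = []
    for u in units:
        near_limit = u.temperature_c >= temp_limit_c - 5.0
        lightly_used = u.access_rate < access_threshold
        if near_limit and lightly_used:
            candidates.append(u.name)
    return candidates

if __name__ == "__main__":
    units = [
        MemoryUnit("tier0", 86.0, 0.2),  # hot, rarely accessed -> low power
        MemoryUnit("tier1", 83.0, 1.4),  # hot, heavily accessed -> keep active
        MemoryUnit("tier2", 70.0, 0.1),  # cool -> no action needed
    ]
    print(select_low_power_units(units))  # ['tier0']
```

In practice, the real policy would also account for DNN layer characteristics (e.g., compute- versus memory-bound layers) and wake-up latency when choosing between rank-level and tier-level power-down, which this sketch does not model.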
Funder
Semiconductor Research Corporation
Publisher
Association for Computing Machinery (ACM)
Subject
Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design, Computer Science Applications
Cited by
3 articles.