Affiliation:
1. ETH Zurich, Zurich, Switzerland
2. ETH Zurich, Zurich Switzerland
3. TOBB University of Economics and Technology, Ankara Turkey
Abstract
Modern computing systems access data in main memory at coarse granularity (e.g., at 512-bit cache block granularity). Coarse-grained access leads to wasted energy because the system does not use all individually accessed small portions (e.g.,
words
, each of which typically is 64 bits) of a cache block. In modern DRAM-based computing systems, two key coarse-grained access mechanisms lead to wasted energy: large and fixed-size (i) data transfers between DRAM and the memory controller and (ii) DRAM row activations.
We propose Sectored DRAM, a new, low-overhead DRAM substrate that reduces wasted energy by enabling fine-grained DRAM data transfer and DRAM row activation. To retrieve only useful data from DRAM, Sectored DRAM exploits the observation that many cache blocks are not fully utilized in many workloads due to poor spatial locality. Sectored DRAM predicts the words in a cache block that will likely be accessed during the cache block’s residency in cache and (i) transfers only the predicted words on the memory channel by dynamically tailoring the DRAM data transfer size for the workload and (ii) activates a smaller set of cells that contain the predicted words by carefully operating physically isolated portions of DRAM rows (i.e., mats). Activating a smaller set of cells on each access relaxes DRAM power delivery constraints and allows the memory controller to schedule DRAM accesses faster.
We evaluate Sectored DRAM using 41 workloads from widely used benchmark suites. Compared to a system with coarse-grained DRAM, Sectored DRAM reduces the DRAM energy consumption of highly memory intensive workloads by up to (on average) 33% (20%) while improving their performance by up to (on average) 36% (17%). Sectored DRAM’s DRAM energy savings, combined with its system performance improvement, allows system-wide energy savings of up to 23%. Sectored DRAM’s DRAM chip area overhead is 1.7% of the area of a modern DDR4 chip. Compared to state-of-the-art fine-grained DRAM architectures, Sectored DRAM greatly reduces DRAM energy consumption, does not reduce DRAM bandwidth, and can be implemented with low hardware cost. Sectored DRAM provides 89% of the performance benefits of, consumes 12% less DRAM energy than, and takes up 34% less DRAM chip area than a high-performance state-of-the-art fine-grained DRAM architecture (Half-DRAM). It is our hope and belief that Sectored DRAM’s ideas and results will help to enable more efficient and high-performance memory systems. To this end, we open source Sectored DRAM at https://github.com/CMU-SAFARI/Sectored-DRAM.
Publisher
Association for Computing Machinery (ACM)
Reference137 articles.
1. Jung Ho Ahn et al. 2009. Future scaling of processor-memory interfaces. In SC.
2. Jung Ho Ahn et al. 2009. Multicore DIMM: An energy efficient memory module with independently controlled DRAMs. IEEE Computer Architecture Letters 8 1 (2009) 5–8.
3. Tareq Alawneh, Raimund Kirner, and Catherine Menon. 2021. Dynamic row activation mechanism for multi-core systems. In CF.
4. D. B. Alpert et al. 1988. Performance trade-offs for microprocessor cache memories. IEEE Micro 8 4 (1988) 44–54.
5. AMD. 2013. BIOS and Kernel Developer’s Guide (BKDG) for AMD Family 15h Models 00h-0Fh Processors. AMD.