BEAR

Published: 2016-01-04 Issue: 3S Volume: 43 Page: 198-210
ISSN: 0163-5964
Container-title: ACM SIGARCH Computer Architecture News
Language: en
Short-container-title: SIGARCH Comput. Archit. News

Author:

Chou Chiachen¹, Jaleel Aamer², Qureshi Moinuddin K.¹

Affiliation:

1. Georgia Institute of Technology

2. NVIDIA

Abstract

Die stacking memory technology can enable gigascale DRAM caches that can operate at 4x-8x higher bandwidth than commodity DRAM. Such caches can improve system performance by servicing data at a faster rate when the requested data is found in the cache, potentially increasing the memory bandwidth of the system by 4x-8x. Unfortunately, a DRAM cache uses the available memory bandwidth not only for data transfer on cache hits, but also for other secondary operations such as cache miss detection, fill on cache miss, and writeback lookup and content update on dirty evictions from the last-level on-chip cache. Ideally, we want the bandwidth consumed for such secondary operations to be negligible, and have almost all the bandwidth be available for transfer of useful data from the DRAM cache to the processor. We evaluate a 1GB DRAM cache, architected as Alloy Cache, and show that even the most bandwidth-efficient proposal for DRAM cache consumes 3.8x bandwidth compared to an idealized DRAM cache that does not consume any bandwidth for secondary operations. We also show that redesigning the DRAM cache to minimize the bandwidth consumed by secondary operations can potentially improve system performance by 22%. To that end, this paper proposes Bandwidth Efficient ARchitecture (BEAR) for DRAM caches. BEAR integrates three components, one each for reducing the bandwidth consumed by miss detection, miss fill, and writeback probes. BEAR reduces the bandwidth consumption of DRAM cache by 32%, which reduces cache hit latency by 24% and increases overall system performance by 10%. BEAR, with negligible overhead, outperforms an idealized SRAM Tag-Store design that incurs an unacceptable overhead of 64 megabytes, as well as Sector Cache designs that incur an SRAM storage overhead of 6 megabytes.
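The bandwidth figures in the abstract can be related with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: the helper names `bloat_factor` and `after_reduction` are hypothetical, and it assumes that the 32% reduction applies to the same total traffic that the 3.8x figure describes.

```python
def bloat_factor(useful_bytes: float, secondary_bytes: float) -> float:
    """Total DRAM-cache traffic divided by the useful (hit-data) traffic.

    An idealized cache with no secondary operations has a factor of 1.0.
    """
    if useful_bytes <= 0:
        raise ValueError("useful_bytes must be positive")
    return (useful_bytes + secondary_bytes) / useful_bytes


def after_reduction(baseline_factor: float, reduction: float) -> float:
    """Bandwidth factor after cutting total consumption by `reduction` (0..1)."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return baseline_factor * (1.0 - reduction)


# Figures quoted in the abstract.
baseline = 3.8    # baseline consumption relative to an idealized DRAM cache
reduction = 0.32  # reduction in bandwidth consumption reported for BEAR

# Assumption: the 32% cut applies directly to the 3.8x baseline.
reduced = after_reduction(baseline, reduction)
print(f"Estimated consumption after reduction: {reduced:.2f}x the ideal")
```

Under that assumption, the design would still consume roughly 2.6 times the bandwidth of the idealized cache, so secondary operations remain a substantial share of the traffic even after the reduction.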

Funder

Defense Advanced Research Projects Agency

Semiconductor Research Corporation

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/2872887.2750387

References: 29 articles.

1. HMC Specification 1.0, 2013. [Online]. Available: http://www.hybridmemorycube.org

2. Micron HMC Gen2, Micron, 2013.

3. DDR4 SPEC (JESD79-4), JEDEC, 2013.

Cited by 2 articles.

1. GRACE: A Scalable Graph-Based Approach to Accelerating Recommendation Model Inference;Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3;2023-03-25

2. Toward multi-programmed workloads with different memory footprints: a self-adaptive last level cache scheduling scheme;Science China Information Sciences;2017-07-14