Cooperative Caching for Chip Multiprocessors

Published:2006-05
Issue:2
Volume:34
Page:264-276
ISSN:0163-5964
Container-title:ACM SIGARCH Computer Architecture News
language:en
Short-container-title:SIGARCH Comput. Archit. News

Author:

Chang Jichuan¹, Sohi Gurindar S.¹

Affiliation:

1. University of Wisconsin-Madison

Abstract

This paper presents CMP Cooperative Caching, a unified framework to manage a CMP's aggregate on-chip cache resources. Cooperative caching combines the strengths of private and shared cache organizations by forming an aggregate "shared" cache through cooperation among private caches. Locally active data are attracted to the private caches by their accessing processors to reduce remote on-chip references, while globally active data are cooperatively identified and kept in the aggregate cache to reduce off-chip accesses. Examples of cooperation include cache-to-cache transfers of clean data, replication-aware data replacement, and global replacement of inactive data. These policies can be implemented by modifying an existing cache replacement policy and cache coherence protocol, or by the new implementation of a directory-based protocol presented in this paper. Our evaluation using full-system simulation shows that cooperative caching achieves an off-chip miss rate similar to that of a shared cache, and a local cache hit rate similar to that of using private caches. Cooperative caching performs robustly over a range of system/cache sizes and memory latencies. For an 8-core CMP with 1MB L2 cache per core, the best cooperative caching scheme improves the performance of multithreaded commercial workloads by 5-11% compared with a shared cache and 4-38% compared with private caches. For a 4-core CMP running multiprogrammed SPEC2000 workloads, cooperative caching is on average 11% and 6% faster than shared and private cache organizations, respectively.
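Two of the cooperation policies named in the abstract, replication-aware replacement and global replacement of evicted data, can be illustrated with a toy software model. The sketch below is only an illustration under stated assumptions, not the paper's hardware design or coherence protocol: the class `ToyCoopCaches`, its methods, the fully associative LRU caches, and the "spill to the first peer with free space" rule are all hypothetical choices made for this example.

```python
from collections import OrderedDict


class ToyCoopCaches:
    """Toy model (hypothetical): one small fully associative LRU cache per core."""

    def __init__(self, num_cores, capacity):
        self.capacity = capacity
        # Each OrderedDict is ordered from least to most recently used.
        # Values are unused placeholders; only the keys (block names) matter.
        self.caches = [OrderedDict() for _ in range(num_cores)]

    def copies(self, block):
        """Number of on-chip copies of a block across all private caches."""
        return sum(block in cache for cache in self.caches)

    def pick_victim(self, core):
        """Replication-aware choice: prefer the least recently used block
        that also has a copy in another cache; otherwise take the LRU block."""
        cache = self.caches[core]
        for block in cache:
            if self.copies(block) > 1:
                return block
        return next(iter(cache))

    def spill(self, from_core, block):
        """Assumed spill rule: place the block in the first peer cache with
        free space; if none has room, the block leaves the chip."""
        for index, peer in enumerate(self.caches):
            if index != from_core and len(peer) < self.capacity:
                peer[block] = None
                return True
        return False

    def access(self, core, block):
        cache = self.caches[core]
        if block in cache:
            cache.move_to_end(block)  # refresh LRU position
            return "local hit"
        # A copy in a peer cache is served on chip (cache-to-cache transfer).
        source = "remote hit" if self.copies(block) > 0 else "off-chip miss"
        if len(cache) >= self.capacity:
            victim = self.pick_victim(core)
            was_only_copy = self.copies(victim) == 1
            del cache[victim]
            if was_only_copy:
                # Try to keep the last on-chip copy in the aggregate cache.
                self.spill(core, victim)
        cache[block] = None
        return source
```

Calling `access(core, block)` on an instance such as `ToyCoopCaches(num_cores=2, capacity=2)` returns one of `"local hit"`, `"remote hit"`, or `"off-chip miss"`, which makes it easy to see how evicting replicas first and spilling sole copies to peers keeps more distinct blocks on chip.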

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/1150019.1136509

References: 34 articles.

1. Simulating a $2M commercial server on a $2K PC

2. A cache coherence approach for large multiprocessor systems

3. Piranha

Cited by 98 articles.

1. NUBA: Non-Uniform Bandwidth GPUs;Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2;2023-01-27

2. Enterprise-Class Multilevel Cache Design: Low Latency, Huge Capacity, and High Reliability;IEEE Micro;2023-01-01

3. Designing Virtual Memory System of MCM GPUs;2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO);2022-10

4. Morpheus: Extending the Last Level Cache Capacity in GPU Systems Using Idle GPU Core Resources;2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO);2022-10

5. Coherency Traffic Reduction in Manycore Systems;2022 25th Euromicro Conference on Digital System Design (DSD);2022-08