
Optimizing Replication, Communication, and Capacity Allocation in CMPs

Published:2005-05 Issue:2 Volume:33 Page:357-368
ISSN:0163-5964
Container-title:ACM SIGARCH Computer Architecture News
language:en
Short-container-title:SIGARCH Comput. Archit. News

Author:

Chishti Zeshan¹, Powell Michael D.¹, Vijaykumar T. N.¹

Affiliation:

1. Purdue University

Abstract

Chip multiprocessors (CMPs) substantially increase capacity pressure on the on-chip memory hierarchy while requiring fast access. Neither private nor shared caches can provide both large capacity and fast access in CMPs. We observe that compared to symmetric multiprocessors (SMPs), CMPs change the latency-capacity tradeoff in two significant ways. We propose three novel ideas to exploit the changes: (1) Though placing copies close to requestors allows fast access for read-only sharing, the copies also reduce the already-limited on-chip capacity in CMPs. We propose controlled replication to reduce capacity pressure by not making extra copies in some cases, and obtaining the data from an existing on-chip copy. This option is not suitable for SMPs because obtaining data from another processor is expensive and capacity is not limited to on-chip storage. (2) Unlike SMPs, CMPs allow fast on-chip communication between processors for read-write sharing. Instead of incurring slow access to read-write shared data through coherence misses as do SMPs, we propose in-situ communication to provide fast access without making copies or incurring coherence misses. (3) Accessing neighbors' caches is not as expensive in CMPs as it is in SMPs. We propose capacity stealing in which private data that exceeds a core's capacity is placed in a neighboring cache with less capacity demand. To incorporate our ideas, we use a hybrid of private, per-processor tag arrays and a shared data array. Because the shared data array is slow, we employ non-uniform access and distance associativity from previous proposals to hold frequently-accessed data in regions close to the requestor. We extend the previously-proposed Non-uniform access with Replacement And Placement usIng Distance associativity (NuRAPID) to CMPs, and call our cache CMP-NuRAPID.
Our results show that for a 4-core CMP with 8 MB cache, CMP-NuRAPID improves performance by 13% over a shared cache and 8% over private caches for three commercial multithreaded workloads.
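
The controlled-replication idea in (1) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the paper's implementation: the class `ToyCMPCache`, the `replicate_after` threshold, and the returned labels are hypothetical, and the abstract does not say in which cases a copy is eventually made. The sketch assumes a core is first served from an existing on-chip copy and gets its own nearby copy only after repeated reads.

```python
class ToyCMPCache:
    """Toy model of controlled replication (illustrative assumption only).

    Each block tracks which cores hold a data copy in their nearby region.
    A reader without a nearby copy is served from an existing on-chip copy
    until it has read the block `replicate_after` times.
    """

    def __init__(self, replicate_after=2):
        self.replicate_after = replicate_after  # hypothetical threshold
        self.copies = {}  # block -> set of cores holding a nearby copy
        self.reads = {}   # (core, block) -> number of reads so far

    def read(self, core, block):
        holders = self.copies.setdefault(block, set())
        count = self.reads.get((core, block), 0) + 1
        self.reads[(core, block)] = count

        if core in holders:
            return "local-hit"       # fast access to the nearby copy
        if not holders:
            holders.add(core)        # no on-chip copy: fetch from off chip
            return "off-chip-fill"
        if count >= self.replicate_after:
            holders.add(core)        # reuse detected: make a nearby copy
            return "replicate"
        return "remote-hit"          # use the existing copy, no new copy


cache = ToyCMPCache()
trace = [cache.read(0, "A"), cache.read(1, "A"),
         cache.read(1, "A"), cache.read(1, "A")]
print(trace)
```

In this trace, core 1's first read of block A is served from core 0's copy without consuming extra capacity, and a second copy is created only once core 1 reads the block again.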

Publisher

Association for Computing Machinery (ACM)

Cited by 41 articles.

1. A Low-Level Virtual Machine Just-In-Time Prototype for Running an Energy-Saving Hardware-Aware Mapping Algorithm on C/C++ Applications That Use Pthreads;Energies;2023-09-23

2. Online Thread and Data Mapping Using a Sharing-Aware Memory Management Unit;ACM Transactions on Modeling and Performance Evaluation of Computing Systems;2020-12-31

3. An efficient cache flat storage organization for multithreaded workloads for low power processors;Future Generation Computer Systems;2020-09

4. Cache Memory Architectures for Handling Big Data Applications: A Survey;Advances in Intelligent Systems and Computing;2019-12-01

5. FOS: a low-power cache organization for multicores;The Journal of Supercomputing;2019-04-24