
SP-NUCA

Published:2008-05 Issue:2 Volume:36 Page:64-71
ISSN:0163-5964
Container-title:ACM SIGARCH Computer Architecture News
Language:en
Short-container-title:SIGARCH Comput. Archit. News

Author:

Merino Javier¹,Puente Valentín¹,Prieto Pablo¹,Gregorio José Ángel¹

Affiliation:

1. Universidad de Cantabria, Spain

Abstract

This paper presents a simple but effective method to reduce on-chip access latency and improve core isolation in CMP Non-Uniform Cache Architectures (NUCA). The paper introduces a feasible way to allocate cache blocks according to the access pattern. Each L2 bank is dynamically partitioned at set level into private and shared content. Simply by adjusting the replacement algorithm, we can place private data closer to its owner processor. In contrast, shared data is always placed in the same position, independently of the accessing processor. This approach is capable of reducing on-chip latency without significantly sacrificing hit rates or increasing the implementation cost over that of a conventional static NUCA. Additionally, most of the unnecessary interference between cores in private accesses is removed. To support the architectural decisions adopted and provide a comparative study, a comprehensive evaluation framework is employed. The workbench is composed of a full-system simulator and a representative set of multithreaded and multiprogrammed workloads. With this infrastructure, different alternatives for the coherence protocol, replacement policies, and cache utilization are analyzed to find the optimal proposal. We conclude that the cost of a feasible implementation should be close to that of a conventional static NUCA, and significantly less than that of a dynamic NUCA. Finally, a comparison with static and dynamic NUCA is presented. The simulation results suggest that, on average, the proposed mechanism could improve system performance over a static NUCA and an idealized dynamic NUCA by 16% and 6%, respectively.
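The placement rule described in the abstract (private data near its owner, shared data at a fixed position regardless of the requester) can be illustrated with a minimal sketch. This is not the authors' implementation: the bank count, block size, one-bank-per-core layout, and the function names `shared_bank`, `private_bank`, and `target_bank` are all assumptions made for illustration, and the abstract does not specify how blocks are classified as private or shared.

```python
# Illustrative sketch only; every constant and name here is an assumption,
# not a detail taken from the paper.

NUM_BANKS = 8     # assumed: one L2 bank per core
BLOCK_BITS = 6    # assumed: 64-byte cache blocks


def shared_bank(addr: int) -> int:
    """Shared blocks: bank is a function of the address only,
    as in a static NUCA, so every core looks in the same place."""
    return (addr >> BLOCK_BITS) % NUM_BANKS


def private_bank(core: int) -> int:
    """Private blocks: placed in the bank assumed to be local to the owner."""
    return core % NUM_BANKS


def target_bank(addr: int, core: int, is_shared: bool) -> int:
    """Pick the bank for a block, given its (assumed known) sharing status."""
    return shared_bank(addr) if is_shared else private_bank(core)


if __name__ == "__main__":
    addr = 0x1040
    # Private data follows the requesting core.
    print(target_bank(addr, core=3, is_shared=False))
    # Shared data lands in the same bank for every core.
    print(target_bank(addr, core=3, is_shared=True),
          target_bank(addr, core=5, is_shared=True))
```

Under these assumptions, a private access never leaves the owner's local bank, while a shared access resolves to one address-determined bank for all cores; how the two kinds of blocks compete for ways within a set is left to the replacement policy and is not modeled here.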

Funder

Ministerio de Educación, Cultura y Deporte

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/1399972.1399973

References: 25 articles.

1. Managing Wire Delay in Large Chip-Multiprocessor Caches

2. ASR: Adaptive Selective Replication for CMP Caches

3. Cooperative Caching for Chip Multiprocessors

4. Optimizing Replication, Communication, and Capacity Allocation in CMPs

5. An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors

Cited by 17 articles.

1. Cache Memory and On-Chip Cache Architecture: A Survey;Communications in Computer and Information Science;2024

2. Thermal Management for S-NUCA Many-Cores via Synchronous Thread Rotations;2023 Design, Automation & Test in Europe Conference & Exhibition (DATE);2023-04

3. TD-NUCA: Runtime Driven Management of NUCA Caches in Task Dataflow Programming Models;SC22: International Conference for High Performance Computing, Networking, Storage and Analysis;2022-11

4. DTM-NUCA: Dynamic Texture Mapping-NUCA for Energy-Efficient Graphics Rendering;2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP);2022-03

5. Compiler support for near data computing;Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming;2021-02-17