Affiliation:
1. Universidad de Cantabria, Spain
Abstract
This paper presents a simple but effective method to reduce on-chip access latency and improve core isolation in CMP Non-Uniform Cache Architectures (NUCA). The paper introduces a feasible way to allocate cache blocks according to the access pattern. Each L2 bank is dynamically partitioned at set level into private and shared content. Simply by adjusting the replacement algorithm, we can place private data closer to its owner processor. In contrast, shared data is always placed in the same position, independently of the accessing processor. This approach is capable of reducing on-chip latency without significantly sacrificing hit rates or increasing the implementation cost over that of a conventional static NUCA. Additionally, most of the unnecessary interference between cores in private accesses is removed.
To support the architectural decisions adopted and provide a comparative study, a comprehensive evaluation framework is employed. The workbench is composed of a full-system simulator and a representative set of multithreaded and multiprogrammed workloads. With this infrastructure, different alternatives for the coherence protocol, replacement policies, and cache utilization are analyzed to find the optimal proposal. We conclude that the cost of a feasible implementation should be closer to that of a conventional static NUCA, and significantly less than that of a dynamic NUCA.
Finally, a comparison with static and dynamic NUCA is presented. The simulation results suggest that, on average, the proposed mechanism could improve system performance over a static NUCA and an idealized dynamic NUCA by 16% and 6%, respectively.
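The abstract describes partitioning each cache set between private and shared blocks purely through the replacement algorithm. The sketch below is only an illustration of that general idea and is not the authors' mechanism: the fixed `private_quota` parameter, the `Line` and `CacheSet` names, and the use of LRU inside each class are all assumptions introduced here. The abstract says the partition is dynamic but gives no details, so the quota is simply held constant. Blocks are matched on tag alone, and reclassifying a block between private and shared is not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Line:
    tag: int
    private: bool   # assumed classification: True if only one core uses the block
    last_use: int   # logical timestamp used for LRU ordering


@dataclass
class CacheSet:
    ways: int            # associativity of the set
    private_quota: int   # hypothetical: ways intended for private blocks
    lines: List[Line] = field(default_factory=list)
    clock: int = 0

    def access(self, tag: int, private: bool) -> bool:
        """Return True on a hit; on a miss, insert the block and return False."""
        self.clock += 1
        for line in self.lines:
            if line.tag == tag:
                line.last_use = self.clock
                return True
        if len(self.lines) >= self.ways:
            self.lines.remove(self._victim(private))
        self.lines.append(Line(tag, private, self.clock))
        return False

    def _victim(self, incoming_private: bool) -> Line:
        """Pick the LRU line from the class that would exceed its share."""
        n_private = sum(1 for line in self.lines if line.private)
        if incoming_private:
            # A new private block may only displace shared data
            # while the private class is still below its quota.
            evict_private = n_private >= self.private_quota
        else:
            # A new shared block reclaims ways only if private data overflowed.
            evict_private = n_private > self.private_quota
        candidates = [l for l in self.lines if l.private == evict_private]
        if not candidates:
            candidates = self.lines  # fall back to plain LRU over the whole set
        return min(candidates, key=lambda l: l.last_use)
```

With `CacheSet(ways=4, private_quota=2)`, once two private and two shared blocks are resident, a miss on a third private block evicts the least recently used private block and leaves the shared blocks untouched. This is the kind of isolation between private and shared content that the abstract refers to.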
Funder
Ministerio de Educación, Cultura y Deporte
Publisher
Association for Computing Machinery (ACM)
Cited by
17 articles.
1. Cache Memory and On-Chip Cache Architecture: A Survey;Communications in Computer and Information Science;2024
2. Thermal Management for S-NUCA Many-Cores via Synchronous Thread Rotations;2023 Design, Automation & Test in Europe Conference & Exhibition (DATE);2023-04
3. TD-NUCA: Runtime Driven Management of NUCA Caches in Task Dataflow Programming Models;SC22: International Conference for High Performance Computing, Networking, Storage and Analysis;2022-11
4. DTM-NUCA: Dynamic Texture Mapping-NUCA for Energy-Efficient Graphics Rendering;2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP);2022-03
5. Compiler support for near data computing;Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming;2021-02-17