Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures

Published:2016-01-04 Issue:3S Volume:43 Page:720-732
ISSN:0163-5964
Container-title:ACM SIGARCH Computer Architecture News
language:en
Short-container-title:SIGARCH Comput. Archit. News

Author:

Alvarez Lluc¹,Vilanova Lluís¹,Moreto Miquel²,Casas Marc²,Gonzàlez Marc³,Martorell Xavier¹,Navarro Nacho¹,Ayguadé Eduard¹,Valero Mateo¹

Affiliation:

1. Barcelona Supercomputing Center and Universitat Politècnica de Catalunya

2. Barcelona Supercomputing Center

3. Universitat Politècnica de Catalunya

Abstract

The increasing number of cores in manycore architectures causes significant power and scalability problems in the memory subsystem. One solution is to introduce scratchpad memories alongside the cache hierarchy, forming a hybrid memory system. Scratchpad memories are more power-efficient than caches and they do not generate coherence traffic, but they suffer from poor programmability. A good way to hide the programmability difficulties from the programmer is to give the compiler the responsibility of generating code to manage the scratchpad memories. Unfortunately, compilers do not succeed in generating this code in the presence of random memory accesses with unknown aliasing hazards. This paper proposes a coherence protocol for the hybrid memory system that allows the compiler to always generate code to manage the scratchpad memories. In coordination with the compiler, memory accesses that may access stale copies of data are identified and diverted to the valid copy of the data. The proposal allows the architecture to be exposed to the programmer as a shared memory manycore, maintaining the programming simplicity of shared memory models and preserving backwards compatibility. In a 64-core manycore, the coherence protocol adds overheads of 4% in performance, 8% in network traffic and 9% in energy consumption to enable the usage of the hybrid memory system that, compared to a cache-based system, achieves a speedup of 1.14x and reduces on-chip network traffic and energy consumption by 29% and 17%, respectively.
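To make the idea of "diverting" a possibly stale access concrete, here is a toy software model. It is only a sketch under stated assumptions and not the paper's hardware design. The names `HybridMemory`, `map_chunk`, `unmap_chunk`, `guarded_load`, `guarded_store` and the 64-byte `CHUNK` size are hypothetical. The model assumes that the compiler copies whole chunks of main memory into a core's scratchpad and that accesses it cannot prove alias-free are emitted as "guarded" operations. A guarded operation checks whether its address currently has a scratchpad copy and, if so, uses that copy instead of the stale one in main memory.

```python
from dataclasses import dataclass, field

CHUNK = 64  # hypothetical size (in addresses) of one chunk mapped into a scratchpad


@dataclass
class HybridMemory:
    """Toy model: main memory plus per-core scratchpad (SPM) copies of chunks."""
    main: dict = field(default_factory=dict)   # address -> value in main memory
    spm: dict = field(default_factory=dict)    # (core, address) -> value in that core's SPM
    owner: dict = field(default_factory=dict)  # chunk base address -> core holding the valid copy

    def map_chunk(self, core, base):
        # Hypothetical DMA-like copy of one chunk from main memory into a core's SPM.
        for a in range(base, base + CHUNK):
            self.spm[(core, a)] = self.main.get(a, 0)
        self.owner[base] = core

    def unmap_chunk(self, base):
        # Write the chunk back to main memory and drop the SPM copy,
        # so that main memory holds the valid data again.
        core = self.owner.pop(base)
        for a in range(base, base + CHUNK):
            self.main[a] = self.spm.pop((core, a))

    def _valid_core(self, addr):
        # Which core (if any) holds the valid copy of the chunk containing addr.
        return self.owner.get(addr - addr % CHUNK)

    def guarded_load(self, addr):
        # Used when aliasing with SPM data cannot be ruled out at compile time:
        # read the SPM copy if one exists, otherwise read main memory.
        core = self._valid_core(addr)
        if core is not None:
            return self.spm[(core, addr)]
        return self.main.get(addr, 0)

    def guarded_store(self, addr, value):
        # Same check for stores, so the update lands in the valid copy.
        core = self._valid_core(addr)
        if core is not None:
            self.spm[(core, addr)] = value
        else:
            self.main[addr] = value


if __name__ == "__main__":
    mem = HybridMemory()
    mem.main[130] = 7
    mem.map_chunk(core=0, base=128)      # chunk 128..191 now lives in core 0's SPM
    mem.spm[(0, 130)] = 42               # core 0 updates its SPM copy; main memory is stale
    print(mem.main[130], mem.guarded_load(130))  # stale value vs. diverted value
```

In this sketch the unguarded path (reading `mem.main` directly) still sees the stale 7, while `guarded_load` is redirected to the scratchpad copy. Only the accesses the compiler cannot disambiguate pay for the extra lookup, which mirrors the trade-off the abstract describes between protocol overhead and the benefits of the hybrid memory system.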

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/2872887.2750411


Cited by 3 articles.

1. Design of Intra Cluster Access Structure for Distributed Caches of Array Processor;2022 14th International Conference on Measuring Technology and Mechatronics Automation (ICMTMA);2022-01

2. Exploiting memory allocations in clusterised many‐core architectures;IET Computers & Digital Techniques;2019-04-24

3. Runtime-Aware Architectures;Lecture Notes in Computer Science;2015