Affiliation:
1. Pennsylvania State University, USA
2. Intel, USA
Abstract
In a network-on-chip (NoC) based manycore architecture, an off-chip data access (main memory access) needs to travel through the on-chip network, spending a considerable amount of time within the chip (in addition to the memory access latency). Moreover, it contends with on-chip (cache) accesses, as both use the same NoC resources. In this paper, focusing on data-parallel, multithreaded applications, we propose a compiler-based off-chip data access localization strategy, which places data elements in the memory space such that an off-chip access traverses a minimum number of links (hops) to reach the memory controller that handles this access. This brings three main benefits. First, the network latency of off-chip accesses is reduced; second, the network latency of on-chip accesses is reduced; and finally, the memory latency of off-chip accesses improves, due to reduced queue latencies. We present an experimental evaluation of our optimization strategy using a set of 13 multithreaded application programs under both private and shared last-level caches. The collected results emphasize the importance of optimizing off-chip data accesses.
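To make the localization idea concrete, the sketch below illustrates the underlying placement principle in plain C: given a core that dominates accesses to a data block, choose the memory controller with the fewest mesh hops from that core. This is only an illustrative approximation of the concept described in the abstract, not the paper's compiler pass; the 8x8 mesh size, the corner placement of the four memory controllers, and the single-core access profile are all assumptions made for the example.

```c
/* Illustrative sketch (not the paper's compiler strategy): for a data block,
 * pick the memory controller closest (in mesh hops) to the core that
 * accesses it most, so its off-chip requests traverse fewer NoC links.
 * Mesh size, controller positions, and the access profile are assumptions. */
#include <stdio.h>
#include <stdlib.h>

typedef struct { int x, y; } Tile;

/* Manhattan distance = hop count on an X-Y routed 2D mesh (assumption). */
static int hops(Tile a, Tile b) {
    return abs(a.x - b.x) + abs(a.y - b.y);
}

/* Return the index of the memory controller nearest to the requesting core. */
static int nearest_controller(Tile core, const Tile *mc, int num_mc) {
    int best = 0;
    for (int i = 1; i < num_mc; i++)
        if (hops(core, mc[i]) < hops(core, mc[best]))
            best = i;
    return best;
}

int main(void) {
    /* Hypothetical 8x8 mesh with four corner memory controllers. */
    Tile mc[4] = { {0, 0}, {7, 0}, {0, 7}, {7, 7} };
    /* Hypothetical profile: core (2,6) dominates accesses to a data block. */
    Tile core = { 2, 6 };
    int target = nearest_controller(core, mc, 4);
    printf("Map block to controller %d at (%d,%d), %d hops away\n",
           target, mc[target].x, mc[target].y, hops(core, mc[target]));
    return 0;
}
```

Minimizing the hop count of off-chip requests in this way shortens their network traversal and, as a side effect, removes their traffic from links used by on-chip cache accesses, which is the source of the latency benefits the abstract enumerates.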
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design, Software