ZPP: A Dynamic Technique to Eliminate Cache Pollution in NoC based MPSoCs-Reference-Cited by-同舟云学术

ZPP: A Dynamic Technique to Eliminate Cache Pollution in NoC based MPSoCs

Published:2023-09-09 Issue:5s Volume:22 Page:1-25
ISSN:1539-9087
Container-title:ACM Transactions on Embedded Computing Systems
language:en
Short-container-title:ACM Trans. Embed. Comput. Syst.

Author:

Deb Dipika¹^ORCID,Jose John¹^ORCID

Affiliation:

1. Indian Institute of Technology Guwahati, India

Abstract

Data prefetching efficiently reduces the memory access latency in NUCA architectures as the Last Level Cache (LLC) is shared and distributed across multiple cores. But cache pollution generated by prefetcher reduces its efficiency by causing contention for shared resources such as LLC and the underlying network. The paper proposes Zero Pollution Prefetcher (ZPP) that eliminates cache pollution for NUCA architecture. For this purpose, ZPP uses L1 prefetcher and places the prefetched blocks in the data locations of LLC where modified blocks are stored. Since modified blocks in LLC are stale and request for such blocks are served from the exclusively owned private cache, their space unnecessary consumes power to maintain such stale data in the cache. The benefits of ZPP are (a) Eliminates cache pollution in L1 and LLC by storing prefetched blocks in LLC locations where stale blocks are stored. (b) Insufficient cache space is solved by placing prefetched blocks in LLC as LLCs are larger in size than L1 cache. This helps in prefetching more cache blocks, thereby increasing prefetch aggressiveness. (c) Increasing prefetch aggressiveness increases its coverage. (d) It also maintains an equivalent lookup latency to L1 cache for prefetched blocks. Experimentally it has been found that ZPP increases weighted speedup by 2.19x as compared to a system with no prefetching while prefetch coverage and prefetch accuracy increases by 50%, and 12%, respectively compared to the baseline. 1

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture,Software

Link

https://dl.acm.org/doi/pdf/10.1145/3609113

Reference41 articles.

1. N. Agarwal, T. Krishna, Li-Shiuan Peh, and N. K. Jha. 2009. GARNET: A detailed on-chip network model inside a full-system simulator. In IEEE International Symposium on Performance Analysis of Systems and Software. 33–42.

2. Cache coherence protocols: evaluation using a multiprocessor simulation model

3. M. Bakhshalipour, P. Lotfi-Kamran, and H. Sarbazi-Azad. 2018. Domino temporal data prefetcher. In International Symposium on High Performance Computer Architecture. 131–142.

4. Networks on chips: a new SoC paradigm

5. Rahul Bera, Anant V. Nori, Onur Mutlu, and Sreenivas Subramoney. 2019. DSPatch: Dual spatial pattern prefetcher. In Microarchitecture. 531–544.