Automatic Sublining for Efficient Sparse Memory Accesses

Published: 2021-09-30, Volume: 18, Issue: 3, Pages: 1-23
ISSN: 1544-3566
Journal: ACM Transactions on Architecture and Code Optimization (ACM Trans. Archit. Code Optim.)
Language: English

Author:

Heirman Wim¹, Eyerman Stijn¹, Du Bois Kristof¹, Hur Ibrahim²

Affiliation:

1. Intel Corporation, Belgium

2. Intel Corporation, USA

Abstract

Sparse memory accesses, which are scattered accesses to single elements of a large data structure, are a challenge for current processor architectures. Their lack of spatial and temporal locality and their irregularity make caches and traditional stream prefetchers useless. Furthermore, performing standard caching and prefetching on sparse accesses wastes precious memory bandwidth and thrashes caches, deteriorating performance for regular accesses. Bypassing prefetchers and caches for sparse accesses, and fetching only a single element (e.g., 8 B) from main memory (a subline access), can solve these issues. Deciding which accesses to handle as sparse and which as regular cached accesses is a challenging task with a large potential impact on performance. Treating sparse accesses as regular accesses reduces performance, but not caching accesses that do have locality also hurts performance, because it significantly increases their latency and bandwidth consumption. Moreover, this decision depends on the dynamic environment, such as input set characteristics and system load, which makes a static decision by the programmer or compiler suboptimal. We propose the Instruction Spatial Locality Estimator (ISLE), a hardware detector that finds instructions that access isolated words in a sea of unused data. These sparse accesses are dynamically converted into uncached subline accesses, while regular accesses remain cached. ISLE requires no modification of source code or binaries, and it adapts automatically to a changing environment (input data, available bandwidth, etc.). We apply ISLE to a graph analytics processor running sparse graph workloads and show that ISLE outperforms a baseline without subline accesses, manual sublining, and prior work on detecting sparse accesses.
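The abstract does not describe ISLE's internal structure. As a rough illustration of the underlying idea only, the Python sketch below tracks, per load instruction (PC), how many 8 B words of each fetched line are actually used before that line is evicted, and flags instructions whose average falls below a threshold as candidates for subline access. The 64 B line size, the tiny LRU tracking cache, the `SpatialLocalityEstimator` class, and the 1.5-words-per-line threshold are all hypothetical assumptions, not details of the paper's hardware. Under the 64 B assumption, a PC that touches one 8 B word per line fetches 64 B for 8 B of useful data, an 8x overfetch that an uncached subline access avoids.

```python
from collections import OrderedDict, defaultdict

LINE_BYTES = 64   # ASSUMED cache-line size; not stated in the abstract
WORD_BYTES = 8    # subline granularity, matching the abstract's 8 B example
WORDS_PER_LINE = LINE_BYTES // WORD_BYTES


class SpatialLocalityEstimator:
    """Hypothetical per-PC spatial locality tracker; NOT the ISLE design from the paper."""

    def __init__(self, capacity_lines=4, sparse_threshold=1.5):
        self.capacity = capacity_lines        # tiny LRU cache used only for tracking
        self.threshold = sparse_threshold     # avg words used per line below this => sparse
        self.lines = OrderedDict()            # line address -> (allocating pc, touched-word bitmask)
        self.lines_filled = defaultdict(int)  # pc -> lines it allocated that have been evicted
        self.words_used = defaultdict(int)    # pc -> distinct words used in those evicted lines

    def access(self, pc, addr):
        # Split the byte address into (line index, word index within the line).
        line, word = divmod(addr // WORD_BYTES, WORDS_PER_LINE)
        if line in self.lines:
            owner, mask = self.lines[line]
            # Simplification: every touched word is credited to the PC that allocated the line.
            self.lines[line] = (owner, mask | (1 << word))
            self.lines.move_to_end(line)      # mark as most recently used
        else:
            if len(self.lines) >= self.capacity:
                self._evict()
            self.lines[line] = (pc, 1 << word)

    def _evict(self):
        # Evict the least recently used line and record how many of its words were used.
        _, (owner, mask) = self.lines.popitem(last=False)
        self.lines_filled[owner] += 1
        self.words_used[owner] += bin(mask).count("1")

    def flush(self):
        while self.lines:
            self._evict()

    def is_sparse(self, pc):
        filled = self.lines_filled.get(pc, 0)
        if filled == 0:
            return False                      # no evidence yet: keep the access cached
        return self.words_used.get(pc, 0) / filled < self.threshold


if __name__ == "__main__":
    est = SpatialLocalityEstimator()
    for i in range(64):                       # PC 0x10: dense walk over 8 B elements
        est.access(0x10, i * 8)
    for i in range(16):                       # PC 0x20: one word per line, widely scattered
        est.access(0x20, 100_000 + i * 4096)
    est.flush()
    print(est.is_sparse(0x10), est.is_sparse(0x20))  # False True
```

Crediting every touched word to the allocating PC keeps the sketch short; it also ignores the dynamic factors the abstract mentions, such as available bandwidth and system load.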

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Information Systems, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3452141


Cited by 3 articles.

1. The First Direct Mesh-to-Mesh Photonic Fabric; IEEE Micro; 2024-05

2. McCore: A Holistic Management of High-Performance Heterogeneous Multicores; 56th Annual IEEE/ACM International Symposium on Microarchitecture; 2023-10-28

3. The Intel Programmable and Integrated Unified Memory Architecture Graph Analytics Processor; IEEE Micro; 2023-09