Affiliation:
1. Intel Corporation, Belgium
2. Intel Corporation, USA
Abstract
Sparse memory accesses, which are scattered accesses to single elements of a large data structure, are a challenge for current processor architectures. Their lack of spatial and temporal locality and their irregularity make caches and traditional stream prefetchers ineffective. Furthermore, performing standard caching and prefetching on sparse accesses wastes precious memory bandwidth and thrashes caches, deteriorating performance for regular accesses. Bypassing prefetchers and caches for sparse accesses, and fetching only a single element (e.g., 8 B) from main memory (subline access), can solve these issues.
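To make the bandwidth argument concrete, the sketch below computes what fraction of the bytes fetched from memory is actually used. The 8 B element size comes from the example in the abstract; the 64 B cache line size and the function name `useful_fraction` are assumptions for illustration only.

```python
LINE_BYTES = 64  # assumed cache line size (not stated in the abstract)
ELEM_BYTES = 8   # element size from the abstract's example

def useful_fraction(elems_used, fetch_bytes, elem_bytes=ELEM_BYTES):
    """Fraction of fetched bytes that the program actually consumes."""
    return elems_used * elem_bytes / fetch_bytes

# Sparse access through the cache: one element used per full line fetched.
cached_sparse = useful_fraction(1, LINE_BYTES)    # 0.125
# The same access issued as a subline access: only the element is fetched.
subline_sparse = useful_fraction(1, ELEM_BYTES)   # 1.0
# Regular streaming access: all eight elements of the line are used.
cached_dense = useful_fraction(8, LINE_BYTES)     # 1.0
```

Under these assumptions, a cached sparse access uses only one eighth of the transferred data, whereas a dense access already uses the whole line, so it gains no bandwidth from sublining and loses the benefit of caching.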
Deciding which accesses to handle as sparse accesses and which as regular cached accesses is a challenging task with a large potential impact on performance. Treating sparse accesses as regular accesses reduces performance, but so does not caching accesses that do have locality, because it significantly increases their latency and bandwidth consumption. Furthermore, this decision depends on the dynamic environment, such as input set characteristics and system load, making a static decision by the programmer or compiler suboptimal.
We propose the Instruction Spatial Locality Estimator (ISLE), a hardware detector that finds instructions that access isolated words in a sea of unused data. These sparse accesses are dynamically converted into uncached subline accesses, while regular accesses remain cached. ISLE does not require modifying source code or binaries, and adapts automatically to a changing environment (input data, available bandwidth, etc.). We apply ISLE to a graph analytics processor running sparse graph workloads, and show that ISLE outperforms using no subline accesses, manual sublining, and prior work on detecting sparse accesses.
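The abstract does not describe ISLE's internal mechanism, so the following is only a toy software model of the general idea of per-instruction spatial locality estimation, not the design from the paper. Every name and parameter in it (`ToyLocalityEstimator`, the 64 B line size, the window of 4 recent lines, the saturating counter and its threshold) is a hypothetical choice made for illustration.

```python
from collections import defaultdict, deque

LINE_BYTES = 64   # assumed cache line size
WINDOW = 4        # hypothetical: recent lines remembered per instruction
COUNTER_MAX = 7   # hypothetical saturating counter limit
THRESHOLD = 4     # hypothetical: at or above this, treat the instruction as sparse

class ToyLocalityEstimator:
    """Toy per-instruction estimator; NOT the ISLE hardware design."""

    def __init__(self):
        # Per instruction address (pc): recently touched lines and a counter.
        self.recent = defaultdict(lambda: deque(maxlen=WINDOW))
        self.counter = defaultdict(int)

    def observe(self, pc, addr):
        line = addr // LINE_BYTES
        seen = self.recent[pc]
        if line in seen:
            # Reuse of a recently touched line: evidence of locality.
            self.counter[pc] = max(0, self.counter[pc] - 1)
        else:
            # A new line: evidence that this instruction touches isolated words.
            self.counter[pc] = min(COUNTER_MAX, self.counter[pc] + 1)
            seen.append(line)

    def is_sparse(self, pc):
        return self.counter[pc] >= THRESHOLD

est = ToyLocalityEstimator()
for i in range(64):
    est.observe(0x400, i * 8)      # streaming over consecutive 8 B elements
    est.observe(0x500, i * 4096)   # one element per widely separated line
```

In this model the streaming instruction at the made-up address `0x400` keeps its counter near zero and stays cached, while the scattered instruction at `0x500` saturates its counter and would be a candidate for subline access. Because the counter keeps updating, the classification can change if the access pattern of an instruction changes at run time.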
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Information Systems, Software