Affiliation:
1. Koç University, Istanbul, Turkey
2. Scalable Machines Research, USA
Abstract
One widely used metric of data locality is reuse distance: the number of unique memory locations accessed between two consecutive accesses to a particular memory location. State-of-the-art techniques that measure reuse distance in parallel applications rely on simulators or binary instrumentation tools that incur large performance and memory overheads. Moreover, existing sampling-based tools are limited to measuring the reuse distances of a single thread and discard interactions among threads in multi-threaded programs. In this work, we propose ReuseTracker, a fast and accurate reuse distance analyzer that leverages existing hardware features in commodity CPUs. ReuseTracker is designed for multi-threaded programs and takes cache-coherence effects into account. By using hardware features such as performance monitoring units and debug registers, ReuseTracker can accurately profile reuse distance in parallel applications with much lower overheads than existing tools: it introduces only 2.9× runtime and 2.8× memory overheads. Our tool achieves 92% accuracy when verified against a newly developed configurable benchmark that can generate a variety of reuse distance patterns. We demonstrate the tool's functionality with two use-case scenarios on the PARSEC, Rodinia, and Synchrobench benchmark suites. In these scenarios, ReuseTracker guides code refactoring by detecting spatial reuses in shared caches that are also instances of false sharing, and it successfully predicts whether some benchmarks in these suites can benefit from the adjacent cache line prefetch optimization.
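The definition of reuse distance above can be made concrete with a small sketch over a single address trace. The function name `reuse_distances` and its `line_size` parameter are hypothetical, chosen only for illustration. This is a naive quadratic computation that shows what the metric means. It is not how ReuseTracker measures reuse distance, which the abstract says relies on performance monitoring units and debug registers. Setting `line_size` to an assumed 64-byte cache line maps neighbouring addresses to the same block, which illustrates the difference between temporal reuse (same address) and spatial reuse (same cache line).

```python
from typing import Dict, Iterable, List, Optional


def reuse_distances(trace: Iterable[int], line_size: int = 1) -> List[Optional[int]]:
    """Hypothetical helper: reuse distance of every access in `trace`.

    Reuse distance = number of distinct locations accessed strictly between
    an access and the previous access to the same location. A first access
    has no previous access, so its distance is None.
    With line_size > 1 (e.g. an assumed 64-byte cache line), addresses are
    first mapped to blocks, so reuse of a neighbouring address counts as
    (spatial) reuse of the same block.
    """
    blocks = [addr // line_size for addr in trace]
    last_seen: Dict[int, int] = {}  # block -> index of its most recent access
    result: List[Optional[int]] = []
    for i, block in enumerate(blocks):
        if block in last_seen:
            # Accesses strictly between the previous access and this one;
            # duplicates are collapsed because only *unique* locations count.
            window = blocks[last_seen[block] + 1 : i]
            result.append(len(set(window)))
        else:
            result.append(None)  # cold (first) access
        last_seen[block] = i
    return result


if __name__ == "__main__":
    # Word-granularity (temporal) reuse: 0 and 8 are each reused after 2 unique locations.
    print(reuse_distances([0, 8, 16, 0, 8]))  # [None, None, None, 2, 2]
    # With 64-byte blocks, all five addresses fall in one line: immediate spatial reuse.
    print(reuse_distances([0, 8, 16, 0, 8], line_size=64))  # [None, 0, 0, 0, 0]
```

The quadratic window scan keeps the definition visible. Practical analyzers avoid it by using tree-based or sampling-based techniques.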
Funder
Scientific and Technological Research Council of Turkey
Royal Society-Newton Advanced Fellowship
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Information Systems, Software