Abstract
Amdahl's law provides architects a compelling reason to introduce system asymmetry to optimize for both serial and parallel regions of execution. Asymmetry in a multicore processor can arise statically (e.g., from core microarchitecture) or dynamically (e.g., applying dynamic voltage/frequency scaling). Work stealing is an increasingly popular approach to task distribution that elegantly balances task-based parallelism across multiple worker threads. In this paper, we propose asymmetry-aware work-stealing (AAWS) runtimes, which are carefully designed to exploit both the static and dynamic asymmetry in modern systems. AAWS runtimes use three key hardware/software techniques: work-pacing, work-sprinting, and work-mugging. Work-pacing and work-sprinting are novel techniques that combine a marginal-utility-based approach with integrated voltage regulators to improve performance and energy efficiency in high- and low-parallel regions. Work-mugging is a previously proposed technique that enables a waiting big core to preemptively migrate work from a busy little core. We propose a simple implementation of work-mugging based on lightweight user-level interrupts. We use a vertically integrated research methodology spanning software, architecture, and VLSI to make the case that holistically combining static asymmetry, dynamic asymmetry, and work-stealing runtimes can improve both performance and energy efficiency in future multicore systems.
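The work-stealing mechanism the abstract builds on can be sketched as below. This is a minimal, single-threaded simulation of generic work stealing, not the AAWS runtime: it has no work-pacing, work-sprinting, or work-mugging, and no notion of big or little cores. The function name `run_work_stealing` and its lock-step round model are hypothetical. Each worker owns a double-ended queue. The owner pops its newest task from the tail, and an idle worker steals the oldest task from the head of another worker's deque.

```python
from collections import deque


def run_work_stealing(task_lists):
    """Hypothetical lock-step simulation of generic work stealing.

    task_lists[w] is the initial list of task ids owned by worker w.
    Returns, for each worker, the task ids it executed, in order.
    """
    deques = [deque(tasks) for tasks in task_lists]
    executed = [[] for _ in deques]
    n = len(deques)
    # Keep running rounds while any worker's deque still holds a task.
    while any(deques):
        for w in range(n):
            if deques[w]:
                # Owner path: take the newest task from the tail (LIFO).
                executed[w].append(deques[w].pop())
            else:
                # Thief path: scan other workers round-robin and steal
                # the oldest task from the head of the first non-empty deque.
                for k in range(1, n):
                    victim = (w + k) % n
                    if deques[victim]:
                        executed[w].append(deques[victim].popleft())
                        break
    return executed


# Worker 0 starts with all six tasks and worker 1 starts idle. Worker 1
# balances the load by stealing from the opposite end of worker 0's deque.
print(run_work_stealing([[1, 2, 3, 4, 5, 6], []]))
```

Stealing from the opposite end of the owner's deque keeps thieves and the owner mostly out of each other's way. In a real runtime, the oldest tasks also tend to represent larger subtrees of work. Asymmetry-aware decisions, such as which core should steal or when a big core should mug a little one, would be layered on top of this basic loop.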
Publisher
Association for Computing Machinery (ACM)