Affiliation:
1. University of Maryland, College Park, MD
Abstract
As hardware becomes increasingly parallel and the availability of scalable parallel software improves, the problem of managing multiple multithreaded applications (processes) becomes important. Malleable processes, which can vary the number of threads used as they run, enable sophisticated and flexible resource management. Although many existing applications parallelized for SMPs with parallel runtimes are in fact already malleable, deployed runtime environments provide neither an interface nor a strategy for intelligently allocating hardware threads or even preventing oversubscription. Prior research methods either depend on profiling applications ahead of time to make good decisions about allocations or do not account for process efficiency at all, leading to poor performance. None of these prior methods has been adopted widely in practice. This article presents the Scheduling and Allocation with Feedback (SCAF) system: a drop-in runtime solution that supports existing malleable applications in making intelligent allocation decisions based on observed efficiency without any changes to semantics, program modification, offline profiling, or even recompilation. Our existing implementation can control most unmodified OpenMP applications. Other malleable threading libraries can also be supported with small modifications to the library, without requiring application modification or recompilation.
In this work, we present the SCAF daemon and a SCAF-aware port of the GNU OpenMP runtime. We present a new technique for estimating process efficiency purely at runtime using available hardware counters and demonstrate its effectiveness in aiding allocation decisions.
We evaluated SCAF using the NAS NPB parallel benchmarks on five commodity parallel platforms, enumerating architectural features and their effects on our scheme. We measured the benefit of SCAF as the improvement in the sum of speedups (a common metric for multiprogrammed environments) when running all benchmark pairs concurrently, compared to equipartitioning, the best existing competing scheme in the literature. We found that SCAF improves on equipartitioning on four out of five machines, showing a mean improvement factor in sum of speedups of 1.04 to 1.11x for benchmark pairs, depending on the machine, and 1.09x on average.
Since we are not aware of any widely available tool for equipartitioning, we also compare SCAF against multiprogramming using unmodified OpenMP, which is the only environment available to end users today. SCAF improves on the unmodified OpenMP runtimes for all five machines, with a mean improvement of 1.08 to 2.07x, depending on the machine, and 1.59x on average.
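As a rough illustration of the evaluation metric described above, the following Python sketch computes a sum of speedups for a set of concurrently running processes and the improvement factor of one scheduling scheme over another. It also shows equipartitioning as an even split of hardware threads among processes. The function names, the choice of each benchmark's serial runtime as the speedup baseline, and the timings in the usage note are assumptions made for illustration; the abstract does not specify them.

```python
def equipartition(total_threads, num_processes):
    """Split hardware threads as evenly as possible among processes.

    Any remainder is handed out one thread at a time to the first processes.
    """
    base, extra = divmod(total_threads, num_processes)
    return [base + (1 if i < extra else 0) for i in range(num_processes)]


def sum_of_speedups(serial_times, concurrent_times):
    """Sum of per-process speedups in a multiprogrammed run.

    Assumption: speedup is the process's serial runtime divided by its
    runtime when run concurrently with the other processes.
    """
    return sum(s / c for s, c in zip(serial_times, concurrent_times))


def improvement_factor(serial_times, baseline_times, candidate_times):
    """Ratio of the candidate scheme's sum of speedups to the baseline's."""
    return (sum_of_speedups(serial_times, candidate_times)
            / sum_of_speedups(serial_times, baseline_times))
```

For example, with hypothetical serial runtimes of 100 s and 80 s, baseline concurrent runtimes of 25 s and 20 s, and candidate concurrent runtimes of 20 s and 20 s, `improvement_factor([100.0, 80.0], [25.0, 20.0], [20.0, 20.0])` evaluates to 9/8 = 1.125.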
Funder
NASA Office of the Chief Technologist's Space Technology Research Fellowship
Publisher
Association for Computing Machinery (ACM)
Subject
Computational Theory and Mathematics, Computer Science Applications, Hardware and Architecture, Modeling and Simulation, Software
Cited by
2 articles.
1. Artemis: Automatic Runtime Tuning of Parallel Execution Parameters Using Machine Learning;Lecture Notes in Computer Science;2021
2. SCALO;ACM Transactions on Architecture and Code Optimization;2017-12-31