Adaptive work-stealing with parallelism feedback

Author:

Kunal Agrawal (1), Charles E. Leiserson (1), Yuxiong He (2), Wen Jing Hsu (2)

Affiliation:

1. Massachusetts Institute of Technology, Cambridge, MA

2. Nanyang Technological University

Abstract

Multiprocessor scheduling in a shared multiprogramming environment can be structured as two-level scheduling, where a kernel-level job scheduler allots processors to jobs and a user-level thread scheduler schedules the work of a job on its allotted processors. We present a randomized work-stealing thread scheduler for fork-join multithreaded jobs that provides continual parallelism feedback to the job scheduler in the form of requests for processors. Our A-STEAL algorithm is appropriate for large parallel servers where many jobs share a common multiprocessor resource and in which the number of processors available to a particular job may vary during the job's execution. Assuming that the job scheduler never allots a job more processors than requested by the job's thread scheduler, A-STEAL guarantees that the job completes in near-optimal time while utilizing at least a constant fraction of the allotted processors.

We model the job scheduler as the thread scheduler's adversary, challenging the thread scheduler to be robust to the operating environment as well as to the job scheduler's administrative policies. For example, the job scheduler might make a large number of processors available exactly when the job has little use for them. To analyze the performance of our adaptive thread scheduler under this stringent adversarial assumption, we introduce a new technique called trim analysis, which allows us to prove that our thread scheduler performs poorly on no more than a small number of time steps and exhibits near-optimal behavior on the vast majority of them. More precisely, suppose that a job has work $T_1$ and span $T_\infty$. On a machine with $P$ processors, A-STEAL completes the job in an expected duration of $O(T_1/\tilde{P} + T_\infty + L \lg P)$ time steps, where $L$ is the length of a scheduling quantum and $\tilde{P}$ denotes the $O(T_\infty + L \lg P)$-trimmed availability.
This quantity is the average of the processor availability over all time steps except the $O(T_\infty + L \lg P)$ time steps that have the highest processor availability. When the job's parallelism dominates the trimmed availability, that is, $\tilde{P} \ll T_1/T_\infty$, the job achieves nearly perfect linear speedup. Conversely, when the trimmed mean dominates the parallelism, the asymptotic running time of the job is nearly the length of its span, which is optimal.

We measured the performance of A-STEAL on a simulated multiprocessor system using synthetic workloads. For jobs with sufficient parallelism, our experiments confirm that A-STEAL provides almost perfect linear speedup across a variety of processor availability profiles. We compared A-STEAL with the ABP algorithm, an adaptive work-stealing thread scheduler developed by Arora et al. [1998], which does not employ parallelism feedback. On moderately to heavily loaded machines with large numbers of processors, A-STEAL typically completed jobs more than twice as quickly as ABP, despite being allotted the same number of processors or fewer on every step, while wasting only 10% of the processor cycles wasted by ABP.
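The trimmed availability (written $\tilde{P}$ here) can be made concrete with a small calculation. The sketch below assumes the availability profile is given as one integer per time step and takes the trim count `r` as an explicit parameter, because the abstract fixes it only asymptotically as $O(T_\infty + L \lg P)$. The helper names `trimmed_availability` and `bound_estimate`, the sample numbers, and the choice of setting every hidden constant in the bound to 1 are hypothetical and serve only as illustration.

```python
import math


def trimmed_availability(availability: list[int], r: int) -> float:
    """Average availability after discarding the r time steps with the
    highest availability (the r-trimmed availability)."""
    if r < 0 or r >= len(availability):
        raise ValueError("need 0 <= r < number of time steps")
    kept = sorted(availability)[: len(availability) - r]  # drop the r largest
    return sum(kept) / len(kept)


def bound_estimate(t1: float, t_inf: float, p: int, quantum: int,
                   p_tilde: float) -> float:
    """Evaluate T1/P~ + T_inf + L lg P with every hidden constant set to 1.

    Hypothetical: the abstract states this bound only up to O(.).
    """
    return t1 / p_tilde + t_inf + quantum * math.log2(p)


# Made-up profile: the adversary grants 64 processors on 10 steps
# and only 4 processors on the other 90.
profile = [4] * 90 + [64] * 10
print(sum(profile) / len(profile))           # plain mean: 10.0
print(trimmed_availability(profile, r=10))   # trimmed mean: 4.0

# Separate hypothetical job: T1 = 10^6, T_inf = 100, P = 64, L = 10, P~ = 4.
print(bound_estimate(t1=10**6, t_inf=100, p=64, quantum=10, p_tilde=4.0))  # 250160.0
```

In the made-up profile, the plain mean overstates what the job can rely on, because the extra processors arrive on only a few steps; trimming those steps leaves 4. In the second example the parallelism $T_1/T_\infty = 10^4$ far exceeds $\tilde{P} = 4$, so the $T_1/\tilde{P}$ term dominates, which corresponds to the linear-speedup regime.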
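A-STEAL's parallelism feedback takes the form of a processor request that the thread scheduler sends to the job scheduler. The abstract does not say how that request is computed. The sketch below therefore assumes a multiplicative-increase, multiplicative-decrease rule. Under this rule, a quantum counts as "efficient" when the allotted processors spent at least a fraction `delta` of their cycles on work rather than on steal attempts, and as "satisfied" when the job received everything it requested. The `Quantum` record, `allot`, `next_desire`, the parameters `delta` and `rho`, and the sample data are all hypothetical. Only the cap in `allot` comes from the abstract, which assumes the job scheduler never allots more processors than requested.

```python
from dataclasses import dataclass


@dataclass
class Quantum:
    desire: int        # processors the job requested for this quantum
    allotted: int      # processors the job scheduler actually granted
    useful_steps: int  # processor-steps spent working (not attempting steals)
    length: int        # quantum length L, in time steps


def allot(desire: int, available: int) -> int:
    # The job scheduler never grants more than requested (abstract's assumption).
    return min(desire, available)


def next_desire(q: Quantum, delta: float = 0.8, rho: int = 2) -> int:
    """Hypothetical multiplicative-increase/decrease update of the request."""
    efficient = q.useful_steps >= delta * q.length * q.allotted
    satisfied = q.allotted == q.desire
    if not efficient:
        return max(1, q.desire // rho)  # too many cycles lost to stealing: ask for fewer
    if satisfied:
        return q.desire * rho           # efficient and fully served: ask for more
    return q.desire                     # efficient but deprived: keep the request


history = []
desire = 1
# (processors available, useful processor-steps observed) per quantum; made-up data
for available, useful in [(8, 10), (8, 20), (8, 40), (8, 40), (2, 20)]:
    granted = allot(desire, available)
    desire = next_desire(Quantum(desire, granted, useful, length=10))
    history.append(desire)
print(history)  # [2, 4, 8, 4, 4]
```

The request doubles while the job keeps its processors busy. It halves in the fourth quantum, when half of the eight granted processors' cycles go to steal attempts. It holds steady in the last quantum, when the job is efficient but receives fewer processors than it asked for.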
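The thread-scheduling side of the algorithm is randomized work stealing. The toy, step-synchronous simulation below illustrates only the basic stealing discipline, under simplifying assumptions that are not taken from the paper. Every task takes one time step. The job is a complete binary spawn tree with no joins. Each processor takes tasks from the bottom of its own deque, and an idle processor tries to steal from the top of a uniformly random victim's deque. The processor count stays fixed, so adaptive allotment and feedback are left out. The function name `simulate_work_stealing` is hypothetical.

```python
import random
from collections import deque


def simulate_work_stealing(depth: int, p: int, seed: int = 0) -> tuple[int, int]:
    """Run a complete binary spawn tree of the given depth on p processors.

    Each task is a unit-time node labelled by its level; a node above the
    leaves spawns two children. Returns (time steps used, tasks executed).
    """
    rng = random.Random(seed)
    deques = [deque() for _ in range(p)]
    deques[0].append(0)  # the root task starts on processor 0
    steps = executed = 0
    while any(deques):
        # Decide who works this step before anyone steals, so a stolen
        # task runs no earlier than the next step.
        busy = [bool(d) for d in deques]
        for i in range(p):
            if busy[i]:
                level = deques[i].pop()       # owner takes from the bottom
                executed += 1
                if level < depth:             # spawn two child tasks
                    deques[i].append(level + 1)
                    deques[i].append(level + 1)
        for i in range(p):
            if not busy[i] and p > 1:
                victim = rng.randrange(p - 1)
                victim += victim >= i         # uniform over processors other than i
                if deques[victim]:
                    deques[i].append(deques[victim].popleft())  # steal from the top
        steps += 1
    return steps, executed


for p in (1, 4, 16):
    print(p, simulate_work_stealing(depth=10, p=p))
```

With `depth=10` the tree has work $2^{11} - 1 = 2047$ and span 11. A single processor therefore needs exactly 2047 steps. With more processors, the step count always stays between the span and $\lceil 2047/p \rceil$ on the low side and 2047 on the high side.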

Funder

Division of Computer and Network Systems

Advanced Cyberinfrastructure

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Cited by 30 articles.

1. Scheduling Out-Trees Online to Optimize Maximum Flow. Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, 2024-06-17.

2. High-performance extended actors. Software: Practice and Experience, 2023-09-16.

3. Adaptive scheduling of multiprogrammed dynamic-multithreading applications. Journal of Parallel and Distributed Computing, 2022-04.

4. Scheduling computations with provably low synchronization overheads. Journal of Scheduling, 2021-10-21.

5. AMCilk: A Framework for Multiprogrammed Parallel Workloads. 2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC), 2020-12.
