Authors:
Anoop Gupta, Andrew Tucker, Shigeru Urushibara
Abstract
Shared-memory multiprocessors are frequently used as compute servers with multiple parallel applications executing at the same time. In such environments, the efficiency of a parallel application can be significantly affected by the operating system scheduling policy. In this paper, we use detailed simulation studies to evaluate the performance of several different scheduling strategies. These include regular priority scheduling, coscheduling or gang scheduling, process control with processor partitioning, handoff scheduling, and affinity-based scheduling. We also explore tradeoffs between the use of busy-waiting and blocking synchronization primitives and their interactions with the scheduling strategies. Since effective use of caches is essential to achieving high performance, a key focus is on the impact of the scheduling strategies on the caching behavior of the applications. Our results show that in situations where the number of processes exceeds the number of processors, regular priority-based scheduling in conjunction with busy-waiting synchronization primitives results in extremely poor processor utilization. In such situations, use of blocking synchronization primitives can significantly improve performance. Process control and gang scheduling strategies are shown to offer the highest performance, and their performance is relatively independent of the synchronization method used. However, for applications that have sizable working sets that fit into the cache, process control performs better than gang scheduling. For the applications considered, the performance gains due to handoff scheduling and processor affinity are shown to be small.
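The busy-waiting versus blocking tradeoff at the center of the abstract is easiest to see side by side. The following is a minimal C sketch, not code from the paper: the lock names, worker function, and iteration count are illustrative. A busy-waiting process spins on the processor while waiting, which is wasted work if the lock holder has been descheduled; a blocking process sleeps in the kernel, freeing the processor for another runnable process.

/* Minimal sketch contrasting busy-waiting and blocking synchronization.
 * Illustrative only; compile with: cc demo.c -pthread */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_flag spin = ATOMIC_FLAG_INIT;              /* busy-waiting lock */
static pthread_mutex_t blk = PTHREAD_MUTEX_INITIALIZER;  /* blocking lock */
static long counter = 0;

/* Busy-waiting: the waiter spins, consuming its processor until the
 * holder releases. With more processes than processors, these cycles
 * are pure waste -- the pathology the abstract reports for priority
 * scheduling combined with busy-waiting. */
static void spin_lock(void)   { while (atomic_flag_test_and_set(&spin)) ; }
static void spin_unlock(void) { atomic_flag_clear(&spin); }

static void *worker(void *arg)
{
    int use_blocking = *(int *)arg;
    for (int i = 0; i < 100000; i++) {
        if (use_blocking) {
            /* Blocking: the waiter yields the processor to the
             * scheduler, so another runnable process can progress. */
            pthread_mutex_lock(&blk);
            counter++;
            pthread_mutex_unlock(&blk);
        } else {
            spin_lock();
            counter++;
            spin_unlock();
        }
    }
    return NULL;
}

int main(void)
{
    pthread_t t[2];
    int use_blocking = 0;   /* set to 1 to use the blocking variant */
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, &use_blocking);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    printf("counter = %ld\n", counter);
    return 0;
}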
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications, Hardware and Architecture, Software
Cited by
26 articles.
1. Inducing Huge Tail Latency on a MongoDB deployment; 2023 IEEE International Conference on Cloud Engineering (IC2E); 2023-09-25
2. Principled Schedulability Analysis for Distributed Storage Systems Using Thread Architecture Models; ACM Transactions on Storage; 2023-03-06
3. Fault-Tolerant Network-On-Chip; Built-in Fault-Tolerant Computing Paradigm for Resilient Large-Scale Chip Design; 2023
4. Fault-Tolerant General Purposed Processors; Built-in Fault-Tolerant Computing Paradigm for Resilient Large-Scale Chip Design; 2023
5. GPU accelerated Cartesian GRAPPA reconstruction using CUDA; Journal of Magnetic Resonance; 2022-04