Affiliation:
1. University of Toronto, Toronto, Canada
Abstract
The major chip manufacturers have all introduced chip multiprocessing (CMP) and simultaneous multithreading (SMT) technology into their processing units. As a result, even low-end computing systems and game consoles have become shared memory multiprocessors with L1 and L2 cache sharing within a chip. Mid- and large-scale systems will have multiple processing chips and hence consist of an SMP-CMP-SMT configuration with non-uniform data sharing overheads. Current operating system schedulers are not aware of these new cache organizations, and as a result, distribute threads across processors in a way that causes many unnecessary, long-latency cross-chip cache accesses.
In this paper we describe the design and implementation of a scheme to schedule threads based on sharing patterns detected online using features of standard performance monitoring units (PMUs) available in today's processing units. The primary advantage of using the PMU infrastructure is that it is fine-grained (down to the cache line) and has relatively low overhead. We have implemented our scheme in Linux running on an 8-way Power5 SMP-CMP-SMT multi-processor. For commercial multithreaded server workloads (VolanoMark, SPECjbb, and RUBiS), we are able to demonstrate reductions in cross-chip cache accesses of up to 70%. These reductions lead to application-reported performance improvements of up to 7%.
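The general idea of sharing-aware placement can be illustrated with a small sketch. This is not the authors' implementation: it assumes that each thread already has a hypothetical "sharing signature" (a vector counting sampled remote cache accesses per memory region, of the kind PMU sampling might yield), and it uses cosine similarity with a greedy one-pass clustering purely for illustration. The names `cluster_threads`, `assign_to_chips`, and the threshold value are assumptions.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster_threads(signatures, threshold=0.5):
    """Greedy one-pass clustering (illustrative assumption, not the paper's algorithm).

    signatures: dict mapping thread id -> hypothetical sharing signature.
    Each thread joins the first cluster whose representative signature is
    at least `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []  # list of (representative_signature, [thread ids])
    for tid, sig in sorted(signatures.items()):
        for rep, members in clusters:
            if cosine(rep, sig) >= threshold:
                members.append(tid)
                break
        else:
            clusters.append((sig, [tid]))
    return [members for _, members in clusters]


def assign_to_chips(clusters, num_chips):
    """Place each cluster whole on the currently least-loaded chip, largest clusters first."""
    load = [0] * num_chips
    placement = {}
    for members in sorted(clusters, key=len, reverse=True):
        chip = load.index(min(load))
        for tid in members:
            placement[tid] = chip
        load[chip] += len(members)
    return placement


if __name__ == "__main__":
    # Made-up signatures: threads 1 and 2 touch the same regions, as do 3 and 4.
    sigs = {1: [9, 0, 0, 1], 2: [8, 1, 0, 0], 3: [0, 0, 7, 6], 4: [0, 1, 5, 9]}
    groups = cluster_threads(sigs)
    print(groups)
    print(assign_to_chips(groups, num_chips=2))
```

Keeping each cluster on a single chip is what lets threads that share data hit in that chip's shared cache rather than incurring cross-chip accesses; a real scheduler would also have to balance load and react as sharing patterns change.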
Publisher
Association for Computing Machinery (ACM)
Cited by
76 articles.
1. Integration Framework for Online Thread Throttling with Thread and Page Mapping on NUMA Systems;2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW);2024-05-27
2. A Novel Priority Based Scheduler for Asymmetric Multi-core Edge Computing;Communications in Computer and Information Science;2024
3. Optimizing Single-Source Graph Execution on NUMA Machines;2023 XIII Brazilian Symposium on Computing Systems Engineering (SBESC);2023-11-21
4. CPS: A Cooperative Para-virtualized Scheduling Framework for Manycore Machines;Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4;2023-03-25
5. SLITS: Sparsity-Lightened Intelligent Thread Scheduling;Proceedings of the ACM on Measurement and Analysis of Computing Systems;2023-02-27