Topology-Aware and Dependence-Aware Scheduling and Memory Allocation for Task-Parallel Languages

Published:2014-10-27 Issue:3 Volume:11 Page:1-25
ISSN:1544-3566
Container-title:ACM Transactions on Architecture and Code Optimization
language:en
Short-container-title:ACM Trans. Archit. Code Optim.

Author:

Drebes Andi¹, Heydemann Karine¹, Drach Nathalie¹, Pop Antoniu², Cohen Albert³

Affiliation:

1. Sorbonne Universités, UPMC Univ Paris 06, CNRS, UMR 7606, LIP6, France

2. University of Manchester, School of Computer Science, United Kingdom

3. INRIA and École Normale Supérieure, Paris, France

Abstract

We present a joint scheduling and memory allocation algorithm for efficient execution of task-parallel programs on non-uniform memory architecture (NUMA) systems. Task and data placement decisions are based on a static description of the memory hierarchy and on runtime information about intertask communication. Existing locality-aware scheduling strategies for fine-grained tasks have strong limitations: they are specific to some class of machines or applications, they do not handle task dependences, they require manual program annotations, or they rely on fragile profiling schemes. By contrast, our solution makes no assumption on the structure of programs or on the layout of data in memory. Experimental results, based on the OpenStream language, show that locality of accesses to main memory of scientific applications can be increased significantly on a 64-core machine, resulting in a speedup of up to 1.63× compared to a state-of-the-art work-stealing scheduler.
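The abstract says placement decisions combine a static description of the memory hierarchy with runtime information about intertask communication. The following is only a minimal sketch of that general idea, not the authors' algorithm: it assumes a hypothetical NUMA distance matrix (the static part) and a hypothetical per-task record of how many input bytes reside on each node (the runtime part), and picks the node with the lowest estimated transfer cost. The names `pick_node`, `DISTANCE`, and `inputs` are illustrative and do not come from the paper or from OpenStream.

```python
from typing import Dict, List


def pick_node(distance: List[List[int]], input_bytes: Dict[int, int]) -> int:
    """Return the NUMA node with the lowest estimated cost of reading the inputs.

    distance[a][b] is an assumed relative cost for node a to access memory on
    node b (static topology description). input_bytes maps a node id to the
    number of input bytes the task would read from that node (runtime data).
    """
    def cost(node: int) -> int:
        # Weight every input buffer by how far away it is from this node.
        return sum(distance[node][src] * nbytes
                   for src, nbytes in input_bytes.items())

    # Ties resolve to the lowest node id because min() keeps the first minimum.
    return min(range(len(distance)), key=cost)


# Hypothetical two-node machine: local access costs 10, remote access costs 20.
DISTANCE = [[10, 20], [20, 10]]

# Hypothetical task whose inputs live mostly on node 1.
inputs = {0: 4096, 1: 65536}
chosen = pick_node(DISTANCE, inputs)
```

Here `chosen` is node 1, since most of the input data already resides there. A real runtime would also have to weigh load balance and where output buffers are allocated, which this sketch ignores.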

Funder

Seventh Framework Programme

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Information Systems, Software

Link

https://dl.acm.org/doi/pdf/10.1145/2641764

References: 40 articles.

1. The data locality of work stealing

2. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures

3. Sorting networks and their applications

Cited by 31 articles.

1. Efficient Parallel Mining of High-utility Itemsets on Multicore Processors;2023 IEEE 39th International Conference on Data Engineering (ICDE);2023-04

2. FinePack: Transparently Improving the Efficiency of Fine-Grained Transfers in Multi-GPU Systems;2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2023-02

3. Improving Cache Utilization of Nested Parallel Programs by Almost Deterministic Work Stealing;IEEE Transactions on Parallel and Distributed Systems;2022-12-01

4. Demand MemCpy: Overlapping of Computation and Data Transfer for Heterogeneous Computing;IEEE Access;2022