A dynamic self-scheduling scheme for heterogeneous multiprocessor architectures

Published:2013-01 Issue:4 Volume:9 Page:1-20
ISSN:1544-3566
Container-title:ACM Transactions on Architecture and Code Optimization
language:en
Short-container-title:ACM Trans. Archit. Code Optim.

Author:

Belviranli Mehmet E.¹, Bhuyan Laxmi N.¹, Gupta Rajiv¹

Affiliation:

1. University of California, Riverside

Abstract

Today's heterogeneous architectures bring together multiple general-purpose CPUs and multiple domain-specific GPUs and FPGAs to provide dramatic speedup for many applications. However, the challenge lies in utilizing these heterogeneous processors to optimize overall application performance by minimizing workload completion time. Operating system and application development for these systems is in its infancy. In this article, we propose a new scheduling and workload balancing scheme, HDSS, for execution of loops having dependent or independent iterations on heterogeneous multiprocessor systems. The new algorithm dynamically learns the computational power of each processor during an adaptive phase and then schedules the remainder of the workload using a weighted self-scheduling scheme during the completion phase. Unlike previous studies, our scheme considers the runtime effects of block sizes on performance for heterogeneous multiprocessors. It finds the right trade-off between large and small block sizes to maintain a balanced workload while keeping accelerator utilization at its maximum. Our algorithm does not require offline training or architecture-specific parameters. We have evaluated our scheme on two different heterogeneous architectures: an AMD 64-core Bulldozer system with an nVidia Fermi C2050 GPU and an Intel Xeon 32-core SGI Altix 4700 supercomputer with Xilinx Virtex 4 FPGAs. The experimental results show that our new scheduling algorithm can achieve performance improvements of up to over 200% when compared to the closest existing load balancing scheme. Our algorithm also achieves full processor utilization, with all processors completing at nearly the same time, which is significantly better than current alternative approaches.
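The two-phase idea in the abstract can be illustrated with a small simulation. The sketch below is not the HDSS algorithm from the article: it only models independent iterations, and the names `Worker` and `schedule`, the fixed probe size, the 10% adaptive budget, and the "half of the weighted share of the remaining work" block rule are all assumptions chosen for illustration. It shows an adaptive phase that measures each processor's throughput with small probe blocks, followed by a completion phase in which block sizes are weighted by the learned throughput.

```python
import math
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    true_rate: float         # iterations/second; hidden from the scheduler
    est_rate: float = 0.0    # throughput learned from completed blocks
    free_at: float = 0.0     # simulated time at which the worker goes idle
    last_block: int = 0      # size of the most recently assigned block
    last_elapsed: float = 0.0
    done: int = 0            # iterations executed so far


def schedule(total, workers, probe=64, adaptive_fraction=0.1, min_block=16):
    """Simulate two-phase weighted self-scheduling; return the makespan.

    All parameters are illustrative assumptions, not values from the article.
    """
    remaining, assigned = total, 0
    budget = int(total * adaptive_fraction)  # iterations spent on probing
    while remaining > 0:
        # The next worker to become idle requests a block.
        w = min(workers, key=lambda x: x.free_at)
        if w.last_block:
            # Measure throughput from the block that just finished.
            w.est_rate = w.last_block / w.last_elapsed
        if w.est_rate == 0.0 or assigned < budget:
            # Adaptive phase: small fixed-size probe blocks.
            block = probe
        else:
            # Completion phase: block proportional to the learned weight.
            share = w.est_rate / sum(x.est_rate for x in workers)
            block = max(min_block, math.ceil(remaining * share / 2))
        block = min(block, remaining)
        elapsed = block / w.true_rate
        w.last_block, w.last_elapsed = block, elapsed
        w.free_at += elapsed
        w.done += block
        remaining -= block
        assigned += block
    return max(w.free_at for w in workers)


if __name__ == "__main__":
    pool = [Worker("cpu", 100.0), Worker("gpu", 900.0)]
    makespan = schedule(100_000, pool)
    for w in pool:
        print(w.name, w.done, round(w.free_at, 3))
    print("makespan:", round(makespan, 3))
```

Because the block size shrinks as the remaining work shrinks, the faster processor receives large blocks early on, while the final blocks are small enough that all processors finish at nearly the same time. A real scheduler would measure wall-clock time per block rather than use a known `true_rate`.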

Funder

Division of Computing and Communication Foundations

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Information Systems, Software

Link

https://dl.acm.org/doi/pdf/10.1145/2400682.2400716

Reference: 29 articles.

1. StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures

2. Barker Z. and Prasanna V. 2005. Efficient hardware data mining with the Apriori algorithm on FPGAs. http://gridsec.usc.edu/files/TR/zbakerUSCfccm05.pdf. 10.1109/FCCM.2005.31

3. A performance study of general-purpose applications on graphics processors using CUDA

Cited by 71 articles.

1. An enhanced meta-heuristic algorithm used for energy conscious priority-based task scheduling problems in heterogeneous multiprocessor systems;Sustainable Computing: Informatics and Systems;2024-09

2. A high-performance dynamic scheduling for sparse matrix-based applications on heterogeneous CPU–GPU environment;The Journal of Supercomputing;2024-08-07

3. Scheduling for Cyber-Physical Systems with Heterogeneous Processing Units under Real-World Constraints;Proceedings of the 38th ACM International Conference on Supercomputing;2024-05-30

4. Hardware support for balanced co-execution in heterogeneous processors;Proceedings of the 21st ACM International Conference on Computing Frontiers;2024-05-07

5. Research on Scheduling Algorithms for AI Data-Intensive Tasks in Edge Heterogeneous Environments;2023 IEEE International Conference on Electrical, Automation and Computer Engineering (ICEACE);2023-12-29