Affiliation:
1. Linköping University, Sweden
2. FernUniversität in Hagen, Hagen, Germany
Abstract
Exploiting effectively massively parallel architectures is a major challenge that stream programming can help facilitate. We investigate the problem of generating energy-optimal code for a collection of streaming tasks that include parallelizable or moldable tasks on a generic manycore processor with dynamic discrete frequency scaling. Streaming task collections differ from classical task sets in that all tasks are running concurrently, so that cores typically run several tasks that are scheduled round-robin at user level in a data-driven way. A stream of data flows through the tasks and intermediate results may be forwarded to other tasks, as in a pipelined task graph. In this article, we consider
crown scheduling
, a novel technique for the combined optimization of resource allocation, mapping, and discrete voltage/frequency scaling for moldable streaming task collections in order to optimize energy efficiency given a throughput constraint. We first present optimal offline algorithms for separate and integrated crown scheduling based on integer linear programming (ILP). We make no restricting assumption about speedup behavior. We introduce the fast heuristic
Longest Task, Lowest Group
(LTLG) as a generalization of the Longest Processing Time (LPT) algorithm to achieve a load-balanced mapping of parallel tasks, and the
Height
heuristic for crown frequency scaling. We use them in feedback loop heuristics based on binary search and simulated annealing to optimize crown allocation.
Our experimental evaluation of the ILP models for a generic manycore architecture shows that at least for small and medium-sized streaming task collections even the integrated variant of crown scheduling can be solved to optimality by a state-of-the-art ILP solver within a few seconds. Our heuristics produce makespan and energy consumption close to optimality within the limits of the phase-separated crown scheduling technique and the crown structure. Their optimization time is longer than the one of other algorithms we test, but our heuristics consistently produce better solutions.
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture,Information Systems,Software
Cited by
27 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献
1. Heuristic Scheduling of Streaming Applications for Energy Efficiency on Heterogeneous Multicores;2023 IEEE International Conference on High Performance Computing & Communications, Data Science & Systems, Smart City & Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys);2023-12-17
2. Integrating Energy-Optimizing Scheduling of Moldable Streaming Tasks with Design Space Exploration for Multiple Core Types on Configurable Platforms;Journal of Signal Processing Systems;2022-06-30
3. MICCO: An Enhanced Multi-GPU Scheduling Framework for Many-Body Correlation Functions;2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS);2022-05
4. Systematic search space design for energy-efficient static scheduling of moldable tasks;Journal of Parallel and Distributed Computing;2022-04
5. ERASE: Energy Efficient Task Mapping and Resource Management for Work Stealing Runtimes;ACM Transactions on Architecture and Code Optimization;2022-03-07