Affiliation:
1. Lund University, Lund, Sweden
Abstract
Many application areas for embedded systems, such as DSP, media coding, and image processing, are based on stream processing. Stream programs in these areas are often naturally described as graphs, where nodes are computational kernels that send data over the edges. This structure also exhibits large amounts of concurrency, because the kernels can execute independently as long as there are data to process on the edges. The explicit data dependencies also help in making efficient sequential implementations of such programs, allowing programs to be more portable between platforms with various degrees of parallelism.
The kernels can be expressed in many different ways; for example, as imperative programs with read and write statements for the communication or as a set of actions that can be performed and conditions for when these actions can be executed. Traditionally, there has been a tension between how the kernels are expressed and how efficiently they can be implemented. There are very efficient implementation techniques for stream programs with restricted expressiveness, such as synchronous dataflow.
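As an illustration of the two kernel styles mentioned above, the following is a minimal sketch and not code from the article. The same pairwise-sum kernel is written once imperatively, with explicit reads and writes, and once as a condition/action pair. All names (`imperative_sum`, `make_sum_actions`, `run_actions`) are hypothetical, and channels are modelled as plain `deque` objects.

```python
from collections import deque

def imperative_sum(inp, out):
    """Imperative style: explicit reads and writes in program order.
    Each `yield` stands in for a blocking read on an empty channel."""
    while True:
        while not inp:
            yield
        a = inp.popleft()
        while not inp:
            yield
        b = inp.popleft()
        out.append(a + b)

def make_sum_actions(inp, out):
    """Action style: a list of (condition, action) pairs (hypothetical layout)."""
    def can_fire():
        return len(inp) >= 2
    def fire():
        a = inp.popleft()
        b = inp.popleft()
        out.append(a + b)
    return [(can_fire, fire)]

def run_actions(actions):
    """Fire enabled actions until no condition holds."""
    while True:
        for condition, action in actions:
            if condition():
                action()
                break
        else:
            return

in1, out1 = deque([1, 2, 3, 4]), deque()
kernel = imperative_sum(in1, out1)
next(kernel)  # runs until the kernel blocks on an empty input

in2, out2 = deque([1, 2, 3, 4]), deque()
run_actions(make_sum_actions(in2, out2))
```

Both versions leave the same sums on their output channels for this input. The imperative version keeps its position in the program between reads, whereas the action version only inspects the channel state before each firing.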
In this article, we present a framework for building stream program compilers that we call Tÿcho. At the core of this framework is a common kernel representation, based on a machine model for stream program kernels called actor machine, on which transformations and optimizations are performed. Both imperative and action-based kernels are translated to this common representation, making the same optimizations applicable to different kinds of kernels, and even across source language boundaries. An actor machine is described by the steps of execution that a kernel can take, and the conditions for taking them, together with a controller that decides how the conditions are tested and the steps are taken.
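The description above of steps, conditions, and a controller can be made concrete with a small sketch. This is an illustration under stated assumptions, not the representation used in Tÿcho: the controller is assumed to be a table of states, each holding one instruction of kind `test`, `exec`, or `wait`, and the names `run_actor_machine`, `two_tokens`, and `add_pair` are hypothetical.

```python
from collections import deque

def run_actor_machine(controller, conditions, steps, state=0, max_instructions=100):
    """Interpret controller instructions until a wait is reached.
    Returns the controller state to resume from."""
    for _ in range(max_instructions):
        instr = controller[state]
        kind = instr[0]
        if kind == "test":
            # Evaluate one condition and branch on the result.
            _, cond, if_true, if_false = instr
            state = if_true if conditions[cond]() else if_false
        elif kind == "exec":
            # Take one step of execution, then continue in the next state.
            _, step, nxt = instr
            steps[step]()
            state = nxt
        else:
            # "wait": no step can be taken right now; hand control back.
            return instr[1]
    return state

inp = deque([1, 2, 3, 4, 5])
out = deque()

# Condition: at least two input tokens are available (assumed example).
conditions = {"two_tokens": lambda: len(inp) >= 2}

def add_pair():
    a = inp.popleft()
    b = inp.popleft()
    out.append(a + b)

# Step: consume two tokens and produce their sum.
steps = {"add_pair": add_pair}

# Controller: test the condition, execute the step if it holds, else wait.
controller = {
    0: ("test", "two_tokens", 1, 2),
    1: ("exec", "add_pair", 0),
    2: ("wait", 0),
}

final_state = run_actor_machine(controller, conditions, steps)
```

In this sketch the controller alone decides the order in which conditions are tested and steps are taken, so changing the table changes the decision process without touching the condition or step definitions.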
We outline how kernels of an imperative process language and an action-based language are decomposed and translated to the common kernel representation, and we describe a simple backend that generates sequential C code from this representation. We present optimization heuristics for the decision process in the controller, which we evaluate using a few dozen kernels of varying complexity from a video decoder. We also present kernel fusion, by merging the controllers of actor machines, as a way of scheduling kernels on the same processor, which we compare to prior art.
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Software
Cited by
12 articles.
1. Hardware and Software Generation from Large Actor Machines in Streaming Applications;Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing;2024-04-08
2. Scalable Actor Networks with CAL;Proceedings of the 21st ACM-IEEE International Conference on Formal Methods and Models for System Design;2023-09-21
3. Design Space Exploration for Partitioning Dataflow Program on CPU-GPU Heterogeneous System;Journal of Signal Processing Systems;2023-07-31
4. Auto-Partitioning Heterogeneous Task-Parallel Programs with StreamBlocks;Proceedings of the International Conference on Parallel Architectures and Compilation Techniques;2022-10-08
5. Dynamic SIMD Parallel Execution on GPU from High-Level Dataflow Synthesis;Journal of Low Power Electronics and Applications;2022-07-17