Affiliation:
1. University of Washington
Abstract
There is a tension between an imperative style of control flow, which has been shown to be easier to use (especially for novices), and a functional style, which better exposes optimization opportunities and thereby makes optimizers more capable. The authors of "Efficient Control Flow in Dataflow Systems: When Ease-of-Use Meets High Performance" propose Mitos, a program rewriting framework that achieves the best of both worlds by borrowing program analysis concepts from compilers and lifting them to the distributed dataflow regime. Dataflow systems require significant data movement during processing, and in iterative programs much of it can be redundant and wasteful: naive execution plans can reprocess the same massive dataset on every iteration, and iteration i+1 must wait until iteration i has finished. The authors design a mechanism that labels each intermediate result with the execution path that produced it. By reasoning about and comparing these execution paths, the system can handle complex branching while also processing data efficiently through loop pipelining.
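To make the path-labeling idea concrete, here is a minimal sketch under simplifying assumptions of our own, not Mitos's actual implementation. An execution path is modeled as a tuple of basic-block names. Each intermediate result (a "bag") records the path prefix at which it was computed, and a consumer picks the bag produced at the most recent visit of the producing block along its own current path. Because bags from a branch that was not taken never match the current path, the same comparison handles branching. The names `LabeledBag` and `select_input` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Hypothetical model: an execution path is the sequence of basic-block
# names that control flow has visited so far.
Path = Tuple[str, ...]


@dataclass(frozen=True)
class LabeledBag:
    producer_block: str  # basic block in which the bag was computed
    path: Path           # execution-path prefix at the time it was computed
    data: tuple          # the bag's contents (illustrative only)


def select_input(bags: Sequence[LabeledBag], producer_block: str,
                 current_path: Path) -> Optional[LabeledBag]:
    """Return the bag from `producer_block` that a step on `current_path`
    should consume: among bags whose label is a prefix of the current path
    (i.e., lie on the path actually executed), take the most recent one."""
    candidates = [
        b for b in bags
        if b.producer_block == producer_block
        and current_path[:len(b.path)] == b.path
    ]
    if not candidates:
        return None
    # A longer matching prefix means a later visit of the producing block.
    return max(candidates, key=lambda b: len(b.path))


# Usage: two loop iterations, plus a bag from a branch that was not taken.
bags = [
    LabeledBag("loop", ("init", "loop"), (1,)),
    LabeledBag("loop", ("init", "loop", "loop"), (2,)),
    LabeledBag("loop", ("init", "else", "loop"), (9,)),
]
chosen = select_input(bags, "loop", ("init", "loop", "loop", "after"))
print(chosen.data)
```

The block after the loop reads the result of the second iteration, while the bag from the untaken `else` branch is ignored because its label is not a prefix of the current path.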
Publisher
Association for Computing Machinery (ACM)
Subject
Information Systems, Software