Author:
Walid A. Najjar, W. Marcus Miller, A. P. Wim Böhm
Abstract
Recent evidence indicates that the exploitation of locality in dataflow programs could have a dramatic impact on performance. The current trend in the design of dataflow processors suggests a synthesis of traditional non-strict fine grain instruction execution and strict coarse grain execution in order to exploit locality. While an increase in instruction granularity favors the exploitation of locality within a single execution thread, the resulting grain size may increase latency among execution threads. In this paper, we examine the latency incurred by partitioning fine grain instructions into coarse grain clusters, quantifying cluster input and output latencies on a set of numeric benchmarks. The results offer compelling evidence that the inner loops of a significant number of numeric codes would benefit from coarse grain execution. Based on cluster execution times, more than 60% of the measured benchmarks favor coarse grain execution. In 64% of the cases, the input latency to the cluster is the same in coarse and fine grain execution modes. The results suggest that the effect of increased instruction granularity on latency is minimal for a high percentage of the measured codes and is in large part offset by available intra-thread locality. Furthermore, simulation results indicate that strict or non-strict data structure access does not change the basic cluster characteristics.
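The trade-off the abstract describes can be illustrated with a toy cost model (this sketch and its parameter values are assumptions for illustration, not measurements or a method from the paper). It assumes a cluster whose first instruction needs every cluster input, so both execution modes start once the last operand arrives; in that case cluster input latency is identical in the two modes, and coarse grain execution saves only the per-instruction synchronization overhead.

```python
# Toy cost model of fine grain vs. coarse grain cluster execution.
# SYNC and EXEC are assumed, illustrative cycle counts.

SYNC = 2   # per-activation synchronization/matching overhead (cycles)
EXEC = 1   # execution time of one fine grain instruction (cycles)

def input_latency(input_arrival):
    # Time until the cluster can fire: last operand's arrival.
    # Identical in both modes under the strict-like input assumption.
    return max(input_arrival)

def fine_grain_time(n_instructions, input_arrival):
    # Each instruction is scheduled individually and pays SYNC itself.
    return input_latency(input_arrival) + n_instructions * (SYNC + EXEC)

def coarse_grain_time(n_instructions, input_arrival):
    # The whole cluster fires once, paying SYNC a single time;
    # its instructions then execute back to back.
    return input_latency(input_arrival) + SYNC + n_instructions * EXEC

arrivals = [0, 3, 5]   # operand arrival times at the cluster (assumed)
for n in (4, 16):
    print(n, fine_grain_time(n, arrivals), coarse_grain_time(n, arrivals))
```

In this model the gap grows linearly with cluster size, (n - 1) * SYNC cycles, which is one way to read the abstract's claim that intra-thread locality largely offsets the latency cost of larger grains.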
Publisher
Association for Computing Machinery (ACM)
Cited by
3 articles.
1. Memory Space Recycling;Proceedings of the ACM on Measurement and Analysis of Computing Systems;2022-02-24
2. Performance and modularity benefits of message-driven execution;Journal of Parallel and Distributed Computing;2004-04
3. Advances in the dataflow computational model;Parallel Computing;1999-12