Affiliation:
1. Syracuse Univ., Syracuse, NY
2. Northwestern Univ., Evanston, IL
3. Louisiana State Univ., Baton Rouge, LA
Abstract
Reducing communication overhead is extremely important in distributed-memory message-passing architectures. In this article, we present a technique to improve communication that considers data access patterns of the entire program. Our approach is based on a combination of traditional data-flow analysis and a linear algebra framework, and it works on structured programs with conditional statements and nested loops but without arbitrary goto statements. The distinctive features of the solution are the accuracy in keeping communication set information, support for general alignments and distributions including block-cyclic distributions, and the ability to simulate some of the previous approaches with suitable modifications. We also show how optimizations such as message vectorization, message coalescing, and redundancy elimination are supported by our framework. Experimental results on several benchmarks show that our technique is effective in reducing the number of messages (an average of 32% reduction), the volume of the data communicated (an average of 37% reduction), and the execution time (an average of 26% reduction).
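As a rough illustration of two terms used in the abstract, the sketch below shows how ownership under a block-cyclic distribution can be computed and how grouping remote accesses by owner (in the spirit of message vectorization and redundancy elimination) reduces the message count. It is not the article's framework, which relies on data-flow analysis and linear algebra; the function names, the access list, and the processor count are hypothetical choices made only for this example.

```python
from collections import defaultdict

def owner(i, block, nprocs):
    """Processor owning global index i under a block-cyclic(block) distribution."""
    return (i // block) % nprocs

def naive_messages(indices, me, block, nprocs):
    """One message per remote element access (no optimization)."""
    return [(owner(i, block, nprocs), [i])
            for i in indices
            if owner(i, block, nprocs) != me]

def vectorized_messages(indices, me, block, nprocs):
    """One message per remote owner, carrying each needed element once."""
    by_owner = defaultdict(set)
    for i in indices:
        p = owner(i, block, nprocs)
        if p != me:
            by_owner[p].add(i)  # the set drops repeated requests
    return {p: sorted(elems) for p, elems in by_owner.items()}

# Hypothetical access pattern: processor 0 reads these global indices.
accesses = [0, 1, 2, 3, 4, 5, 6, 7, 4, 5]
naive = naive_messages(accesses, me=0, block=2, nprocs=4)
vec = vectorized_messages(accesses, me=0, block=2, nprocs=4)
print(len(naive), "messages without grouping")
print(len(vec), "messages after grouping by owner:", vec)
```

With a block size of 2 over 4 processors, indices 0 and 1 are local to processor 0, so only the remaining accesses generate traffic; grouping them by owner leaves one message per remote processor.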
Publisher
Association for Computing Machinery (ACM)
Cited by
22 articles.
1. A Machine-Learning-Based Framework for Productive Locality Exploitation;IEEE Transactions on Parallel and Distributed Systems;2021-06-01
2. Optimizing Remote Communication in X10;ACM Transactions on Architecture and Code Optimization;2020-01-10
3. Optimizing remote data transfers in X10;Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques;2018-11
4. Maximizing Communication Overlap with Dynamic Program Analysis;Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region;2018-01-28
5. Formalizing Structured Control Flow Graphs;Languages and Compilers for Parallel Computing;2017