Affiliation:
1. School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA
Abstract
Irregular loop nests in which the loop bounds are determined dynamically by indexed arrays are difficult to compile into expressive parallel constructs, such as segmented scans and reductions. In this paper, we describe a suite of transformations to automatically parallelize such irregular loop nests, even in the presence of recurrences. We describe a simple, general loop flattening transformation, along with new optimizations which make it a viable compiler transformation. A robust recurrence parallelization technique is coupled to the loop flattening transformation, allowing parallelization of segmented reductions, scans, and combining-sends over arbitrary associative operators. We discuss the implementation and performance results of the transformations in a parallelizing Fortran 77 compiler for the Cray C90 supercomputer. In particular, we focus on important sparse matrix-vector multiplication kernels, for one of which we are able to automatically derive an algorithm used by one of the fastest library routines available.
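As an illustration of the kind of loop nest the abstract refers to, the following is a minimal sketch of sparse matrix-vector multiplication over a compressed-row layout, first as an irregular nested loop and then in a flattened form whose accumulation is a segmented sum. It is written in Python for brevity rather than Fortran 77, it is not the paper's actual transformation, and the names `row_ptr`, `col_idx`, `vals`, `spmv_nested`, and `spmv_flattened` are assumptions introduced here.

```python
def spmv_nested(row_ptr, col_idx, vals, x):
    """Irregular loop nest: inner bounds come from the indexed array row_ptr."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def spmv_flattened(row_ptr, col_idx, vals, x):
    """Flattened form: one loop over all nonzeros (hypothetical sketch)."""
    n_rows = len(row_ptr) - 1
    nnz = row_ptr[-1]
    # Step 1: elementwise products; every iteration is independent.
    products = [vals[k] * x[col_idx[k]] for k in range(nnz)]
    # Step 2: segmented sum, where the segments are the rows.
    # Written sequentially here; a parallel segmented reduction
    # would compute the same per-row totals.
    y = [0.0] * n_rows
    row = 0
    for k in range(nnz):
        # Advance to the row that owns nonzero k, skipping empty rows.
        while k >= row_ptr[row + 1]:
            row += 1
        y[row] += products[k]
    return y


# 3x3 example matrix [[1, 0, 2], [0, 0, 0], [3, 4, 0]] with an empty middle row.
row_ptr = [0, 2, 2, 4]
col_idx = [0, 2, 0, 1]
vals = [1.0, 2.0, 3.0, 4.0]
x = [1.0, 2.0, 3.0]
```

Both functions return the same vector for this input; the flattened version replaces the data-dependent inner loop bounds with a single pass over the nonzeros, which is the shape that maps onto segmented reductions.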
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design, Software
Cited by
4 articles.
1. Batchman and Robin: Batched and Non-batched Branching for Interactive ZK;Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security;2023-11-15
2. Beacons: An End-to-End Compiler Framework for Predicting and Utilizing Dynamic Loop Characteristics;Proceedings of the ACM on Programming Languages;2023-10-16
3. Source code transformations and optimizations;Embedded Computing for High Performance;2017
4. Efficient RAM and Control Flow in Verifiable Outsourced Computation;Proceedings 2015 Network and Distributed System Security Symposium;2015