Affiliation:
1. Computer Systems Laboratory, Stanford University, CA
Abstract
Effective memory hierarchy utilization is critical to the performance of modern multiprocessor architectures. We have developed the first compiler system that fully automatically parallelizes sequential programs and changes the original array layouts to improve memory system performance. Our optimization algorithm consists of two steps. The first step chooses the parallelization and computation assignment such that synchronization and data sharing are minimized. The second step then restructures the layout of the data in the shared address space with an algorithm that is based on a new data transformation framework. We ran our compiler on a set of application programs and measured their performance on the Stanford DASH multiprocessor. Our results show that the compiler can effectively optimize parallelism in conjunction with memory subsystem performance.
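The second step, restructuring the data layout so that the data each processor uses lies together in the shared address space, can be illustrated with a small sketch. This is not the compiler's algorithm. It assumes a square row-major array, a column-block assignment of data to processors, and an array size divisible by the processor count, and every function name in it is hypothetical.

```python
def original_offset(i, j, n):
    """Row-major offset of element (i, j) in an n x n array."""
    return i * n + j


def transformed_offset(i, j, n, num_procs):
    """Offset of (i, j) after regrouping the array by owning processor.

    Assumption: processor p owns columns [p*block, (p+1)*block), and
    n is divisible by num_procs. The column index is split into
    (owner, local column) and the owner becomes the outermost dimension.
    """
    block = n // num_procs
    owner = j // block
    local_j = j % block
    return owner * (n * block) + i * block + local_j


def owned_offsets(offset_fn, proc, n, num_procs):
    """Sorted memory offsets of all elements owned by processor proc."""
    block = n // num_procs
    cols = range(proc * block, (proc + 1) * block)
    return sorted(offset_fn(i, j) for i in range(n) for j in cols)


def is_contiguous(offsets):
    """True if the sorted offsets form one unbroken run."""
    return all(b - a == 1 for a, b in zip(offsets, offsets[1:]))


if __name__ == "__main__":
    n, num_procs = 8, 4
    before = owned_offsets(lambda i, j: original_offset(i, j, n), 1, n, num_procs)
    after = owned_offsets(
        lambda i, j: transformed_offset(i, j, n, num_procs), 1, n, num_procs
    )
    print("original layout contiguous:", is_contiguous(before))
    print("transformed layout contiguous:", is_contiguous(after))
```

In the original layout a processor's column block is scattered across every row of the array, whereas in the transformed layout it occupies a single contiguous region. The abstract's layout step targets this kind of effect, here shown only for one simple assignment.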
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design, Software
References
33 articles.
1. Automatic Partitioning of Parallel Loops for Cache-Coherent Multiprocessors
2. A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, MA, second edition, 1986.
3. Global optimizations for parallelism and locality on scalable parallel machines
4. Optimizing Parallel Programs Using Affinity Regions
Cited by
22 articles.