Affiliation:
1. IBM T. J. Watson Research Center, Yorktown Heights, NY
2. AT&T Labs—Research, Florham Park, NJ
Abstract
Customizing the protocols that manage accesses to different data structures within an application can improve the performance of software shared-memory programs substantially. Existing systems for using customizable protocols are hard to use directly because the mechanisms they provide for manipulating protocols are low-level. This article is an in-depth study of the issues involved in providing language support for application-specific protocols. We describe the design and implementation of a new language for parallel programming, Ace, that integrates support for customizable protocols with minimal extensions to C. Ace applications are developed using a shared-memory model with a default sequentially consistent protocol. Performance can then be optimized, with minor modifications to the application, by experimenting with different protocol libraries. The design of Ace was driven by a detailed study of the use of customizable protocols. We delineate the issues that arise when programming with customizable protocols and present novel abstractions that allow for their easy use. We describe the design and implementation of a runtime system and compiler for Ace and discuss compiler optimizations that improve the performance of such software shared-memory systems. We study the communication patterns of a set of benchmark applications and consider the use of customizable protocols to optimize their performance. We evaluate the performance of our system through experiments on a Thinking Machines CM-5 and a Cray T3E. We also present measurements that demonstrate that Ace has good performance compared to that of a modern distributed shared-memory system.
Publisher
Association for Computing Machinery (ACM)
Cited by
3 articles.