Affiliations:
1. Massachusetts Institute of Technology, Cambridge, MA
2. University of Southern California, Marina, CA
Abstract
This article presents a new technique, adaptive replication, for automatically eliminating synchronization bottlenecks in multithreaded programs that perform atomic operations on objects. Synchronization bottlenecks occur when multiple threads attempt to concurrently update the same object. It is often possible to eliminate synchronization bottlenecks by replicating objects. Each thread can then update its own local replica without synchronization and without interacting with other threads. When the computation needs to access the original object, it combines the replicas to produce the correct values in the original object. One potential problem is that eagerly replicating all objects may lead to performance degradation and excessive memory consumption.

Adaptive replication eliminates unnecessary replication by dynamically detecting contention at each object to find and replicate only those objects that would otherwise cause synchronization bottlenecks. We have implemented adaptive replication in the context of a parallelizing compiler for a subset of C++. Given an unannotated sequential program written in C++, the compiler automatically extracts the concurrency, determines when it is legal to apply adaptive replication, and generates parallel code that uses adaptive replication to efficiently eliminate synchronization bottlenecks.

In addition to automatic parallelization and adaptive replication, our compiler also implements a lock coarsening transformation that increases the granularity at which the computation locks objects. The advantage is a reduction in the frequency with which the computation acquires and releases locks; the potential disadvantage is the introduction of new synchronization bottlenecks caused by increases in the sizes of the critical sections. Because the adaptive replication transformation takes place at lock acquisition sites, there is a synergistic interaction between lock coarsening and adaptive replication.
Lock coarsening drives down the overhead of using adaptive replication, and adaptive replication eliminates synchronization bottlenecks associated with the overaggressive use of lock coarsening.

Our experimental results show that, for our set of benchmark programs, the combination of lock coarsening and adaptive replication can eliminate synchronization bottlenecks and significantly reduce the synchronization and replication overhead as compared to versions that use none or only one of the transformations.
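The mechanism described in the abstract (detecting contention at a lock acquisition site, updating a thread-local replica instead of waiting, and later combining the replicas into the original object) can be illustrated with a minimal C++ sketch. It is not the code the compiler generates. It assumes that contention is detected by a failed `std::mutex::try_lock`, that the only operation on the object is a commutative addition, and that replicas are combined once all threads have joined. The names `ReplicatedCounter`, `update`, `combine`, and `run_parallel_sum` are hypothetical. Passing several deltas to `update` models a coarsened critical section, where one lock acquisition (and one contention check) covers a whole batch of updates.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical object whose only operation is a commutative addition.
struct Counter {
    std::mutex lock;
    long value = 0;
};

// Hypothetical wrapper: the original object plus one lazily used replica per thread.
struct ReplicatedCounter {
    Counter original;
    std::vector<long> replicas;    // replicas[t] is touched only by thread t
    std::vector<char> replicated;  // replicated[t] != 0 once thread t saw contention

    explicit ReplicatedCounter(int nthreads)
        : replicas(nthreads, 0), replicated(nthreads, 0) {}

    // Lock acquisition site. On the lock path, all deltas are applied inside a
    // single critical section (lock coarsening when deltas.size() > 1). If the
    // lock is busy, the thread switches to its private replica for good and
    // updates it without any synchronization.
    void update(int tid, const std::vector<long>& deltas) {
        if (!replicated[tid] && original.lock.try_lock()) {
            for (long d : deltas) original.value += d;
            original.lock.unlock();
            return;
        }
        replicated[tid] = 1;  // contention detected (or replica already in use)
        for (long d : deltas) replicas[tid] += d;
    }

    // Called after all threads have joined: fold every replica back into the
    // original object so that it holds the correct value.
    long combine() {
        std::lock_guard<std::mutex> guard(original.lock);
        for (std::size_t t = 0; t < replicas.size(); ++t) {
            original.value += replicas[t];
            replicas[t] = 0;
            replicated[t] = 0;
        }
        return original.value;
    }
};

// Each thread performs `batches` coarsened updates of `batch_size` increments.
long run_parallel_sum(int nthreads, int batches, int batch_size) {
    ReplicatedCounter counter(nthreads);
    const std::vector<long> ones(batch_size, 1);
    std::vector<std::thread> threads;
    for (int t = 0; t < nthreads; ++t) {
        threads.emplace_back([&counter, &ones, t, batches] {
            for (int b = 0; b < batches; ++b) counter.update(t, ones);
        });
    }
    for (std::thread& th : threads) th.join();
    return counter.combine();
}
```

Whichever path each update takes, `combine` restores the correct total, so `run_parallel_sum(4, 250, 4)` returns 4000. Because `try_lock` is allowed to fail spuriously, a thread may occasionally create a replica it did not strictly need; in this sketch that costs only memory, not correctness.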
Publisher: Association for Computing Machinery (ACM)
Cited by 3 articles:
1. MemoDyn; Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques; 2018-11
2. POSTER; Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming; 2017-01-26
3. Self-Replicating Objects for Multicore Platforms; ECOOP 2010 – Object-Oriented Programming; 2010