Affiliation:
1. University of Illinois at Urbana-Champaign, Urbana, IL, USA
2. Technion - Israel Institute of Technology, Haifa, Israel
Abstract
In multicores, performance-critical synchronization is increasingly performed in a lock-free manner using atomic instructions such as CAS or LL/SC. However, when many processors synchronize on the same variable, performance can still degrade significantly. Contending writes get serialized, creating a non-scalable condition. Past proposals that build hardware queues of synchronizing processors do not fundamentally solve this problem; at best, they help to efficiently serialize the contending writes.
This paper proposes a novel architecture that breaks the serialization of hardware queues and enables the queued processors to perform lock-free synchronization in parallel. The architecture, called CASPAR, is able to (1) execute the CASes in the queued-up processors in parallel through eager forwarding of expected values, and (2) validate the CASes in parallel and dequeue groups of processors at a time. The result is highly scalable synchronization. We evaluate CASPAR with simulations of a 64-core chip. Compared to existing proposals with hardware queues, CASPAR improves the throughput of kernels by 32% on average, and reduces the execution time of the sections considered in lock-free versions of applications by 47% on average. This makes these sections 2.5x faster than in the original applications.
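The contention problem the abstract describes can be illustrated in software with a minimal sketch of a CAS retry loop. It is not CASPAR itself, which is a hardware mechanism, and the names `cas_increment`, `run_contended`, and `RunResult` are hypothetical helpers introduced only for this illustration. Several threads increment one shared counter. Every CAS that loses the race must retry, so successful updates to the variable are applied one at a time.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper (not from the paper): lock-free increment with a CAS
// retry loop. Returns how many CAS attempts failed before one succeeded.
std::uint64_t cas_increment(std::atomic<std::uint64_t>& counter) {
    std::uint64_t failures = 0;
    std::uint64_t expected = counter.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak stores the value it observed into
    // `expected`, so the next attempt uses a fresh expected value.
    // The weak form may also fail spuriously; that is counted as a failure too.
    while (!counter.compare_exchange_weak(expected, expected + 1,
                                          std::memory_order_acq_rel,
                                          std::memory_order_relaxed)) {
        ++failures;
    }
    return failures;
}

struct RunResult {
    std::uint64_t final_value;  // counter value after all threads finish
    std::uint64_t failed_cas;   // total failed CAS attempts across threads
};

// Hypothetical driver: all threads synchronize on the same variable.
RunResult run_contended(unsigned num_threads, unsigned increments_per_thread) {
    std::atomic<std::uint64_t> counter{0};
    std::atomic<std::uint64_t> failed{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, &failed, increments_per_thread] {
            std::uint64_t local_failures = 0;
            for (unsigned i = 0; i < increments_per_thread; ++i) {
                local_failures += cas_increment(counter);
            }
            failed.fetch_add(local_failures, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    // No update is lost: each successful CAS adds exactly one.
    assert(counter.load() ==
           static_cast<std::uint64_t>(num_threads) * increments_per_thread);
    return {counter.load(), failed.load()};
}
```

Calling `run_contended(8, 100000)` on a multicore machine would typically report a nonzero `failed_cas`, although the exact count depends on the hardware and scheduling. Those failed attempts are the wasted work that contention on a single variable produces.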
Funder
Israel Science Foundation
National Science Foundation
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design, Software
Cited by
3 articles.
1. Subutai: Speeding Up Legacy Parallel Applications Through Data Synchronization;IEEE Transactions on Parallel and Distributed Systems;2021-05-01
2. Software Data Planes;Proceedings of the ACM Symposium on Cloud Computing;2019-11-20
3. A Message-Passing Microcoded Synchronization for Distributed Shared Memory Architectures;IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems;2019-05