Affiliation:
1. University of Illinois at Urbana-Champaign, Urbana, IL, USA
2. Technion – Israel Institute of Technology, Haifa, Israel
Abstract
In multicores, performance-critical synchronization is increasingly performed in a lock-free manner using atomic instructions such as compare-and-swap (CAS) or load-linked/store-conditional (LL/SC). However, when many processors synchronize on the same variable, performance can still degrade significantly: contending writes are serialized, which does not scale. Past proposals that build hardware queues of synchronizing processors do not fundamentally solve this problem; at best, they help to efficiently serialize the contending writes.
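The serialization described above can be made concrete with a minimal Python sketch. It uses a toy, single-threaded model rather than real hardware. `SharedWord` and `contended_increments` are hypothetical names, and CAS atomicity is implied only because the simulation runs sequentially. In the model, N processors each try to increment one shared word with a CAS retry loop. Every pending processor reads the word in each round, but only one CAS per round can succeed.

```python
from dataclasses import dataclass


@dataclass
class SharedWord:
    """Toy model of a single memory word that supports compare-and-swap (CAS)."""
    value: int = 0

    def cas(self, expected: int, new: int) -> bool:
        # If the word still holds `expected`, write `new` and report success.
        # Atomicity is implicit here because the simulation is sequential.
        if self.value == expected:
            self.value = new
            return True
        return False


def contended_increments(num_procs: int) -> tuple[int, int, int]:
    """Each of `num_procs` hypothetical processors increments the word once.

    In every round, all still-pending processors read the word, then attempt
    CAS(old, old + 1) one after another. Only the first CAS in a round can
    succeed; the others fail because the value changed, so they retry in the
    next round. Returns (final value, rounds, total CAS attempts).
    """
    word = SharedWord()
    pending = list(range(num_procs))
    rounds = attempts = 0
    while pending:
        rounds += 1
        # Every pending processor reads the same current value.
        snapshots = {p: word.value for p in pending}
        still_pending = []
        for p in pending:
            attempts += 1
            if not word.cas(snapshots[p], snapshots[p] + 1):
                still_pending.append(p)  # lost the race: retry next round
        pending = still_pending
    return word.value, rounds, attempts
```

With 8 processors, `contended_increments(8)` returns `(8, 8, 36)`. The model needs 8 rounds and 8 + 7 + ... + 1 = 36 CAS attempts to perform 8 increments. The number of rounds grows linearly with the number of contenders, and the total work grows quadratically.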
This paper proposes a novel architecture that breaks the serialization of hardware queues and enables the queued processors to perform lock-free synchronization in parallel. The architecture, called CASPAR, (1) executes the CASes of the queued processors in parallel through eager forwarding of expected values, and (2) validates the CASes in parallel and dequeues groups of processors at a time. The result is highly scalable synchronization. We evaluate CASPAR with simulations of a 64-core chip. Compared to existing proposals with hardware queues, CASPAR improves the throughput of kernels by 32% on average, and reduces the execution time of the sections considered in lock-free versions of applications by 47% on average. This makes these sections 2.5x faster than in the original applications.
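The abstract names two ideas: eager forwarding of expected values down the queue, and parallel validation followed by a group dequeue. Both can be illustrated with a toy software model. This is a sketch under stated assumptions, not the paper's hardware design. `forward_values`, `validate_and_dequeue`, and the modeling of each queued processor's CAS loop body as a function `new = op(old)` are all hypothetical.

```python
from typing import Callable

# Hypothetical model: each queued processor's CAS loop body computes new = op(old).
Op = Callable[[int], int]


def forward_values(head_value: int, ops: list[Op]) -> tuple[list[int], list[int]]:
    """Eagerly forward expected values down the queue (toy model).

    Processor 0 expects the value it read from memory. Every later processor
    uses its predecessor's proposed new value as its expected value, so it
    does not have to wait for the predecessor's CAS to complete.
    """
    expected: list[int] = []
    proposed: list[int] = []
    value = head_value
    for op in ops:
        expected.append(value)
        value = op(value)
        proposed.append(value)
    return expected, proposed


def validate_and_dequeue(memory_value: int, expected: list[int],
                         proposed: list[int]) -> tuple[int, int]:
    """Validate every link independently, then commit the longest valid prefix.

    Each check compares one processor's expected value with the value its
    predecessor produced (or with memory, for the head). Because the checks are
    independent, a hardware design could perform them all at once. Returns
    (number of processors dequeued as a group, new memory value).
    """
    predecessors = [memory_value] + proposed[:-1]
    ok = [e == p for e, p in zip(expected, predecessors)]  # independent checks
    group = 0
    while group < len(ok) and ok[group]:
        group += 1
    new_value = proposed[group - 1] if group else memory_value
    return group, new_value


if __name__ == "__main__":
    exp, prop = forward_values(10, [lambda v: v + 1] * 4)
    print(exp, prop)                              # [10, 11, 12, 13] [11, 12, 13, 14]
    print(validate_and_dequeue(10, exp, prop))    # (4, 14): whole queue commits
    print(validate_and_dequeue(99, exp, prop))    # (0, 99): head's value was stale
```

A processor in the middle of the queue may hold a stale expected value. For example, suppose `expected = [10, 11, 50, 13]`, `proposed = [11, 12, 13, 14]`, and memory holds 10. Only the first two processors validate, so the model dequeues a group of 2 and leaves memory at 12; the remaining processors would retry.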
Funder
Israel Science Foundation
National Science Foundation
Publisher
Association for Computing Machinery (ACM)