Well-structured futures and cache locality

Published:2014-11-26 Issue:8 Volume:49 Page:155-166
ISSN:0362-1340
Container-title:ACM SIGPLAN Notices
language:en
Short-container-title:SIGPLAN Not.

Author:

Herlihy Maurice¹, Liu Zhiyu¹

Affiliation:

1. Brown University, Providence, RI, USA

Abstract

In fork-join parallelism, a sequential program is split into a directed acyclic graph of tasks linked by directed dependency edges, and the tasks are executed, possibly in parallel, in an order consistent with their dependencies. A popular and effective way to extend fork-join parallelism is to allow threads to create futures. A thread creates a future to hold the results of a computation, which may or may not be executed in parallel. That result is returned when some thread touches that future, blocking if necessary until the result is ready. Recent research has shown that while futures can, of course, enhance parallelism in a structured way, they can have a deleterious effect on cache locality. In the worst case, futures can incur Ω(P T∞ + t T∞) deviations, which implies Ω(C P T∞ + C t T∞) additional cache misses, where C is the number of cache lines, P is the number of processors, t is the number of touches, and T∞ is the computation span. Since cache locality has a large impact on software performance on modern multicores, this result is troubling. In this paper, however, we show that if futures are used in a simple, disciplined way, then the situation is much better: if each future is touched only once, either by the thread that created it, or by a later descendant of the thread that created it, then parallel executions with work stealing can incur at most O(C P T∞²) additional cache misses, a substantial improvement. This structured use of futures is characteristic of many (but not all) parallel applications.
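
As a rough illustration of the disciplined pattern the abstract describes, the sketch below creates one future and touches it exactly once, from the same thread that created it. It is not taken from the paper: the names `fib` and `parallel_sum_fib` are hypothetical, and Python's `ThreadPoolExecutor` is assumed here only as a convenient stand-in, since it is a plain thread pool and not the work-stealing scheduler the paper analyzes.

```python
from concurrent.futures import ThreadPoolExecutor


def fib(n: int) -> int:
    """Sequential Fibonacci, used only as a placeholder computation."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def parallel_sum_fib(a: int, b: int, pool: ThreadPoolExecutor) -> int:
    """Hypothetical helper: one future, created and touched by the same thread."""
    # Create a future to hold the result of one sub-computation.
    fut = pool.submit(fib, a)
    # The creating thread continues with its own work in the meantime.
    local = fib(b)
    # Single touch: result() blocks if necessary until the value is ready.
    return fut.result() + local


with ThreadPoolExecutor(max_workers=2) as pool:
    total = parallel_sum_fib(10, 12, pool)
```

Here the future is never passed to an unrelated thread and is never touched twice, which is the restriction under which the abstract states its improved bound.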

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design,Software

Link

https://dl.acm.org/doi/pdf/10.1145/2692916.2555257

References (22 articles)

1. The data locality of work stealing

2. Adaptive work stealing with parallelism feedback

3. Thread scheduling for multiprogrammed multiprocessors

4. I-structures: data structures for parallel computing

5. Programming parallel algorithms