Internally deterministic parallel algorithms can be fast

Published: 2012-09-11; Volume: 47; Issue: 8; Pages: 181-192
ISSN: 0362-1340
Journal: ACM SIGPLAN Notices
Language: en
Journal abbreviation: SIGPLAN Not.

Author:

Blelloch Guy E.¹, Fineman Jeremy T.², Gibbons Phillip B.³, Shun Julian¹

Affiliation:

1. Carnegie Mellon University, Pittsburgh, USA

2. Georgetown University, Washington, D.C., USA

3. Intel Labs, Pittsburgh, USA

Abstract

The virtues of deterministic parallelism have been argued for decades, and many forms of deterministic parallelism have been described and analyzed. Here we are concerned with one of the strongest forms, requiring that for any input there is a unique dependence graph representing a trace of the computation annotated with every operation and value. This has been referred to as internal determinism, and it implies a sequential semantics; that is, considering any sequential traversal of the dependence graph is sufficient for analyzing the correctness of the code. In addition to returning deterministic results, internal determinism has many advantages, including ease of reasoning about the code, ease of verifying correctness, ease of debugging, ease of defining invariants, ease of defining good coverage for testing, and ease of formally, informally and experimentally reasoning about performance. On the other hand, one needs to consider the possible downsides of determinism, which might include making algorithms (i) more complicated, unnatural or special purpose and/or (ii) slower or less scalable. In this paper we study the effectiveness of this strong form of determinism through a broad set of benchmark problems. Our main contribution is to demonstrate that for this wide body of problems there exist efficient internally deterministic algorithms, and moreover that these algorithms are natural to reason about and not complicated to code. We leverage an approach to determinism suggested by Steele (1990), which is to use nested parallelism with commutative operations. Our algorithms apply several diverse programming paradigms that fit within the model, including (i) a strict functional style (no shared state among concurrent operations), (ii) an approach we refer to as deterministic reservations, and (iii) the use of commutative, linearizable operations on data structures.
We describe algorithms for the benchmark problems that use these deterministic approaches and present performance results on a 32-core machine. Perhaps surprisingly, for all problems, our internally deterministic algorithms achieve good speedup and good performance even relative to prior nondeterministic solutions.
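The abstract names deterministic reservations without spelling out the mechanism. One minimal way to picture it, assuming a reserve/commit structure: each iterate of a loop first reserves the locations it wants to write using a commutative "priority write" that keeps the smallest iterate index, then commits only if it still holds every reservation; losers retry in a later round, and each round processes a prefix of the remaining iterates. The Python sketch below applies this to greedy maximal matching over edges taken in index order. The problem choice and every name in it (`PriorityCell`, `write_min`, `reserve`, `commit`, `maximal_matching`, `greedy_matching`, `prefix_size`, `workers`) are illustrative assumptions, not the paper's code; a lock stands in for an atomic compare-and-swap, and Python threads here illustrate determinism rather than speedup.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from functools import partial

INF = float("inf")


class PriorityCell:
    """Keeps the minimum value ever written to it (a 'priority write').

    write_min is commutative and idempotent, so the final value is the same
    no matter how concurrent writers interleave. The lock is an assumed
    stand-in for the atomic compare-and-swap loop a native version would use.
    """

    def __init__(self):
        self.value = INF
        self._lock = threading.Lock()

    def write_min(self, v):
        with self._lock:
            if v < self.value:
                self.value = v


def reserve(edges, matched, cells, i):
    """Reserve phase for edge i; returns False if the edge is already finished."""
    u, v = edges[i]
    if matched[u] or matched[v]:
        return False  # can never join the matching, so drop it
    cells[u].write_min(i)
    cells[v].write_min(i)
    return True


def commit(edges, matched, in_matching, cells, i):
    """Commit phase for edge i; returns True if it must retry in a later round."""
    u, v = edges[i]
    if cells[u].value == i and cells[v].value == i:
        # Winners never share an endpoint, so these writes never conflict.
        matched[u] = matched[v] = True
        in_matching[i] = True
        return False
    return True


def maximal_matching(n_vertices, edges, prefix_size=4, workers=4):
    """Hypothetical greedy maximal matching via reserve/commit rounds."""
    if prefix_size < 1:
        raise ValueError("prefix_size must be at least 1")
    matched = [False] * n_vertices
    in_matching = [False] * len(edges)
    pending = list(range(len(edges)))  # edge indices in sequential priority order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            batch, rest = pending[:prefix_size], pending[prefix_size:]
            # Fresh reservations each round: simpler than resetting, O(n) per round.
            cells = [PriorityCell() for _ in range(n_vertices)]
            # Reserve phase: all edges in the prefix run concurrently; collecting
            # every result acts as a barrier before any commit starts.
            ok = list(pool.map(partial(reserve, edges, matched, cells), batch))
            live = [i for i, keep in zip(batch, ok) if keep]
            # Commit phase: an edge wins only if it holds both reservations.
            again = list(pool.map(partial(commit, edges, matched, in_matching, cells), live))
            failed = [i for i, retry in zip(live, again) if retry]
            pending = failed + rest  # retries stay ahead of later edges
    return [i for i, m in enumerate(in_matching) if m]


def greedy_matching(n_vertices, edges):
    """Sequential reference: scan edges in order, keep one if both ends are free."""
    matched = [False] * n_vertices
    chosen = []
    for i, (u, v) in enumerate(edges):
        if not matched[u] and not matched[v]:
            matched[u] = matched[v] = True
            chosen.append(i)
    return chosen


edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4), (2, 5)]
print(maximal_matching(6, edges), greedy_matching(6, edges))  # prints [0, 2] [0, 2]
```

Because `write_min` is commutative and the reserve and commit phases are separated, each round's winners depend only on the edge order, not on thread scheduling, so in this sketch `maximal_matching` returns exactly what the sequential `greedy_matching` loop returns for any `prefix_size` and `workers`. A larger prefix exposes more work per round but typically causes more retries.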

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design, Software

Link

https://dl.acm.org/doi/pdf/10.1145/2370036.2145840

References (45 articles)

1. U. Acar, G. E. Blelloch, and R. Blumofe. The data locality of work stealing. Theory of Computing Systems, 35(3), 2002. Springer.

2. Weak ordering---a new definition

3. CoreDet

4. Grace

Cited by 82 articles.

1. Multi Bucket Queues: Efficient Concurrent Priority Scheduling; Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures; 2024-06-17

2. When Is Parallelism Fearless and Zero-Cost with Rust?; Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures; 2024-06-17

3. Performance of Text-Independent Automatic Speaker Recognition on a Multicore System; Tsinghua Science and Technology; 2024-04

4. Scalable High-Quality Hypergraph Partitioning; ACM Transactions on Algorithms; 2024-01-22

5. pGRASS-Solver: A Graph Spectral Sparsification-Based Parallel Iterative Solver for Large-Scale Power Grid Analysis; IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems; 2023-09