Affiliation:
1. Carnegie Mellon University, Pittsburgh, USA
2. Georgetown University, Washington, D.C., USA
3. Intel Labs, Pittsburgh, USA
Abstract
The virtues of deterministic parallelism have been argued for decades, and many forms of deterministic parallelism have been described and analyzed. Here we are concerned with one of the strongest forms, requiring that for any input there is a unique dependence graph representing a trace of the computation annotated with every operation and value. This has been referred to as internal determinism, and implies a sequential semantics; i.e., considering any sequential traversal of the dependence graph is sufficient for analyzing the correctness of the code. In addition to returning deterministic results, internal determinism has many advantages, including ease of reasoning about the code, ease of verifying correctness, ease of debugging, ease of defining invariants, ease of defining good coverage for testing, and ease of formally, informally and experimentally reasoning about performance. On the other hand, one needs to consider the possible downsides of determinism, which might include making algorithms (i) more complicated, unnatural or special purpose and/or (ii) slower or less scalable.
In this paper we study the effectiveness of this strong form of determinism through a broad set of benchmark problems. Our main contribution is to demonstrate that for this wide body of problems, there exist efficient internally deterministic algorithms, and moreover that these algorithms are natural to reason about and not complicated to code. We leverage an approach to determinism suggested by Steele (1990), which is to use nested parallelism with commutative operations. Our algorithms apply several diverse programming paradigms that fit within the model, including (i) a strict functional style (no shared state among concurrent operations), (ii) an approach we refer to as deterministic reservations, and (iii) the use of commutative, linearizable operations on data structures. We describe algorithms for the benchmark problems that use these deterministic approaches and present performance results on a 32-core machine. Perhaps surprisingly, for all problems, our internally deterministic algorithms achieve good speedup and good performance even relative to prior nondeterministic solutions.
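The idea of combining commutative operations with a reserve-then-commit step can be illustrated on a toy problem. The following is a minimal, sequentially simulated sketch and not the authors' implementation: the greedy-matching example, the function names, and the choice of a minimum-index priority write as the commutative operation are all assumptions made for illustration. The shuffles stand in for an arbitrary parallel schedule; because the reservation is a minimum, the order in which iterates apply it does not affect the outcome.

```python
import random


def greedy_matching_sequential(edges):
    """Reference result: scan edges in index order, keep an edge if both ends are free."""
    matched = set()
    chosen = []
    for i, (u, v) in enumerate(edges):
        if u not in matched and v not in matched:
            matched.update((u, v))
            chosen.append(i)
    return chosen


def greedy_matching_reservations(edges, num_vertices, seed=0):
    """Hypothetical reserve/commit rounds; shuffles mimic arbitrary scheduling."""
    rng = random.Random(seed)
    matched = [False] * num_vertices
    chosen = []
    pending = list(range(len(edges)))
    while pending:
        # Edges with an already matched endpoint can never be chosen; drop them.
        live = [i for i in pending
                if not matched[edges[i][0]] and not matched[edges[i][1]]]

        # Reserve phase: each live edge writes its index to both endpoints
        # with a min operation. Min is commutative, so order is irrelevant.
        rng.shuffle(live)
        reservation = {}
        for i in live:
            for w in edges[i]:
                reservation[w] = min(reservation.get(w, i), i)

        # Commit phase: an edge succeeds only if it holds both reservations.
        rng.shuffle(live)
        winners = [i for i in live
                   if all(reservation[w] == i for w in edges[i])]
        for i in winners:
            u, v = edges[i]
            matched[u] = matched[v] = True
        chosen.extend(winners)

        # Edges that lost a reservation retry in the next round.
        won = set(winners)
        pending = [i for i in live if i not in won]
    return sorted(chosen)


if __name__ == "__main__":
    example = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
    print(greedy_matching_sequential(example))
    print(greedy_matching_reservations(example, num_vertices=4, seed=7))
```

In this sketch the lowest-index live edge always wins its reservations, so every round makes progress, and the set of chosen edges matches the sequential scan regardless of the seed. A real parallel version would replace the shuffled loops with concurrent iterates and an atomic priority write.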
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design, Software
Cited by
82 articles.
1. Multi Bucket Queues: Efficient Concurrent Priority Scheduling;Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures;2024-06-17
2. When Is Parallelism Fearless and Zero-Cost with Rust?;Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures;2024-06-17
3. Performance of Text-Independent Automatic Speaker Recognition on a Multicore System;Tsinghua Science and Technology;2024-04
4. Scalable High-Quality Hypergraph Partitioning;ACM Transactions on Algorithms;2024-01-22
5. pGRASS-Solver: A Graph Spectral Sparsification-Based Parallel Iterative Solver for Large-Scale Power Grid Analysis;IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems;2023-09