A sound and complete abstraction for reasoning about parallel prefix sums

Author:

Nathan Chong (1), Alastair F. Donaldson (1), Jeroen Ketema (1)

Affiliation:

1. Imperial College London, London, United Kingdom

Abstract

Prefix sums are key building blocks in the implementation of many concurrent software applications, and recently much work has gone into efficiently implementing prefix sums to run on massively parallel graphics processing units (GPUs). Because they lie at the heart of many GPU-accelerated applications, the correctness of prefix sum implementations is of prime importance. We introduce a novel abstraction, the interval of summations, that allows scalable reasoning about implementations of prefix sums. We present this abstraction as a monoid, and prove a soundness and completeness result showing that a generic sequential prefix sum implementation is correct for an array of length $n$ if and only if it computes the correct result for a specific test case when instantiated with the interval of summations monoid. This allows correctness to be established by running a single test where the input and result require $O(n \lg(n))$ space. This improves upon an existing result by Sheeran where the input requires $O(n \lg(n))$ space and the result $O(n^2 \lg(n))$ space, and is more feasible for large $n$ than a method by Voigtlaender that uses $O(n)$ space for the input and result but requires running $O(n^2)$ tests. We then extend our abstraction and results to the context of data-parallel programs, developing an automated verification method for GPU implementations of prefix sums. Our method uses static verification to prove that a generic prefix sum implementation is data race-free, after which functional correctness of the implementation can be determined by running a single test case under the interval of summations abstraction. We present an experimental evaluation using four different prefix sum algorithms, showing that our method is highly automatic, scales to large thread counts, and significantly outperforms Voigtlaender's method when applied to large arrays.
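The single-test idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the interval of summations monoid, so the encoding below is an assumption: a pair `(i, j)` stands for the contiguous summation of elements `i` through `j`, `IDENTITY` is the monoid identity, and `TOP` is an absorbing element produced when two intervals do not abut. The names `combine`, `kogge_stone_scan`, `swapped_scan`, and `passes_single_test` are hypothetical, and the scan shown is a generic sequential rendering of a Kogge-Stone style algorithm chosen for illustration, not necessarily one of the four algorithms evaluated in the paper.

```python
# Sketch only: the encoding of the monoid is an assumption, not taken
# from the abstract.
IDENTITY = "identity"  # assumed identity element
TOP = "top"            # assumed absorbing element for non-adjacent intervals


def combine(a, b):
    """Combine two abstract summations; (i, j) means x_i + ... + x_j."""
    if a == TOP or b == TOP:
        return TOP
    if a == IDENTITY:
        return b
    if b == IDENTITY:
        return a
    (i, j), (k, l) = a, b
    # Only adjacent intervals, in order, form a valid larger summation.
    return (i, l) if j + 1 == k else TOP


def kogge_stone_scan(xs, op):
    """Generic inclusive prefix sum: uses only `op`, never inspects elements."""
    result = list(xs)
    offset = 1
    while offset < len(result):
        prev = list(result)  # snapshot, mimicking a barrier between rounds
        for i in range(offset, len(result)):
            result[i] = op(prev[i - offset], prev[i])
        offset *= 2
    return result


def swapped_scan(xs, op):
    """Deliberately wrong variant: operands are combined in the wrong order."""
    result = list(xs)
    offset = 1
    while offset < len(result):
        prev = list(result)
        for i in range(offset, len(result)):
            result[i] = op(prev[i], prev[i - offset])
        offset *= 2
    return result


def passes_single_test(scan, n):
    """Run the one abstract test for arrays of length n."""
    test_input = [(i, i) for i in range(n)]   # element i is the summation x_i
    expected = [(0, i) for i in range(n)]     # position i must hold x_0 + ... + x_i
    return scan(test_input, combine) == expected


if __name__ == "__main__":
    print(passes_single_test(kogge_stone_scan, 8))
    print(passes_single_test(swapped_scan, 8))
```

Under these assumptions, `swapped_scan` still produces correct results with integer addition, because addition is commutative, yet it fails the abstract test because `(1, 1)` combined with `(0, 0)` yields `TOP`. Each abstract element is a pair of indices below $n$, which is consistent with the $O(n \lg(n))$ space bound stated in the abstract. As the abstract notes, the test establishes correctness for one array length $n$ at a time.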

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design, Software

References: 35 articles.

1. A. Betts, N. Chong, A. F. Donaldson, S. Qadeer, and P. Thomson. GPUVerify: a verifier for GPU kernels. In OOPSLA, pages 113--132, 2012. 10.1145/2384616.2384625

2. M. Billeter, O. Olsson, and U. Assarsson. Efficient stream compaction on wide SIMD many-core architectures. In HPG, pages 159--166, 2009. 10.1145/1572769.1572795

3. Scans as primitive parallel operations

4. A Regular Layout for Parallel Adders

Cited by 5 articles.

1. AuDaLa is Turing Complete;Lecture Notes in Computer Science;2024

2. An Autonomous Data Language;Theoretical Aspects of Computing – ICTAC 2023;2023

3. Formal Verification of Parallel Prefix Sum;Lecture Notes in Computer Science;2020

4. Formal Verification of Parallel Stream Compaction and Summed-Area Table Algorithms;Theoretical Aspects of Computing – ICTAC 2020;2020

5. HLS-based optimization and design space exploration for applications with variable loop bounds;Proceedings of the International Conference on Computer-Aided Design;2018-11-05
