Affiliation:
1. University of California, Davis
2. Goethe-Universität Frankfurt am Main
Abstract
Multisplit is a broadly useful parallel primitive that permutes its input data into contiguous
buckets
or
bins
, where the function that categorizes an element into a bucket is provided by the programmer. Due to the lack of an efficient multisplit on Graphics Processing Units (GPUs), programmers often choose to implement multisplit with a sort. One way is to first generate an auxiliary array of bucket IDs and then sort input data based on it. In case smaller indexed buckets possess smaller valued keys, another way for multisplit is to directly sort input data. Both methods are inefficient and require more work than necessary: The former requires more expensive data movements while the latter spends unnecessary effort in sorting elements within each bucket. In this work, we provide a parallel model and multiple implementations for the multisplit problem. Our principal focus is multisplit for a small (up to 256) number of buckets. We use warp-synchronous programming models and emphasize warpwide communications to avoid branch divergence and reduce memory usage. We also hierarchically reorder input elements to achieve better coalescing of global memory accesses. On a GeForce GTX 1080 GPU, we can reach a peak throughput of 18.93Gkeys/s (or 11.68Gpairs/s) for a key-only (or key-value) multisplit. Finally, we demonstrate how multisplit can be used as a building block for radix sort. In our multisplit-based sort implementation, we achieve comparable performance to the fastest GPU sort routines, sorting 32-bit keys (and key-value pairs) with a throughput of 3.0Gkeys/s (and 2.1Gpair/s).
Funder
DFG
MADALGO
Sandia LDRD
UC Lab Fees Research Program
NVIDIA Graduate Fellowship
NSF
Publisher
Association for Computing Machinery (ACM)
Subject
Computational Theory and Mathematics,Computer Science Applications,Hardware and Architecture,Modeling and Simulation,Software
Cited by
12 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献