Authors:
Yang Yiqing, Zhang Guoyin, Wu Yanxia, Zhao Zhixiang, Fu Yan
Abstract
Top-K and selection operations are critical in data processing and analysis, and growing data-analysis workloads make their efficient implementation on GPUs increasingly important. Existing methods rely primarily on the bucket partition execution model and suffer from uneven bucket distribution and latency in the merging process. To address these issues, we introduce a novel Split-Bucket Partition (SBP) execution model. We also propose task and control flow optimizations targeted at top-K and selection algorithms, which further improve performance. Our optimized algorithms significantly outperform existing approaches, delivering performance gains of up to 2.3 times and 1.6 times for different bucket partitioning rules. They also show robust improvements in non-uniform data scenarios, with gains ranging from 1.9 times to 15.5 times. However, the SBP model has limitations related to shared memory and register utilization that can impact performance. Tests on the TU102 and A100 GPU architectures validate the effectiveness of our approach, achieving a maximum speedup of 2.9 times. The study suggests that the SBP model, while effective for top-K and selection algorithms, also holds promise for other computational tasks, setting the stage for future research.
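For readers unfamiliar with the bucket partition execution model that the abstract takes as its baseline, the following is a minimal sequential sketch in Python. It is not the SBP model and not the authors' GPU implementation; the function names `bucket_select` and `top_k`, the equal-width bucketing rule, and the default of 16 buckets are assumptions made purely for illustration. It shows the general idea: partition candidates into value-range buckets, keep only the bucket that contains the k-th largest element, and repeat.

```python
def bucket_select(values, k, num_buckets=16):
    """Return the k-th largest element (k is 1-based) by repeated bucket partitioning.

    Illustrative baseline only; equal-width buckets are an assumption.
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must be in [1, len(values)]")
    candidates = list(values)
    while True:
        lo, hi = min(candidates), max(candidates)
        if lo == hi:
            # All remaining candidates are equal, so this is the answer.
            return lo
        width = (hi - lo) / num_buckets
        buckets = [[] for _ in range(num_buckets)]
        for v in candidates:
            # Clamp so that the maximum lands in the last bucket.
            idx = min(int((v - lo) / width), num_buckets - 1)
            buckets[idx].append(v)
        # Walk from the highest-valued bucket down until the bucket
        # holding the k-th largest element is found.
        for bucket in reversed(buckets):
            if k <= len(bucket):
                candidates = bucket
                break
            k -= len(bucket)


def top_k(values, k, num_buckets=16):
    """Return the k largest elements (unordered), using bucket_select for the threshold."""
    threshold = bucket_select(values, k, num_buckets)
    greater = [v for v in values if v > threshold]
    # Fill the remaining slots with copies of the threshold value (handles ties).
    return greater + [threshold] * (k - len(greater))
```

With skewed input, most values fall into one or two of the equal-width buckets, so each pass discards little work. This is the uneven bucket distribution problem the abstract refers to.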
Publisher
Springer Science and Business Media LLC