Engineering In-place (Shared-memory) Sorting Algorithms

Authors:

Michael Axtmann¹, Sascha Witt¹, Daniel Ferizovic¹, Peter Sanders¹

Affiliation:

1. Karlsruhe Institute of Technology, Karlsruhe, Germany

Abstract

We present new sequential and parallel sorting algorithms that now represent the fastest known techniques for a wide range of input sizes, input distributions, data types, and machines. Somewhat surprisingly, part of the speed advantage is due to the additional feature of the algorithms to work in-place, i.e., they do not need a significant amount of space beyond the input array. Previously, the in-place feature often implied performance penalties. Our main algorithmic contribution is a blockwise approach to in-place data distribution that is provably cache-efficient. We also parallelize this approach, taking dynamic load balancing and memory locality into account. Our new comparison-based algorithm, In-place Parallel Super Scalar Samplesort (IPS⁴o), combines this technique with branchless decision trees. By taking cases with many equal elements into account and by adapting the distribution degree dynamically, we obtain a highly robust algorithm that outperforms the best previous in-place parallel comparison-based sorting algorithms by almost a factor of three. That algorithm also outperforms the best comparison-based competitors regardless of whether we consider in-place or not in-place, parallel or sequential settings. Another surprising result is that IPS⁴o even outperforms the best (in-place or not in-place) integer sorting algorithms in a wide range of situations. In many of the remaining cases (often involving near-uniform input distributions, small keys, or a sequential setting), our new In-place Parallel Super Scalar Radix Sort (IPS²Ra) turns out to be the best algorithm. Claims to have the "best" sorting algorithm, in some sense, can be found in many papers, and they cannot all be true. Therefore, we base our conclusions on an extensive experimental study involving a large part of the cross product of 21 state-of-the-art sorting codes, 6 data types, 10 input distributions, 4 machines, 4 memory allocation strategies, and input sizes varying over 7 orders of magnitude. This confirms the claims made about the robust performance of our algorithms while revealing major performance problems in many competitors outside the concrete set of measurements reported in the associated publications. This is particularly true for integer sorting algorithms, giving one reason to prefer comparison-based algorithms for robust general-purpose sorting.
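The "branchless decision trees" mentioned in the abstract can be illustrated with a minimal sketch. It assumes the classification scheme commonly described for super scalar samplesort: the k-1 sorted splitters are stored as an implicit binary search tree, and each comparison result is added to the node index instead of steering a conditional jump. The names `build_tree`, `make_tree`, and `classify` are hypothetical and not taken from the authors' code; handling of equal elements and the blockwise in-place distribution are omitted.

```cpp
#include <cstddef>
#include <vector>

// Fill a 1-based implicit binary search tree (children of node i are 2i and
// 2i+1) from the sorted splitters in the half-open range [lo, hi).
static void build_tree(const std::vector<int>& splitters,
                       std::vector<int>& tree,
                       std::size_t node, std::size_t lo, std::size_t hi) {
    if (lo >= hi) return;
    const std::size_t mid = lo + (hi - lo) / 2;
    tree[node] = splitters[mid];
    build_tree(splitters, tree, 2 * node, lo, mid);
    build_tree(splitters, tree, 2 * node + 1, mid + 1, hi);
}

// Assumption: splitters are sorted and splitters.size() + 1 (the number of
// buckets k) is a power of two. Index 0 of the returned tree is unused.
static std::vector<int> make_tree(const std::vector<int>& splitters) {
    std::vector<int> tree(splitters.size() + 1, 0);
    build_tree(splitters, tree, 1, 0, splitters.size());
    return tree;
}

// Return the bucket index in [0, k) for key. The loop always runs log2(k)
// times, so its exit test does not depend on the key; the comparison
// outcome is used as an integer (0 or 1) rather than as a branch condition.
static std::size_t classify(const std::vector<int>& tree, int key) {
    const std::size_t k = tree.size();
    std::size_t i = 1;
    while (i < k) {
        i = 2 * i + static_cast<std::size_t>(tree[i] < key);
    }
    return i - k;
}
```

With splitters {10, 20, 30}, keys up to and including 10 fall into bucket 0, keys in (10, 20] into bucket 1, keys in (20, 30] into bucket 2, and larger keys into bucket 3.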

Publisher

Association for Computing Machinery (ACM)

Subject

Computational Theory and Mathematics, Computer Science Applications, Hardware and Architecture, Modeling and Simulation, Software


