Abstract
It is well known that Quicksort, which is commonly considered one of the fastest in-place sorting algorithms, suffers in an essential way from branch mispredictions. We present a novel approach to addressing this problem by partially decoupling control from dataflow: to perform the partitioning, we split the input into blocks of constant size. Then, all elements in one block are compared with the pivot, and the outcomes of the comparisons are stored in a buffer. In a second pass, the respective elements are rearranged. By doing so, we avoid conditional branches based on outcomes of comparisons (except for the final Insertionsort). Moreover, we prove that when sorting $n$ elements, the average total number of branch mispredictions is at most $\epsilon n \log n + O(n)$ for some small $\epsilon$ depending on the block size.
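The two-pass idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified, one-sided (Lomuto-style) variant written only to show how comparison outcomes can be recorded in a buffer without branching on them. The function name `block_partition`, the constant `kBlock`, and the choice of `int` elements are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical block size; must not exceed 256 so offsets fit in one byte.
constexpr std::size_t kBlock = 128;

// Moves all elements smaller than `pivot` to the front of `a` and returns
// how many there are. Simplified one-sided sketch, not the paper's scheme.
std::size_t block_partition(std::vector<int>& a, int pivot) {
    unsigned char offsets[kBlock];  // buffer of in-block positions
    std::size_t store = 0;          // a[0..store) holds elements < pivot

    for (std::size_t start = 0; start < a.size(); start += kBlock) {
        const std::size_t len = std::min(kBlock, a.size() - start);

        // Pass 1: compare every element of the block with the pivot.
        // The offset is always written; the comparison result only
        // decides whether `count` advances, so no branch depends on it.
        std::size_t count = 0;
        for (std::size_t i = 0; i < len; ++i) {
            offsets[count] = static_cast<unsigned char>(i);
            count += static_cast<std::size_t>(a[start + i] < pivot);
        }

        // Pass 2: rearrange the recorded elements. The loop bound is
        // `count`, not the outcome of an individual comparison.
        for (std::size_t k = 0; k < count; ++k) {
            std::swap(a[store], a[start + offsets[k]]);
            ++store;
        }
    }
    return store;
}
```

Calling `block_partition(v, p)` on a vector `v` leaves every element smaller than `p` in `v[0..r)` and every other element in `v[r..)`, where `r` is the return value. The only branches left are loop conditions, which are easy to predict; whether a compiler actually emits branch-free code for the first pass depends on the compiler and its flags.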
Our experimental results are promising: when sorting random-integer data, we achieve an increase in speed (number of elements sorted per second) of more than 80% over the GCC implementation of Quicksort (C++ std::sort). Also, for many other types of data and non-random inputs, there is still a significant speedup over std::sort. Only in a few special cases, such as sorted or almost sorted inputs, can std::sort beat our implementation. Moreover, on random-input permutations, our implementation is even slightly faster than an implementation of the highly tuned Super Scalar Sample Sort, which uses a linear amount of additional space.
Finally, we also apply our approach to Quickselect and obtain a speed-up of more than 100% over the GCC implementation (C++ std::nth_element).
Publisher
Association for Computing Machinery (ACM)
Subject
Theoretical Computer Science
Cited by
12 articles.
1. Data-centric workloads with MPI_Sort;Journal of Parallel and Distributed Computing;2024-05
2. Optimizing the gravitational tree algorithm for many-core processors;Monthly Notices of the Royal Astronomical Society;2023-12-29
3. Billion-scale Detection of Isomorphic Nodes;2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW);2023-05
4. Parallel Multi-Deque Partition Dual-Deque Merge sorting algorithm using OpenMP;Scientific Reports;2023-04-19
5. These Rows Are Made for Sorting and That’s Just What We’ll Do;2023 IEEE 39th International Conference on Data Engineering (ICDE);2023-04