Abstract
When dealing with large-scale applications, the availability of simple and efficient algorithms is essential. We focus on the computation of order statistics, i.e. on selecting the kth smallest element of an array X. Many statistical procedures rely on this basic operation, which is usually solved by sorting all the elements and selecting the one in position k. If the array to sort is very large, this simple operation can become excessively time-consuming. To address this, we propose an original randomised algorithm that reduces the dimension of the selection problem by focusing only on a small subset of elements that contains the solution. Despite its random nature, it always returns the target value. Empirical results show that, for arrays of dimensions ranging from $$10^5$$ to $$10^8$$, our procedure is remarkably (up to almost 10 times) faster than the naïve procedure, independently of the programming environment and of the sorting algorithm, and with a relative advantage that tends to grow with the dimension of the array.
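The general idea of reducing a selection problem to a small subset can be illustrated with a minimal Python sketch. This is not the authors' algorithm, whose details are not given in the abstract. It is a generic sampling-based selection in the spirit of Floyd-Rivest, and the function names, the `sample_size` parameter, and the safety margin are all assumptions made for illustration. A random sample supplies two bracketing values, only the elements between them are sorted, and a fallback to the full sort keeps the result exact even when the bracket misses.

```python
import math
import random


def select_by_sorting(x, k):
    """Naive selection: sort everything, return the kth smallest (k is 1-based)."""
    return sorted(x)[k - 1]


def select_by_sampling(x, k, sample_size=1000, rng=None):
    """Exact kth smallest element of the list x, using a random sample to shrink the problem.

    Hypothetical sketch: sample_size and the margin below are illustrative choices,
    not values taken from the paper.
    """
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(x)")
    if n <= 2 * sample_size:
        return select_by_sorting(x, k)  # small input: sampling does not pay off

    rng = rng or random.Random()
    sample = sorted(rng.sample(x, sample_size))

    # Expected position of the target inside the sorted sample, widened by a margin.
    pos = k * sample_size / n
    margin = 2 * math.sqrt(sample_size)
    lo_idx = math.floor(pos - margin)
    hi_idx = math.ceil(pos + margin)
    lo = sample[lo_idx] if lo_idx >= 0 else None            # None means no lower bound
    hi = sample[hi_idx] if hi_idx < sample_size else None   # None means no upper bound

    # One pass: count elements strictly below the bracket, keep those inside it.
    below = 0
    candidates = []
    for v in x:
        if lo is not None and v < lo:
            below += 1
        elif hi is not None and v > hi:
            continue
        else:
            candidates.append(v)

    rank = k - below  # 1-based rank of the target among the candidates
    if 1 <= rank <= len(candidates):
        candidates.sort()
        return candidates[rank - 1]

    # The bracket missed the target: fall back to the naive method so the
    # returned value is always exact.
    return select_by_sorting(x, k)
```

In this sketch the randomness affects only the running time, never the returned value: either the target lies inside the bracket and is found among the sorted candidates, or the fallback sorts the whole array.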
Publisher
Springer Science and Business Media LLC
Subject
Computational Mathematics; Statistics, Probability and Uncertainty; Statistics and Probability