Affiliation:
1. RMIT University, Melbourne, Australia
Abstract
Algorithms for sorting large datasets can be made more efficient through careful use of memory hierarchies and by reducing the number of costly memory accesses. In earlier work, we introduced burstsort, a string-sorting algorithm that, on large sets of strings, is almost twice as fast as previous algorithms, primarily because it is more cache efficient. Burstsort dynamically builds a small trie that is used to rapidly allocate each string to a bucket. In this paper, we introduce new variants of our algorithm: SR-burstsort, DR-burstsort, and DRL-burstsort. These algorithms use a random sample of the strings to construct an approximation to the trie prior to sorting. Our experimental results with sets of over 30 million strings show that the new variants reduce cache misses by up to 37% relative to the original burstsort, while simultaneously reducing instruction counts by up to 24%. In pathological cases, even greater savings can be obtained.
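The abstract describes burstsort only at a high level. A trie is grown dynamically and routes each string to a bucket. The new variants seed that trie from a random sample before sorting begins. The Python sketch below illustrates both ideas under stated assumptions. It is not the authors' implementation, and it ignores the cache-conscious memory layout that gives burstsort its speed.

Several details are hypothetical and chosen only for illustration:

- the burst threshold `BURST_LIMIT`;
- the helper names `Node`, `insert`, `burst`, `traverse`, `empty_buckets`, `build_sampled_trie`, and `burstsort`;
- the sampling scheme. The sketch inserts a random sample with bursting, keeps the resulting trie nodes, empties the buckets, and then inserts every string with bursting still enabled.

This scheme does not reproduce the specific SR, DR, or DRL rules from the paper. Strings are `bytes`, so each trie level branches on one byte value.

```python
import random

# Hypothetical bucket capacity for this sketch; a practical burstsort tunes it to the cache.
BURST_LIMIT = 8


class Node:
    """Trie node: each slot, keyed by a byte value, holds a child Node or a bucket (list)."""

    def __init__(self):
        self.slots = {}   # byte value -> Node or list of bytes strings
        self.ended = []   # strings that end exactly at this node (all equal to its prefix)


def insert(node, s, depth=0, limit=BURST_LIMIT):
    """Route s through existing trie nodes, then append it to a bucket."""
    while True:
        if depth == len(s):
            node.ended.append(s)
            return
        c = s[depth]
        slot = node.slots.get(c)
        if isinstance(slot, Node):
            node, depth = slot, depth + 1   # descend one level, consuming one byte
            continue
        bucket = node.slots.setdefault(c, [])
        bucket.append(s)
        if len(bucket) > limit:
            burst(node, c, depth + 1, limit)
        return


def burst(parent, c, depth, limit):
    """Replace an overfull bucket with a new trie node and redistribute its strings."""
    strings = parent.slots[c]
    child = Node()
    parent.slots[c] = child
    for s in strings:
        insert(child, s, depth, limit)   # may burst again if one byte dominates


def traverse(node, out):
    """In-order walk: strings ending here first, then slots in byte order."""
    out.extend(node.ended)
    for c in sorted(node.slots):
        slot = node.slots[c]
        if isinstance(slot, Node):
            traverse(slot, out)
        else:
            out.extend(sorted(slot))   # a bucket holds at most `limit` strings
    return out


def empty_buckets(node):
    """Keep the trie shape but drop every string (used after building from a sample)."""
    node.ended.clear()
    for slot in node.slots.values():
        if isinstance(slot, Node):
            empty_buckets(slot)
        else:
            slot.clear()


def build_sampled_trie(strings, sample_size, limit, rng):
    """Approximate the final trie by bursting on a random sample only (assumed scheme)."""
    root = Node()
    for s in rng.sample(strings, min(sample_size, len(strings))):
        insert(root, s, 0, limit)
    empty_buckets(root)
    return root


def burstsort(strings, limit=BURST_LIMIT, sample_size=0, seed=0):
    """Sort a list of bytes strings; sample_size > 0 pre-builds the trie from a sample."""
    if sample_size > 0:
        root = build_sampled_trie(strings, sample_size, limit, random.Random(seed))
    else:
        root = Node()
    for s in strings:
        insert(root, s, 0, limit)   # bursting remains enabled during the main pass
    return traverse(root, [])


if __name__ == "__main__":
    print(burstsort([b"banana", b"apple", b"band", b"ban"]))
    # [b'apple', b'ban', b'banana', b'band']
```

Passing `sample_size` changes only how the trie is seeded. The sorted output is identical either way, so in this sketch sampling is purely a performance choice. It trades a pass over a small sample for fewer bursts during the main pass.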
Publisher
Association for Computing Machinery (ACM)
Subject
Theoretical Computer Science