Affiliation:
1. INESC-ID, Lisbon, Portugal
Abstract
This article presents a fast SIMD Hilbert space-filling curve generator, which supports a new cache-oblivious blocking-scheme technique applied to the out-of-place transposition of general matrices. Matrix operations found in high performance computing libraries are usually parameterized based on host microprocessor specifications to minimize data movement within the different levels of memory hierarchy. The performance of cache-oblivious algorithms does not rely on such parameterizations. This type of algorithm provides an elegant and portable solution to address the lack of standardization in modern-day processors. Our solution consists in an iterative blocking scheme that takes advantage of the locality-preserving properties of Hilbert space-filling curves to minimize data movement in any memory hierarchy. This scheme traverses the input matrix in O(nm) time and space, improving the behavior of matrix algorithms that inherently present poor memory locality. The application of this technique to the problem of out-of-place matrix transposition achieved competitive results when compared to state-of-the-art approaches. The performance of our solution surpassed Intel MKL version after employing standard software prefetching techniques.
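The idea of an iterative, Hilbert-ordered blocking scheme for out-of-place transposition can be illustrated with a minimal sketch. It is not the authors' SIMD generator: it uses the well-known scalar index-to-coordinate conversion for the Hilbert curve, and the function names `hilbert_d2xy` and `transpose_hilbert`, as well as the fixed `tile` parameter, are assumptions made for illustration only. The matrix is stored as a flat row-major list, tiles are visited in Hilbert order, and tiles falling outside a non-square matrix are skipped.

```python
def hilbert_d2xy(order, d):
    """Map index d along a Hilbert curve over a 2**order x 2**order grid to (x, y).

    Standard scalar conversion, used here as a stand-in for a SIMD generator.
    """
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the quadrant before swapping the axes.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def transpose_hilbert(a, rows, cols, tile=4):
    """Out-of-place transpose of a rows x cols row-major matrix.

    Tiles are visited in Hilbert order; `tile` is a hypothetical parameter
    chosen for this sketch, not a value taken from the article.
    """
    out = [None] * (rows * cols)
    tiles_r = -(-rows // tile)  # ceiling division
    tiles_c = -(-cols // tile)
    # Smallest power-of-two tile grid that covers the matrix.
    order = (max(tiles_r, tiles_c, 1) - 1).bit_length()
    for d in range(4 ** order):
        tx, ty = hilbert_d2xy(order, d)
        if tx >= tiles_c or ty >= tiles_r:
            continue  # this tile lies outside the matrix
        r0, c0 = ty * tile, tx * tile
        for r in range(r0, min(r0 + tile, rows)):
            for c in range(c0, min(c0 + tile, cols)):
                out[c * rows + r] = a[r * cols + c]
    return out
```

Calling `transpose_hilbert(list(range(35)), 5, 7, tile=2)` returns the 7 x 5 transpose as a flat row-major list. Because consecutive Hilbert indices always map to adjacent tiles, both the source and destination accesses stay spatially close from one tile to the next, which is the locality property the blocking scheme relies on.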
Funder
Fundação para a Ciência e Tecnologia
Austrian Science Fund
Publisher
Association for Computing Machinery (ACM)
Subject
Applied Mathematics, Software
Cited by
2 articles.