The Parallel Tiled WZ Factorization Algorithm for Multicore Architectures

Published: 2019-06-01 Issue: 2 Volume: 29 Page: 407-419
ISSN: 2083-8492
Container-title: International Journal of Applied Mathematics and Computer Science
language: en

Author:

Bylina Beata¹, Bylina Jarosław¹

Affiliation:

1. Institute of Mathematics, Marie Curie-Skłodowska University, Pl. M. Curie-Skłodowskiej 5, 20-031 Lublin, Poland

Abstract

The aim of this paper is to investigate dense linear algebra algorithms on shared memory multicore architectures. The design and implementation of a parallel tiled WZ factorization algorithm which can fully exploit such architectures are presented. Three parallel implementations of the algorithm are studied. The first one relies only on multithreaded BLAS (basic linear algebra subprograms) operations. The second implementation, in addition to BLAS operations, employs the OpenMP standard to exploit loop-level parallelism. The third implementation, in addition to BLAS operations, employs the OpenMP task directive with the depend clause. We report the computational performance and the speedup of the parallel tiled WZ factorization algorithm on shared memory multicore architectures for dense square diagonally dominant matrices. Then we compare our parallel implementations with the respective LU factorization from a vendor-implemented LAPACK library. We also analyze the numerical accuracy. Two of our implementations achieve a speedup close to the theoretical maximum implied by Amdahl’s law.
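The abstract measures speedup against the bound implied by Amdahl's law. As a minimal sketch of that bound (not code from the paper), the function below computes the maximal speedup for a given parallel fraction and thread count. The names `amdahl_speedup` and `parallel_efficiency` and the example inputs are illustrative assumptions; the paper's measured fractions are not given in this record.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the run time scales.

    Amdahl's law: S(p) = 1 / ((1 - f) + f / p), where f is the parallelizable
    fraction of the serial run time and p is the number of threads.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


def parallel_efficiency(measured_speedup: float,
                        parallel_fraction: float,
                        threads: int) -> float:
    """Ratio of a measured speedup to the Amdahl bound (1.0 means the bound is reached)."""
    return measured_speedup / amdahl_speedup(parallel_fraction, threads)


if __name__ == "__main__":
    # Hypothetical inputs chosen only for illustration.
    for p in (1, 2, 4, 8, 16):
        print(p, amdahl_speedup(0.95, p))
```

With these definitions, a fully parallel workload (`parallel_fraction = 1.0`) gives a bound equal to the thread count, and a fully serial one gives a bound of 1 regardless of how many threads are used.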

Publisher

Walter de Gruyter GmbH

Subject

Applied Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)

Link

https://www.sciendo.com/pdf/10.2478/amcs-2019-0030

References: 22 articles.

1. Agullo, E., Demmel, J., Dongarra, J., Hadri, B., Kurzak, J., Langou, J., Ltaief, H., Luszczek, P. and Tomov, S. (2009). Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series 180(1): 012037.

2. Amdahl, G.M. (1967). Validity of the single processor approach to achieving large scale computing capabilities, Proceedings of the Spring Joint Computer Conference, AFIPS’67 (Spring), Atlantic City, NJ, USA, pp. 483–485.

3. Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A. and Sorensen, D. (1999). LAPACK Users’ Guide, 3rd Edn., SIAM, Philadelphia, PA.

4. Buttari, A., Langou, J., Kurzak, J. and Dongarra, J. (2009). A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing 35(1): 38–53.

5. Bylina, B. (2018). The block WZ factorization, Journal of Computational and Applied Mathematics 331(C): 119–132.

Cited by 3 articles.

1. A Review on Quadrant Interlocking Factorization: WZ and WH Factorization; Journal of the Nigerian Society of Physical Sciences; 2023-02-24

2. Loop Selection for Multilevel Nested Loops Using a Genetic Algorithm; Mathematical Problems in Engineering; 2021-04-01

3. Optimized Cramer’s Rule in WZ Factorization and Applications; European Journal of Pure and Applied Mathematics; 2020-10-31