Optimizing the SVD Bidiagonalization Process for a Batch of Small Matrices

Published: 2017 Volume: 108 Pages: 1008-1018
ISSN: 1877-0509
Container-title: Procedia Computer Science
Language: en

Author:

Dong Tingxing, Haidar Azzam, Tomov Stanimire, Dongarra Jack

Publisher

Elsevier BV

Subject

General Engineering

References: 16 articles.

1. A. Abdelfattah, A. Haidar, S. Tomov, and J. J. Dongarra. Performance, design, and autotuning of batched GEMM for GPUs. In High Performance Computing - 31st International Conference, ISC High Performance 2016, Frankfurt, Germany, June 19-23, 2016, Proceedings, pages 21–38, 2016.

2. L. Brown. Accelerate machine learning with the cuDNN deep neural network library, 2015. Available at http://devblogs.nvidia.com/parallelforall/accelerate-machine-learning-cudnn-deep-neural-network-library/

3. NVIDIA Corporation. https://devtalk.nvidia.com/default/topic/527289/help-with-gpu-cholesky-factorization-/.

4. T. Dong, V. Dobrev, T. Kolev, R. Rieben, S. Tomov, and J. Dongarra. A step towards energy efficient computing: Redesigning a hydrodynamic application on CPU-GPU. In IEEE 28th International Parallel and Distributed Processing Symposium (IPDPS), 2014.

5. T. Dong, A. Haidar, P. Luszczek, A. Harris, S. Tomov, and J. Dongarra. LU Factorization of Small Matrices: Accelerating Batched DGETRF on the GPU. In 16th IEEE International Conference on High Performance Computing and Communications (HPCC 2014), August 2014.

Cited by 5 articles.

1. Modern Generative Programming for Optimizing Small Matrix-Vector Multiplication;2018 International Conference on High Performance Computing & Simulation (HPCS);2018-07

2. Accelerating the SVD bi-diagonalization of a batch of small matrices using GPUs;Journal of Computational Science;2018-05

3. Performance of Hierarchical-matrix BiCGStab Solver on GPU Clusters;2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS);2018-05

4. Implicit iterative schemes based on singular decomposition and regularizing algorithms;Вестник Самарского государственного технического университета. Серия «Физико-математические науки»;2018

5. Optimization of Hierarchical Matrix Computation on GPU;Supercomputing Frontiers;2018