Distributed out-of-memory NMF on CPU/GPU architectures

Published:2023-09-08 Issue:3 Volume:80 Page:3970-3999
ISSN:0920-8542
Container-title:The Journal of Supercomputing
language:en
Short-container-title:J Supercomput

Author:

Boureima Ismael, Bhattarai Manish, Eren Maksim, Skau Erik, Romero Philip, Eidenbenz Stephan, Alexandrov Boian

Abstract

We propose an efficient distributed out-of-memory implementation of the non-negative matrix factorization (NMF) algorithm for heterogeneous high-performance-computing systems. The proposed implementation is based on prior work on NMFk, which can perform automatic model selection and extract latent variables and patterns from data. In this work, we extend NMFk by adding support for dense and sparse matrix operations on multi-node, multi-GPU systems. The resulting algorithm is optimized for out-of-memory problems where the memory required to factorize a given matrix is greater than the available GPU memory. Memory complexity is reduced by batching/tiling strategies, and sparse and dense matrix operations are significantly accelerated with GPU cores (or tensor cores when available). Input/output latency associated with batch copies between host and device is hidden using CUDA streams to overlap data transfers and compute asynchronously, and latency associated with collective communications (both intra-node and inter-node) is reduced using optimized NVIDIA Collective Communication Library (NCCL) based communicators. Benchmark results show significant improvement, from 32x to 76x speedup, with the new implementation using GPUs over the CPU-based NMFk. Good weak scaling was demonstrated on up to 4096 multi-GPU cluster nodes with approximately 25,000 GPUs when decomposing a dense 340 Terabyte-size matrix and an 11 Exabyte-size sparse matrix of density $10^{-6}$.
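The batching/tiling idea in the abstract can be illustrated with a small CPU-only sketch. It assumes Frobenius-norm multiplicative updates (Lee and Seung), which the abstract does not name as the update rule, and splits the input by row tiles so that only one tile of the matrix is needed at a time. The function name `batched_nmf` and its parameters are hypothetical and are not taken from NMFk. CUDA streams and NCCL are not modeled here: the tile slicing stands in for a host-to-device copy, and the accumulated products are the quantities that would be summed across devices in a distributed run.

```python
import numpy as np


def batched_nmf(A, k, batch_rows, n_iter=200, eps=1e-9, seed=0):
    """Multiplicative-update NMF (A ~ W @ H) that reads A one row tile at a time.

    Illustrative sketch only: names and defaults are hypothetical.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    # Row tiles of at most batch_rows rows; each tile is processed on its own.
    tiles = [slice(s, min(s + batch_rows, m)) for s in range(0, m, batch_rows)]

    for _ in range(n_iter):
        # H update needs W^T A and W^T W; both are sums over row tiles.
        WtA = np.zeros((k, n))
        WtW = np.zeros((k, k))
        for t in tiles:
            A_t, W_t = A[t], W[t]  # stand-in for copying one batch to the device
            WtA += W_t.T @ A_t
            WtW += W_t.T @ W_t
        H *= WtA / (WtW @ H + eps)

        # W update is independent per row tile once H H^T (k x k) is known.
        HHt = H @ H.T
        for t in tiles:
            W[t] *= (A[t] @ H.T) / (W[t] @ HHt + eps)
    return W, H


# Usage: factorize a small synthetic rank-4 matrix in tiles of 16 rows.
rng = np.random.default_rng(1)
A = rng.random((60, 4)) @ rng.random((4, 40))
W, H = batched_nmf(A, k=4, batch_rows=16)
rel_err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
```

Because the tile sums reproduce the full products, the tiled run matches an untiled run up to floating-point summation order; the peak working set per step is one tile of `A` plus the small `k`-sized factors, which is the memory reduction the abstract attributes to batching.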

Funder

U.S. Department of Energy National Nuclear Security Administration

LANL LDRD

Publisher

Springer Science and Business Media LLC

Subject

Hardware and Architecture, Information Systems, Theoretical Computer Science, Software

Link

https://link.springer.com/content/pdf/10.1007/s11227-023-05587-4.pdf

References (51 articles)

1. Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401(6755):788–791

2. Cichocki A, Zdunek R, Phan AH, Amari S-i (2009) Nonnegative matrix and tensor factorizations: applications to exploratory multi-way data analysis and blind source separation

3. Everett B (2013) An introduction to latent variable models

4. Alexandrov LB, Nik-Zainal S, Wedge DC, Campbell PJ, Stratton MR (2013) Deciphering signatures of mutational processes operative in human cancer. Cell Rep 3(1):246–259

5. Alexandrov BS, Alexandrov LB, Iliev F, Stanev VG, Vesselinov V (2020) Source identification by non-negative matrix factorization combined with semi-supervised clustering. Google Patents. US Patent 10,776,718

Cited by 2 articles.

1. Catch'em all: Classification of Rare, Prominent, and Novel Malware Families;2024 12th International Symposium on Digital Forensics and Security (ISDFS);2024-04-29

2. Correction to: Distributed out-of-memory NMF on CPU/GPU architectures;The Journal of Supercomputing;2023-09-28