1. Fast sparse matrix-vector multiplication on GPUs for graph applications;Ashari,2014
2. Memory access patterns: the missing piece of the multi-GPU puzzle;Ben-Nun,2015
3. CUDA based parallel implementations of space-saving on a GPU;Cafaro,2017
4. Parallel space saving on multi- and many-core processors;Cafaro,2017
5. Graph regularized nonnegative matrix factorization for data representation;Cai;IEEE Trans. Pattern Anal. Mach. Intell.,2011