Authors:
Sun Qiao, Ma Wenjing, Sun Jiachang, Li Huiyuan
Funder
National Key R&D Program of China
Publisher
Springer Science and Business Media LLC
Subject
Information Systems, Hardware and Architecture, Computer Science Applications, Computer Science (miscellaneous)
References
25 articles.
1. Awan, A.A., Hamidouche, K., Venkatesh, A., Panda, D.K.: Efficient large message broadcast using NCCL and CUDA-aware MPI for deep learning. In: Proceedings of the 23rd European MPI Users’ Group Meeting, pp. 15–22 (2016). https://doi.org/10.1145/2966884.2966912
2. Bach, M., Kretz, M., Lindenstruth, V., Rohr, D.: Optimized HPL for AMD GPU and multi-core CPU usage. Comput. Sci. 26(3–4), 153–164 (2011). https://doi.org/10.1007/s00450-011-0161-5
3. BLAS: Basic linear algebra subprograms. http://www.netlib.org/blas/ (2021)
4. cuBLAS: the CUDA basic linear algebra subroutine library. https://developer.nvidia.com/cublas (2010)
5. CUDA: Compute unified device architecture. https://developer.nvidia.com/cuda-downloads (2022)
Cited by
1 article.
1. Mixed-Precision S/DGEMM Using the TF32 and TF64 Frameworks on Low-Precision AI Tensor Cores;Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis;2023-11-12