cuConv: CUDA implementation of convolution for CNN inference

Published: 2022-01-21 Issue: 2 Volume: 25 Page: 1459-1473
ISSN: 1386-7857
Container-title: Cluster Computing
Language: en
Short-container-title: Cluster Comput

Author:

Jordà Marc, Valero-Lara Pedro, Peña Antonio J.

Publisher

Springer Science and Business Media LLC

Subject

Computer Networks and Communications, Software

Link

https://link.springer.com/content/pdf/10.1007/s10586-021-03494-y.pdf

References: 34 articles.

1. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I.J., Harp, A., Irving, G., Isard, M., Jia, Y., Józefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D.G., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P.A., Vanhoucke, V., Vasudevan, V., Viégas, F.B., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X.: TensorFlow: Large-scale machine learning on heterogeneous distributed systems. CoRR arXiv:1603.04467 (2016)

2. Chetlur, S., Woolley, C., Vandermersch, P., Cohen, J., Tran, J., Catanzaro, B., Shelhamer, E.: cuDNN: efficient primitives for deep learning. CoRR (2014)

3. Intel RealSense Depth Camera D455: https://www.intelrealsense.com/depth-camera-d455 (2021)

4. Dongarra, J.J., Hammarling, S., Higham, N.J., Relton, S.D., Valero-Lara, P., Zounon, M.: The design and performance of batched BLAS on modern high-performance computing systems. In: International conference on computational science (ICCS), pp. 495–504 (2017)

5. Dryden, N., Maruyama, N., Moon, T., Benson, T., Snir, M., Van Essen, B.: Channel and filter parallelism for large-scale CNN training. In: Proceedings of the international conference for high performance computing, networking, storage and analysis, SC 2019. Association for computing machinery, New York, NY, USA (2019). https://doi.org/10.1145/3295500.3356207

Cited by 8 articles.

1. Advancing Direct Convolution Using Convolution Slicing Optimization and ISA Extensions;ACM Transactions on Architecture and Code Optimization;2023-12-14

2. Mixed-Precision S/DGEMM Using the TF32 and TF64 Frameworks on Low-Precision AI Tensor Cores;Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis;2023-11-12

3. ConvDarts: a fast and exact convolutional algorithm selector for deep learning frameworks;CCF Transactions on High Performance Computing;2023-09-20

4. Explainable Deep-Learning-Based Diagnosis of Alzheimer’s Disease Using Multimodal Input Fusion of PET and MRI Images;Journal of Medical and Biological Engineering;2023-06

5. A large-scale heterogeneous computing framework for non-uniform sampling two-dimensional convolution applications;CCF Transactions on High Performance Computing;2023-05-11