Optimizing Performance of Image Processing Algorithms on GPUs

Published: 2022
Pages: 936-943
ISSN: 1876-1100
Container-title: Proceeding of 2021 International Conference on Wireless Communications, Networking and Applications

Author:

Zhou Honghui, Qin Ruyi, Liu Zihan, Qian Ying, Ju Xiaoming

Abstract

The application of machine learning algorithms to the power grid improves the service level of power enterprises and promotes the development of the grid. NVIDIA Volta and Turing GPUs powered by Tensor Cores can accelerate training and learning for these algorithms. With Tensor Cores enabled, mixed-precision (FP32 and FP16) matrix multiplication dramatically increases throughput and reduces AI training times. To explore the cause of this phenomenon, we take a convolutional neural network (CNN), which is widely used in computer vision, as an example and benchmark the performance characteristics of Tensor Cores on general matrix multiplications and convolution calculations. Building a CNN based on cuDNN and TensorFlow, we analyze its performance from various aspects and optimize it by changing the shape of the convolution kernel, using texture memory, and so on. The experimental results demonstrate the effectiveness of our methods.

Publisher

Springer Nature Singapore

Link

https://link.springer.com/content/pdf/10.1007/978-981-19-2456-9_95

Reference: 23 articles.

1. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)

2. Abdel-Hamid, O., Mohamed, A., Jiang, H., et al.: Convolutional neural networks for speech recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 22(10), 1533–1545 (2014)

3. Conneau, A., Schwenk, H., Barrault, L., et al.: Very deep convolutional networks for natural language processing. arXiv preprint arXiv:1606.01781 (2016)

4. Segler, M.H.S., Kogej, T., Tyrchan, C., et al.: Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci. 4(1), 120–131 (2017)

5. NVIDIA: NVIDIA Turing architecture whitepaper. Technical report, NVIDIA Corp., August 2018. https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf