Abstract
Deep neural networks (DNNs) have achieved remarkable success in a wide variety of domains. For deep learning tasks, several hardware platforms provide efficient solutions, including graphics processing units (GPUs), central processing units (CPUs), field-programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs). Nonetheless, for DNN inference workloads, CPUs outperform other platforms, including GPUs, in many cases, aided by techniques such as high-performance libraries that serve as the basic building blocks of DNNs. CPUs have therefore been a preferred choice for DNN inference applications, particularly in low-latency scenarios. However, DNN inference efficiency remains a critical issue, especially when low latency is required under limited hardware resources, such as in embedded systems. At the same time, hardware features have not been fully exploited for DNNs, leaving much room for improvement. To this end, this paper conducts a series of experiments to thoroughly study the inference workload of prominent state-of-the-art DNN architectures on a single-instruction-multiple-data (SIMD) CPU platform, with findings that are also applicable to other hardware platforms. The study examines DNNs in depth: the CPU kernel-instruction-level performance characteristics of DNNs, including branches, branch prediction misses, and cache misses, together with the underlying convolutional computing mechanism at the SIMD level; thorough layer-wise time-consumption details with potential time-cost bottlenecks; and an exhaustive analysis of dynamic activation sparsity with exact details on the redundancy of DNNs. The research provides researchers with comprehensive and insightful details, as well as crucial target areas for optimising and improving the efficiency of DNNs at both the hardware and software levels.
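As a minimal illustration of the dynamic activation sparsity the abstract refers to (a hypothetical sketch, not the paper's measurement code), the fraction of activations that a ReLU layer zeroes out can be computed as follows; the function name and the synthetic Gaussian pre-activations are assumptions for the example:

```python
import random

def relu_activation_sparsity(pre_activations):
    """Fraction of entries that ReLU maps to zero, i.e. non-positive inputs.

    Dynamic activation sparsity depends on the input data, unlike static
    weight sparsity, which is fixed after training.
    """
    zeros = sum(1 for v in pre_activations if v <= 0.0)
    return zeros / len(pre_activations)

# Zero-mean Gaussian pre-activations are negative about half the time,
# so ReLU is expected to zero roughly 50% of them.
random.seed(0)
pre = [random.gauss(0.0, 1.0) for _ in range(10_000)]
print(f"ReLU activation sparsity: {relu_activation_sparsity(pre):.2f}")
```

In real networks, this per-layer ratio is what exposes redundancy: multiply-accumulate operations on zero activations contribute nothing to the output and are candidates for skipping at the hardware or library level.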
Funder
Japan Society for the Promotion of Science
Publisher
Springer Science and Business Media LLC
Subject
Artificial Intelligence,Linguistics and Language,Language and Linguistics
Cited by
25 articles.