An architecture-level analysis on deep learning models for low-impact computations

Author:

Li Hengyi, Wang Zhichen, Yue Xuebin, Wang Wenwen, Tomiyama Hiroyuki, Meng Lin

Abstract

Deep neural networks (DNNs) have achieved remarkable success across a wide variety of domains. For deep learning tasks, several hardware platforms offer efficient solutions, including graphics processing units (GPUs), central processing units (CPUs), field-programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs). Nonetheless, with the support of various techniques, such as the high-performance libraries that serve as basic building blocks for DNNs, CPUs outperform other solutions, including GPUs, in many inference workloads. CPUs have therefore become a preferred choice for DNN inference applications, particularly in low-latency scenarios. However, inference efficiency remains a critical issue, especially when low latency is required on hardware with limited resources, such as embedded systems. At the same time, hardware features have not been fully exploited for DNNs, and there is considerable room for improvement. To this end, this paper conducts a series of experiments to study the inference workload of prominent state-of-the-art DNN architectures thoroughly on a single-instruction-multiple-data (SIMD) CPU platform, with findings that are also broadly applicable to other hardware platforms. The study examines DNNs in depth: the kernel-instruction-level performance characteristics of DNNs on CPUs, including branches, branch-prediction misses, cache misses, etc., together with the underlying convolutional computing mechanism at the SIMD level; thorough layer-wise time-consumption details with potential time-cost bottlenecks; and exhaustive dynamic activation sparsity measurements that detail the redundancy of DNNs. The research provides researchers with comprehensive and insightful details, as well as crucial target areas for optimising and improving the efficiency of DNNs at both the hardware and software levels.
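The dynamic activation sparsity the abstract refers to is the fraction of activations that evaluate to exactly zero at inference time, typically after a ReLU. A minimal sketch of the measurement, using a hypothetical randomly generated feature map in place of a real convolution output (shapes and data are illustrative, not from the paper's experiments):

```python
import numpy as np

def relu(x):
    """Rectified linear unit: zeroes out all negative pre-activations."""
    return np.maximum(x, 0.0)

def activation_sparsity(feature_map):
    """Dynamic sparsity: fraction of exactly-zero activations."""
    return float(np.mean(feature_map == 0.0))

# Hypothetical conv-layer output: 8 channels of a 32x32 feature map,
# drawn from a standard normal so roughly half the values are negative.
rng = np.random.default_rng(0)
pre_act = rng.standard_normal((8, 32, 32))
post_act = relu(pre_act)

# After ReLU, sparsity should be close to 0.5 for symmetric input data;
# in trained DNNs it is often much higher, which is the redundancy the
# paper quantifies layer by layer.
print(f"dynamic activation sparsity: {activation_sparsity(post_act):.3f}")
```

In a real profiling setup, the same metric would be computed on each layer's actual output during inference (e.g. via framework hooks) rather than on synthetic data.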

Funder

Japan Society for the Promotion of Science

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence, Linguistics and Language, Language and Linguistics

Cited by 25 articles.

1. A new paradigm in cigarette smoke detection: Rapid identification technique based on ATR-FTIR spectroscopy and GhostNet-α;Microchemical Journal;2024-10

2. Neuron Efficiency Index: An Empirical Method for Optimizing Parameters in Deep Learning;2024 International Joint Conference on Neural Networks (IJCNN);2024-06-30

3. Optimization and Deployment of DNNs for RISC-V-based Edge AI;2024 IEEE International Conference on Real-time Computing and Robotics (RCAR);2024-06-24

4. Personalized Gait Generation Using Convolutional Neural Network for Lower Limb Rehabilitation Robots;2024 IEEE International Conference on Real-time Computing and Robotics (RCAR);2024-06-24

5. YOLO-SM: A Lightweight Single-Class Multi-Deformation Object Detection Network;IEEE Transactions on Emerging Topics in Computational Intelligence;2024-06
