Efficient-Grad: Efficient Training Deep Convolutional Neural Networks on Edge Devices with Gradient Optimizations

Published:2022-02-08 Issue:2 Volume:21 Page:1-24
ISSN:1539-9087
Container-title:ACM Transactions on Embedded Computing Systems
language:en
Short-container-title:ACM Trans. Embed. Comput. Syst.

Author:

Hong Ziyang¹, Yue C. Patrick¹

Affiliation:

1. HKUST-Qualcomm Joint Innovation and Research Laboratory, Hong Kong University of Science and Technology, Hong Kong SAR, China

Abstract

With the proliferation of mobile devices, distributed learning, which enables model training with decentralized data, has attracted great interest from researchers. However, the lack of training capability on edge devices significantly limits the energy efficiency of distributed learning in practice. This article describes Efficient-Grad, an algorithm-hardware co-design approach for training deep convolutional neural networks that improves both throughput and energy savings during model training, with negligible loss in validation accuracy. The key to Efficient-Grad is its exploitation of two observations. First, sparsity can be exploited not only in activations and weights but also in gradients, together with the asymmetry residing in the gradients of conventional back propagation (BP). Second, a dedicated hardware architecture for sparsity utilization and efficient data movement can be optimized to support the Efficient-Grad algorithm in a scalable manner. To the best of our knowledge, Efficient-Grad is the first approach that successfully adopts a feedback-alignment (FA)-based gradient optimization scheme for deep convolutional neural network training, which leads to its superior energy efficiency. We present case studies demonstrating that the Efficient-Grad design outperforms prior art by 3.72× in energy efficiency.
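
The two algorithmic ideas named in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes the textbook form of feedback alignment, in which the backward pass sends the output error through a fixed random matrix instead of the transpose of the forward weights, and it uses simple magnitude thresholding as a stand-in for whatever gradient sparsification the paper actually applies. All names (`backward_bp`, `backward_fa`, `prune_small`, `b_fixed`, the threshold value) are hypothetical.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def backward_bp(w_out, delta_out, relu_mask):
    """Conventional BP: hidden error = (W^T @ delta_out) * f'(a)."""
    back = matvec(transpose(w_out), delta_out)
    return [d * m for d, m in zip(back, relu_mask)]


def backward_fa(b_fixed, delta_out, relu_mask):
    """Feedback alignment: hidden error = (B @ delta_out) * f'(a),
    where B is a fixed random matrix that is never trained."""
    back = matvec(b_fixed, delta_out)
    return [d * m for d, m in zip(back, relu_mask)]


def prune_small(grad, threshold):
    """Assumed sparsification rule: zero entries below the threshold."""
    return [g if abs(g) >= threshold else 0.0 for g in grad]


rng = random.Random(0)
n_hidden, n_out = 4, 2

# Forward weights (n_out x n_hidden); used only by the BP variant.
w_out = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
# Fixed random feedback matrix (n_hidden x n_out); used by the FA variant.
b_fixed = [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_hidden)]

delta_out = [0.5, -0.01]           # error at the output layer
relu_mask = [1.0, 0.0, 1.0, 1.0]   # derivative of ReLU at the hidden layer

delta_hidden_bp = backward_bp(w_out, delta_out, relu_mask)
delta_hidden_fa = backward_fa(b_fixed, delta_out, relu_mask)
sparse_delta = prune_small(delta_hidden_fa, threshold=0.1)
```

In this sketch the FA backward pass never reads `w_out`. Removing that dependency on the forward weights is the kind of asymmetry between forward and backward data paths that a hardware design could exploit, though how Efficient-Grad does so is described only in the full paper.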

Funder

Hong Kong Research Grants Council under General Research Fund

HKUST-Qualcomm Joint Innovation and Research Laboratory

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3504034

Cited by 4 articles.

1. Evaluating the energy impact of device parameters for DNN inference on edge;Proceedings of the 14th International Green and Sustainable Computing Conference;2023-10-28

2. Cross-Dimensional Refined Learning for Real-Time 3D Visual Perception from Monocular Video;2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW);2023-10-02

3. Efficient On-device Transfer Learning using Activation Memory Reduction;2023 Eighth International Conference on Fog and Mobile Edge Computing (FMEC);2023-09-18

4. Efficient On-Device Training via Gradient Filtering;2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR);2023-06