Differentiable Neural Network Pruning to Enable Smart Applications on Microcontrollers

Published:2022-12-21 Issue:4 Volume:6 Page:1-19
ISSN:2474-9567
Container-title:Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
language:en
Short-container-title:Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.

Author:

Liberis Edgar¹, Lane Nicholas D.¹

Affiliation:

1. University of Cambridge, Cambridge, UK and Samsung AI Centre Cambridge, Cambridge, UK

Abstract

Wearable, embedded, and IoT devices are a centrepiece of many ubiquitous computing applications, such as fitness tracking, health monitoring, home security and voice assistants. By gathering user data through a variety of sensors and leveraging machine learning (ML), applications can adapt their behaviour: in other words, devices become "smart". Such devices are typically powered by microcontroller units (MCUs). As MCUs continue to improve, smart devices become capable of performing a non-trivial amount of sensing and data processing, including machine learning inference, which results in a greater degree of user data privacy and autonomy, compared to offloading the execution of ML models to another device. Advanced predictive capabilities across many tasks make neural networks an attractive ML model for ubiquitous computing applications; however, on-device inference on MCUs remains extremely challenging. Orders of magnitude less storage, memory and computational ability, compared to what is typically required to execute neural networks, impose strict structural constraints on the network architecture and call for specialist model compression methodology. In this work, we present a differentiable structured pruning method for convolutional neural networks, which integrates a model's MCU-specific resource usage and parameter importance feedback to obtain highly compressed yet accurate models. Compared to related network pruning work, compressed models are more accurate due to better use of MCU resource budget, and compared to MCU specialist work, compressed models are produced faster. The user only needs to specify the amount of available computational resources and the pruning algorithm will automatically compress the network during training to satisfy them. 
We evaluate our methodology using benchmark image and audio classification tasks and find that it (a) improves key resource usage of neural networks up to 80x; (b) has little to no overhead or even improves model training time; (c) produces compressed models with matching or improved resource usage up to 1.4x in less time compared to prior MCU-specific model compression methods.

Funder

Engineering and Physical Sciences Research Council

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications,Hardware and Architecture,Human-Computer Interaction

Link

https://dl.acm.org/doi/pdf/10.1145/3569468

Reference: 61 articles.

1. Attend and Discriminate

2. Mario Almeida, Stefanos Laskaridis, Stylianos I Venieris, Ilias Leontiadis, and Nicholas D Lane. 2021. DynO: Dynamic Onloading of Deep Neural Networks from Cloud to Device. arXiv preprint arXiv:2104.09949 (2021).

3. Amazon. 2022. Echo Dot, gen. 4. Retrieved February 1, 2022 from https://www.amazon.co.uk/all-new-echo-dot-4th-generation-smart-speaker-with-alexa-charcoal/dp/B084DWCZXZ

4. Sajid Anwar, Kyuyeon Hwang, and Wonyong Sung. 2015. Structured Pruning of Deep Convolutional Neural Networks. arXiv preprint arXiv:1512.08571 (2015).

5. ARM mbed. 2022. NUCLEO-F446RE. Retrieved February 1, 2022 from https://os.mbed.com/platforms/ST-Nucleo-F446RE/

Cited by 9 articles.

1. Flexi-BOPI: Flexible granularity pipeline inference with Bayesian optimization for deep learning models on HMPSoC;Information Sciences;2024-09

2. Physical Reservoir Computing Using van der Waals Ferroelectrics for Acoustic Keyword Spotting;ACS Nano;2024-08-14

3. Resource-Aware Saliency-Guided Differentiable Pruning for Deep Neural Networks;Proceedings of the Great Lakes Symposium on VLSI 2024;2024-06-12

4. AIfES: A Next-Generation Edge AI Framework;IEEE Transactions on Pattern Analysis and Machine Intelligence;2024-06

5. Model Compression in Practice: Lessons Learned from Practitioners Creating On-device Machine Learning Experiences;Proceedings of the CHI Conference on Human Factors in Computing Systems;2024-05-11