DaCapo: An On-Device Learning Scheme for Memory-Constrained Embedded Systems

Published:2023-09-09 Issue:5s Volume:22 Page:1-23
ISSN:1539-9087
Container-title:ACM Transactions on Embedded Computing Systems
language:en
Short-container-title:ACM Trans. Embed. Comput. Syst.

Author:

Khan Osama¹, Park Gwanjong¹, Seo Euiseong¹

Affiliation:

1. Sungkyunkwan University, Republic of Korea

Abstract

The use of deep neural network (DNN) applications in microcontroller unit (MCU) embedded systems is becoming popular. However, the DNN models in such systems frequently suffer from accuracy loss due to the dataset shift problem. On-device learning resolves this problem by updating the model parameters on-site with real-world data, thus localizing the model to its surroundings. However, the backpropagation step during on-device learning requires the output of every layer computed during the forward pass to be stored in memory. This is usually infeasible in MCU devices, as they are equipped with only a few KBs of SRAM. Given their energy limitations and timeliness requirements, using flash memory to store the output of every layer is not practical either. Although a few approaches have been proposed to enable on-device learning under stringent memory conditions, they require modification of the target models or the use of non-conventional gradient computation strategies. This paper proposes DaCapo, a backpropagation scheme that enables on-device learning in memory-constrained embedded systems. DaCapo stores only the outputs of certain layers, known as checkpoints, in SRAM and discards the others. The discarded outputs are recomputed during backpropagation from the nearest preceding checkpoint. To minimize the number of recomputations, DaCapo optimally plans which checkpoints to keep in SRAM at each phase of the backpropagation, and thus replaces the checkpoints stored in memory as the backpropagation progresses. We implemented the proposed scheme on an STM32F429ZI board and evaluated it with five representative DNN models. Our evaluation showed that DaCapo improved backpropagation time by up to 22% and reduced energy consumption by up to 28% in comparison to AIfES, a machine learning platform optimized for MCU devices. In addition, our proposed approach enabled the training of MobileNet, which the MCU device had previously been unable to train.
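
The checkpoint-and-recompute idea described in the abstract can be illustrated with a minimal sketch. This is not DaCapo's algorithm: it places checkpoints at a fixed stride rather than planning them optimally, it never replaces checkpoints during the backward pass, and it recomputes activations layer by layer in the simplest possible way. The scalar `tanh` layers and all function names (`forward_layer`, `backward_layer`, `full_backprop`, `checkpointed_backprop`) are assumptions chosen only for illustration.

```python
import math

def forward_layer(w, x):
    # Toy scalar "layer": y = tanh(w * x). Hypothetical, for illustration only.
    return math.tanh(w * x)

def backward_layer(w, x, grad_out):
    # Returns (gradient w.r.t. w, gradient w.r.t. the layer input x).
    y = math.tanh(w * x)
    d = (1.0 - y * y) * grad_out
    return d * x, d * w

def full_backprop(weights, x0):
    # Baseline: keep the input of every layer (n + 1 stored values).
    acts = [x0]
    for w in weights:
        acts.append(forward_layer(w, acts[-1]))
    grad = 1.0  # gradient of the final output w.r.t. itself
    grads = [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        grads[i], grad = backward_layer(weights[i], acts[i], grad)
    return grads

def checkpointed_backprop(weights, x0, stride):
    # Forward pass: keep only every `stride`-th activation (uniform
    # checkpoints, an assumption; DaCapo plans checkpoints optimally).
    n = len(weights)
    checkpoints = {0: x0}
    x = x0
    for i, w in enumerate(weights):
        x = forward_layer(w, x)
        if (i + 1) % stride == 0 and (i + 1) < n:
            checkpoints[i + 1] = x

    # Backward pass: rebuild each discarded layer input from the
    # nearest preceding checkpoint before using it.
    grad = 1.0
    grads = [0.0] * n
    recomputed = 0
    for i in reversed(range(n)):
        start = (i // stride) * stride
        x = checkpoints[start]
        for j in range(start, i):
            x = forward_layer(weights[j], x)
            recomputed += 1
        grads[i], grad = backward_layer(weights[i], x, grad)
    return grads, len(checkpoints), recomputed

weights = [0.5, -1.2, 0.8, 1.1, -0.7, 0.9]
reference = full_backprop(weights, 0.3)
grads, stored, recomputed = checkpointed_backprop(weights, 0.3, stride=3)
```

With six layers and a stride of three, the sketch keeps two activations instead of seven and pays for it with six extra forward-layer evaluations, while producing the same gradients as the baseline. Reducing that recomputation cost under a fixed SRAM budget is the planning problem the abstract attributes to DaCapo.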

Funder

Institute of Information and Communications Technology Planning and Evaluation

Development of Core Technology for Autonomous Energy-driven Computing System SW in Power-Instable Environment

National Research Foundation of Korea

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3609121

References: 40 articles.

1. 2022. AIfES: Artificial Intelligence for Embedded Systems. https://github.com/Fraunhofer-IMS/AIfES_for_Arduino

2. 2022. FOMO: Faster Objects More Objects. https://docs.edgeimpulse.com/docs/edge-impulse-studio/learning-blocks/object-detection/fomo-object-detection-for-constrained-devices

3. MLPerf tiny benchmark;Banbury Colby;Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks,2021

4. Micronets: Neural network architectures for deploying tinyml applications on commodity microcontrollers;Banbury Colby;Proceedings of Machine Learning and Systems,2021

5. Stable Electromyographic Sequence Prediction During Movement Transitions using Temporal Convolutional Networks

Cited by 2 articles.

1. DACAPO: Accelerating Continuous Learning in Autonomous Systems for Video Analytics;2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA);2024-06-29

2. On-device Online Learning and Semantic Management of TinyML Systems;ACM Transactions on Embedded Computing Systems;2024-06-10