Authors:
Kokhazadeh Milad, Keramidas Georgios, Kelefouras Vasilios, Stamoulis Iakovos
Abstract
Deep Neural Networks (DNNs) have made significant advances in various fields, including speech recognition and image processing. Typically, modern DNNs are both compute and memory intensive, so their deployment on low-end devices is a challenging task. A well-known technique to address this problem is Low-Rank Factorization (LRF), in which a weight tensor is approximated by one or more lower-rank tensors, reducing both the memory size and the number of executed tensor operations. However, applying LRF is a multi-parametric optimization process involving a huge design space, where different design points represent different solutions trading off the number of FLOPs, the memory size, and the prediction accuracy of the DNN model. As a result, extracting an efficient solution is a complex and time-consuming process. In this work, a new methodology is presented that formulates the LRF problem as a (FLOPs vs. memory vs. prediction accuracy) Design Space Exploration (DSE) problem. The DSE space is then drastically pruned by removing inefficient solutions. Our experimental results show that the design space can be efficiently pruned, allowing the extraction of a limited set of solutions with improved accuracy, memory footprint, and FLOPs compared to the original (non-factorized) model. Our methodology has been developed as a stand-alone, parameterized module integrated into the T3F library of TensorFlow 2.X.
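To make the trade-off described above concrete, the sketch below factorizes a dense layer's weight matrix via truncated SVD, the simplest matrix-level form of LRF. This is an illustrative assumption, not the paper's T3F tensor-train implementation, and the rank r stands in for one hypothetical point in the FLOPs-vs-memory-vs-accuracy design space.

```python
# Minimal sketch: matrix-level LRF via truncated SVD (illustration only;
# the paper's module uses the T3F tensor-train library, not this code).
import numpy as np

m, n, r = 512, 1024, 64           # layer dimensions; r is an assumed target rank
W = np.random.randn(m, n)         # stand-in for a trained weight matrix

# Approximate W (m x n) by two lower-rank factors A (m x r) and B (r x n).
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * s[:r]
B = Vt[:r, :]

# One design point: memory and FLOPs of the factorized layer vs. the original.
params_orig, params_lrf = m * n, r * (m + n)
flops_orig, flops_lrf = 2 * m * n, 2 * r * (m + n)   # per input vector
rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"params: {params_orig} -> {params_lrf}, "
      f"FLOPs: {flops_orig} -> {flops_lrf}, rel. error: {rel_err:.3f}")
```

Sweeping r over its admissible range enumerates the design points for this single layer; the approximation error serves as a proxy for the accuracy loss that the DSE must trade off against the memory and FLOPs savings.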
Funders
H2020 Affordable5G EU Project
Aristotle University of Thessaloniki
Publisher
Springer Science and Business Media LLC