DeeperThings: Fully Distributed CNN Inference on Resource-Constrained Edge Devices

Published: 2021-04-07
ISSN: 0885-7458
Container-title: International Journal of Parallel Programming
Language: en
Short-container-title: Int J Parallel Prog

Author:

Stahl Rafael, Hoffman Alexander, Mueller-Gritschneder Daniel, Gerstlauer Andreas, Schlichtmann Ulf

Abstract

Performing inference of Convolutional Neural Networks (CNNs) on Internet of Things (IoT) edge devices ensures both privacy of input data and possible run time reductions when compared to a cloud solution. As most edge devices are memory- and compute-constrained, they cannot store and execute complex CNNs. Partitioning and distributing layer information across multiple edge devices to reduce the amount of computation and data on each device presents a solution to this problem. In this article, we propose DeeperThings, an approach that supports a full distribution of CNN inference tasks by partitioning fully-connected as well as both feature- and weight-intensive convolutional layers. Additionally, we jointly optimize memory, computation and communication demands. This is achieved using techniques to combine both feature and weight partitioning with a communication-aware layer fusion method, enabling holistic optimization across layers. For a given number of edge devices, the schemes are applied jointly using Integer Linear Programming (ILP) formulations to minimize data exchanged between devices, to optimize run times and to find the entire model’s minimal memory footprint. Experimental results from a real-world hardware setup running four different CNN models confirm that the scheme is able to evenly balance the memory footprint between devices. For six devices on 100 Mbit/s connections the integration of layer fusion additionally leads to a reduction of communication demands by up to 28.8%. This results in run time speed-up of the inference task by up to 1.52x compared to layer partitioning without fusing.
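The idea of weight partitioning for a fully-connected layer can be illustrated with a minimal sketch. It is not the authors' implementation and does not include the ILP formulation or layer fusion; it assumes one simple scheme, splitting the weight matrix by output rows, so that each device stores only its slice and the partial outputs are concatenated. The function names (`split_rows`, `fc_forward`, `distributed_fc_forward`) and the toy data are hypothetical.

```python
def split_rows(weights, num_devices):
    """Split the output rows of a weight matrix into near-equal contiguous slices."""
    base, extra = divmod(len(weights), num_devices)
    slices, start = [], 0
    for device in range(num_devices):
        # The first `extra` devices take one additional row each.
        size = base + (1 if device < extra else 0)
        slices.append(weights[start:start + size])
        start += size
    return slices


def fc_forward(weights, x):
    """Plain fully-connected layer without bias: one dot product per output row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def distributed_fc_forward(weights, x, num_devices):
    """Simulate each device computing its slice, then concatenate the partial outputs."""
    partial_outputs = [fc_forward(part, x) for part in split_rows(weights, num_devices)]
    return [y for part in partial_outputs for y in part]


# Toy layer (assumed sizes): 10 output neurons, 4 inputs, integer weights.
weights = [[(r * 7 + c) % 5 - 2 for c in range(4)] for r in range(10)]
x = [1, -2, 3, 0]

reference = fc_forward(weights, x)
distributed = distributed_fc_forward(weights, x, 3)
per_device_rows = [len(part) for part in split_rows(weights, 3)]
```

In this sketch every device still needs the full input vector `x`, while its share of the weights shrinks roughly in proportion to the number of devices. Trading off such weight memory against the data that must be exchanged is the kind of balance the abstract says is optimized jointly.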

Funder

National Science Foundation

Bundesministerium für Bildung und Forschung

Technische Universität München

Publisher

Springer Science and Business Media LLC

Subject

Information Systems,Theoretical Computer Science,Software

Link

http://link.springer.com/content/pdf/10.1007/s10766-021-00712-3.pdf

References: 29 articles.

1. Alwani, M., Chen, H., Ferdman, M., Milder, P.: Fused-layer CNN accelerators. In: IEEE/ACM International Symposium on Microarchitecture (2016)

2. Arredondo-Velázquez, M., et al.: A streaming architecture for convolutional neural networks based on layer operations chaining. J. Real Time Image Process. (2020)

3. Ayinde, B.O., Inanc, T., Zurada, J.M.: Redundant feature pruning for accelerated inference in deep neural networks. Neural Netw. 118, 148–158 (2019)

4. Bhattacharya, S., Lane, N.D.: Sparsification and separation of deep learning layers for constrained resource inference on wearables. In: ACM Conference on Embedded Network Sensor Systems (2016)

5. Bisschop, J.: AIMMS optimization modeling. Lulu.com (2006)

Cited by 29 articles.

1. DIDS: A distributed inference framework with dynamic scheduling capability;Future Generation Computer Systems;2025-01

2. DIAPASON: Differentiable Allocation, Partitioning and Fusion of Neural Networks for Distributed Inference;2024 Design, Automation & Test in Europe Conference & Exhibition (DATE);2024-03-25

3. Quantized hashing: enabling resource-efficient deep learning models at the edge;International Journal of Information Technology;2024-03-16

4. Edge-assisted federated learning for anomaly detection in diverse IoT network;International Journal of Information Technology;2024-02-15

5. RobustDiCE: Robust and Distributed CNN Inference at the Edge;2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC);2024-01-22