AI-Driven Performance Modeling for AI Inference Workloads

Published:2022-07-26 Issue:15 Volume:11 Page:2316
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics

Author:

Sponner Max, Waschneck Bernd, Kumar Akash

Abstract

Deep Learning (DL) is moving towards deploying workloads not only in cloud datacenters but also on local devices. Although these deployments are mostly limited to inference tasks, they significantly widen the range of possible target architectures. These new targets usually come with drastically reduced compute performance and memory sizes compared to traditionally used architectures, and because they often depend on batteries, they shift the key optimization focus to efficiency. Performance models could help developers quickly estimate the performance of a neural network during its design phase. However, these models are expensive to implement, as they require in-depth knowledge of the hardware architecture and the algorithms used. Although AI-based solutions exist, they either require large datasets that are difficult to collect on low-performance targets or are limited to a small number of target platforms and metrics. Our solution exploits the block-based structure of neural networks, as well as the high similarity of typically used layer configurations across neural networks, enabling the training of accurate models on significantly smaller datasets. In addition, our solution is not limited to a specific architecture or metric. We showcase the feasibility of the solution on a set of seven devices from four different hardware architectures, with up to three performance metrics per target, including power consumption and memory footprint. Our tests show that the solution achieved an error of less than 1 ms (2.6%) in latency, 0.12 J (4%) in energy consumption, and 11 MiB (1.5%) in memory allocation for whole-network inference prediction, while being up to five orders of magnitude faster than a benchmark.
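The general idea of predicting whole-network cost from per-layer predictions can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the model type or features, so the code below assumes one plain least-squares line per layer type, a single hypothetical feature `work` (for example, a normalized operation count), and made-up measurements.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # All samples share the same x: fall back to a constant model.
        return 0.0, mean_y
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    return a, mean_y - a * mean_x


def train_layer_models(samples):
    """Fit one model per layer type.

    samples: iterable of (layer_type, work, measured_ms) tuples, where
    `work` is a hypothetical scalar feature describing the layer config.
    """
    grouped = defaultdict(lambda: ([], []))
    for layer_type, work, measured_ms in samples:
        grouped[layer_type][0].append(work)
        grouped[layer_type][1].append(measured_ms)
    return {t: fit_line(xs, ys) for t, (xs, ys) in grouped.items()}


def predict_network(models, layers):
    """Estimate whole-network latency as the sum of per-layer predictions."""
    total = 0.0
    for layer_type, work in layers:
        a, b = models[layer_type]
        total += a * work + b
    return total


# Made-up per-layer measurements (illustrative only, not from the paper).
samples = [
    ("conv2d", 1.0, 2.5), ("conv2d", 2.0, 4.5), ("conv2d", 4.0, 8.5),
    ("dense", 1.0, 1.0), ("dense", 3.0, 2.0),
]
models = train_layer_models(samples)

# A hypothetical network described as a sequence of (layer_type, work).
network = [("conv2d", 3.0), ("dense", 2.0)]
estimate_ms = predict_network(models, network)
print(f"estimated latency: {estimate_ms:.2f} ms")
```

Because many networks reuse the same few layer types and configurations, a small set of per-layer measurements can cover a large share of the layers in unseen networks, which is the intuition behind training on smaller datasets. The same structure would apply to other metrics, such as energy or memory, by swapping the measured target.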

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering,Computer Networks and Communications,Hardware and Architecture,Signal Processing,Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/11/15/2316/pdf

Reference 55 articles.

1. A Technical Overview of Cortex-M55 and Ethos-U55: Arm’s Most Capable Processors for Endpoint AI

2. Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices

3. Integrating NVIDIA Deep Learning Accelerator (NVDLA) with RISC-V SoC on FireSim

4. CMSIS-NN: Efficient Neural Network Kernels for Arm Cortex-M CPUs;Lai;arXiv,2018

5. Pulp-nn: A computing library for quantized neural network inference at the edge on risc-v based parallel ultra low power clusters;Garofalo;Proceedings of the 2019 26th IEEE International Conference on Electronics, Circuits and Systems (ICECS),2019

Cited by 3 articles.

1. SLAPP: Subgraph-level attention-based performance prediction for deep learning models;Neural Networks;2024-02

2. Role of Artificial Intelligence and Internet of Things in Neurodegenerative Diseases;Studies in Computational Intelligence;2024

3. DIPPM: A Deep Learning Inference Performance Predictive Model Using Graph Neural Networks;Euro-Par 2023: Parallel Processing;2023