
Sequence-To-Sequence Neural Networks Inference on Embedded Processors Using Dynamic Beam Search

Published: 2020-02-15
Volume: 9
Issue: 2
Page: 337
ISSN: 2079-9292
Journal: Electronics
Language: en

Author:

Jahier Pagliari Daniele, Daghero Francesco, Poncino Massimo

Abstract

Sequence-to-sequence deep neural networks have become the state of the art for a variety of machine learning applications, ranging from neural machine translation (NMT) to speech recognition. Many mobile and Internet of Things (IoT) applications would benefit from the ability to perform sequence-to-sequence inference directly on embedded devices, thereby reducing the amount of raw data transmitted to the cloud and improving response latency, energy consumption and security. However, due to the high computational complexity of these models, specific optimization techniques are needed to achieve acceptable performance and energy consumption on single-core embedded processors. In this paper, we present a new optimization technique called dynamic beam search, in which the inference complexity is tuned at runtime to the difficulty of the processed input sequence. Results based on measurements on a real embedded device, and on three state-of-the-art deep learning models, show that our method can reduce inference time and energy by up to 25% without loss of accuracy.
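The abstract does not spell out how the beam width is chosen, so the following is only a minimal sketch of the general idea of adapting beam search effort to input difficulty. Everything in it is an assumption made for illustration: the `step_fn` interface, the `toy_model` stand-in for a decoder, the `confidence` threshold, and the rule "use a narrow beam when every live hypothesis is confident about its next token". It should not be read as the policy proposed in the paper.

```python
import math

EOS = "</s>"  # hypothetical end-of-sequence token


def dynamic_beam_search(step_fn, max_width=4, min_width=1,
                        confidence=0.9, max_len=20):
    """Beam search whose width is re-chosen at every decoding step.

    step_fn(prefix) must return a non-empty dict {token: probability}.
    Returns (best_tokens, best_log_prob, widths_used_per_step).
    """
    beams = [((), 0.0)]  # (tokens, cumulative log-probability), best first
    widths = []
    for _ in range(max_len):
        # Stop once every kept hypothesis has emitted EOS.
        if all(tokens and tokens[-1] == EOS for tokens, _ in beams):
            break
        candidates, top_probs = [], []
        for tokens, score in beams:
            if tokens and tokens[-1] == EOS:
                candidates.append((tokens, score))  # carry finished ones over
                continue
            probs = step_fn(tokens)
            top_probs.append(max(probs.values()))
            for tok, p in probs.items():
                if p > 0.0:
                    candidates.append((tokens + (tok,), score + math.log(p)))
        # Assumed difficulty heuristic: the step is "easy" when every live
        # hypothesis assigns at least `confidence` to its top next token.
        width = min_width if min(top_probs) >= confidence else max_width
        widths.append(width)
        # Keep the `width` best candidates; fewer kept hypotheses means
        # fewer step_fn calls (decoder evaluations) at the next step.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:width]
    return beams[0][0], beams[0][1], widths


def toy_model(prefix):
    """Hypothetical decoder: ambiguous first step, confident afterwards."""
    if len(prefix) == 0:
        return {"a": 0.5, "b": 0.5}
    if len(prefix) == 1:
        return {"c": 0.95, "d": 0.05}
    return {EOS: 1.0}


tokens, log_prob, widths = dynamic_beam_search(toy_model)
print(tokens, round(log_prob, 4), widths)
```

In this toy run the search keeps a wide beam only for the ambiguous first step and narrows to a single hypothesis afterwards, which is where the savings in decoder evaluations would come from. A fixed-width search would keep expanding all `max_width` hypotheses at every step.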

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/9/2/337/pdf

References: 57 articles.

1. Deep Learning;LeCun,2006

2. Attention Is All You Need;Vaswani;arXiv,2017

3. Efficient Processing of Deep Neural Networks: A Tutorial and Survey

4. A 0.3–2.6 TOPS/W precision-scalable processor for real-time large-scale ConvNets

5. Runtime configurable deep neural networks for energy-accuracy trade-off

Cited by 5 articles.

1. Human Activity Recognition on Microcontrollers with Quantized and Adaptive Deep Neural Networks;ACM Transactions on Embedded Computing Systems;2022-07-31

2. Low-Overhead Early-Stopping Policies for Efficient Random Forests Inference on Microcontrollers;VLSI-SoC: Technology Advancement on SoC Design;2022

3. Event-Tree Based Sequence Mining Using LSTM Deep-Learning Model;Complexity;2021-08-14

4. Energy-efficient deep learning inference on edge devices;Advances in Computers;2021

5. CRIME: Input-Dependent Collaborative Inference for Recurrent Neural Networks;IEEE Transactions on Computers;2020