Authors:
Pinto Dennis, Arnau José-María, Riera Marc, Cruz Josep-Llorenç, González Antonio
Abstract
With mobile and embedded devices becoming more integrated in our daily lives, the focus is increasingly shifting toward human-friendly interfaces, making automatic speech recognition (ASR) a central player as the ideal means of interaction with machines. ASR is essential for many cognitive computing applications, such as speech-based assistants, dictation systems and real-time language translation. Consequently, interest in speech technology has grown in recent years, with more systems being proposed and higher accuracy levels being achieved, in some cases even surpassing human accuracy. However, highly accurate ASR systems are computationally expensive, requiring on the order of billions of arithmetic operations to decode each second of audio, which conflicts with a growing interest in deploying ASR on edge devices. On these devices, efficient hardware acceleration is key to achieving acceptable performance. In this paper, we propose a technique to improve the energy efficiency and performance of ASR systems, focusing on low-power hardware for edge devices. We focus on optimizing the DNN-based acoustic model evaluation, which we have observed to be the main bottleneck in popular ASR systems, by leveraging run-time information from the beam search. By doing so, we reduce the energy and execution time of the acoustic model evaluation by 25.6% and 25.9%, respectively, with negligible accuracy loss.
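The core idea of the abstract, computing only the part of the acoustic model's output that the beam search actually consults, can be illustrated with a minimal sketch. This is not the paper's implementation: the toy output layer, the senone indexing, and the `pruned_scores` helper are all assumptions made for illustration; the real system operates on a full ASR pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy DNN acoustic model: we only model the final output layer,
# which maps hidden activations to per-senone scores.
N_HIDDEN, N_SENONES = 64, 512
W_out = rng.standard_normal((N_HIDDEN, N_SENONES)) * 0.1
b_out = rng.standard_normal(N_SENONES) * 0.1


def full_scores(h):
    """Baseline: score every senone for this frame."""
    return h @ W_out + b_out


def pruned_scores(h, active_senones):
    """Sketch of the optimization: the beam search only needs the scores
    of senones referenced by currently active hypotheses, so compute
    just those output columns and skip the rest."""
    cols = sorted(active_senones)
    return dict(zip(cols, h @ W_out[:, cols] + b_out[cols]))


h = rng.standard_normal(N_HIDDEN)   # hidden activations for one audio frame
active = {3, 17, 100}               # senones requested by the beam (assumed)

partial = pruned_scores(h, active)
reference = full_scores(h)
# The pruned evaluation agrees with the full one on the active senones,
# while touching only 3 of the 512 output columns.
assert all(np.isclose(partial[s], reference[s]) for s in active)
```

In this sketch the savings come from shrinking the output-layer matrix multiply from 512 columns to 3; the paper's reported 25.6% energy and 25.9% execution-time reductions apply to the full acoustic-model evaluation on their hardware, not to this toy example.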
Funder
CoCoUnit ERC Advanced Grant of the EU’s Horizon 2020
Spanish MICINN Ministry
Spanish State Research Agency
Catalan Agency for University and Research
ICREA Academia
Universitat Politècnica de Catalunya
Publisher
Springer Science and Business Media LLC
References (81 articles)
1. Alharbi S, Alrazgan M, Alrashed A et al (2021) Automatic speech recognition: systematic literature review. IEEE Access 9:131858–131876
2. Amazon (2014) Alexa. https://en.wikipedia.org/wiki/Amazon_Alexa [Online; accessed 22-Mar-2024]
3. Amodei D, Ananthanarayanan S, Anubhai R et al (2016) Deep Speech 2: end-to-end speech recognition in English and Mandarin. In: International Conference on Machine Learning, pp 173–182
4. Apple (2011) Siri. https://en.wikipedia.org/wiki/Siri [Online; accessed 22-Mar-2024]
5. Baevski A, Zhou Y, Mohamed A et al (2020) wav2vec 2.0: a framework for self-supervised learning of speech representations. Adv Neural Inf Process Syst 33:12449–12460