Exploiting beam search confidence for energy-efficient speech recognition

Author:

Pinto Dennis, Arnau José-María, Riera Marc, Cruz Josep-Llorenç, González Antonio

Abstract

With mobile and embedded devices getting more integrated in our daily lives, the focus is increasingly shifting toward human-friendly interfaces, making automatic speech recognition (ASR) a central player as the ideal means of interaction with machines. ASR is essential for many cognitive computing applications, such as speech-based assistants, dictation systems and real-time language translation. Consequently, interest in speech technology has grown in the last few years, with more systems being proposed and higher accuracy levels being achieved, even surpassing human accuracy. However, highly accurate ASR systems are computationally expensive, requiring on the order of billions of arithmetic operations to decode each second of audio, which conflicts with a growing interest in deploying ASR on edge devices. On these devices, efficient hardware acceleration is key for achieving acceptable performance. In this paper, we propose a technique to improve the energy efficiency and performance of ASR systems, focusing on low-power hardware for edge devices. We focus on optimizing the DNN-based acoustic model evaluation, as we have observed it to be the main bottleneck in popular ASR systems, by leveraging run-time information from the beam search. By doing so, we reduce energy and execution time of the acoustic model evaluation by 25.6% and 25.9%, respectively, with negligible accuracy loss.

Funder

CoCoUnit ERC Advanced Grant of the EU’s Horizon 2020

Spanish MICINN Ministry

Spanish State Research Agency

Catalan Agency for University and Research

ICREA Academia

Universitat Politècnica de Catalunya

Publisher

Springer Science and Business Media LLC
