Reduced Memory Viterbi Decoding for Hardware-accelerated Speech Recognition

Published:2022-05-28 Issue:3 Volume:21 Page:1-18
ISSN:1539-9087
Container-title:ACM Transactions on Embedded Computing Systems
language:en
Short-container-title:ACM Trans. Embed. Comput. Syst.

Author:

Raj Pani Prithvi¹, Reddy Pakala Akhil², Chandrachoodan Nitin³

Affiliation:

1. Indian Institute of Technology Madras, Madras, India

2. Indian Institute of Technology Madras, Terraces on Brompton, Houston, TX, India

3. Indian Institute of Technology Madras, IIT Madras, India

Abstract

Large Vocabulary Continuous Speech Recognition systems require Viterbi searching through a large state space to find the most probable sequence of phonemes that led to a given sound sample. This requires storing and updating a large Active State List (ASL) in the on-chip memory (OCM) at regular intervals (called frames), which poses a major performance bottleneck for speech decoding. Most works store the ASL in OCM using hash tables and rely on beam-width pruning to restrict its size. Achieving decent accuracy and performance this way requires a large OCM and incurs numerous acoustic probability computations and DRAM accesses. We propose to use a binary search tree for ASL storage and a max heap data structure to track the worst cost state and efficiently replace it when a better state is found. With this approach, the ASL size can be reduced from over 32K to 512 with minimal impact on recognition accuracy for a 7,000-word vocabulary model. This, combined with a caching technique for acoustic scores, reduced the DRAM data accessed by 31\( \times \) and the acoustic probability computations by 26\( \times \). The approach has also been implemented in hardware on a Xilinx Zynq FPGA at 200 MHz using the Vivado SDS compiler. We study the tradeoffs among the amount of OCM used, word error rate, and decoding speed to show the effectiveness of the approach. The resulting implementation is capable of running faster than real time with 91% fewer block RAMs.
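The bounded active-list idea in the abstract can be illustrated with a small software sketch. It is not the authors' hardware design: the class name `BoundedActiveList` and the method `offer` are hypothetical, a plain Python dict stands in for the binary search tree used for lookup, and the max heap is emulated with `heapq` on negated costs, with stale entries discarded lazily. Lower cost is assumed to mean a more probable state.

```python
import heapq


class BoundedActiveList:
    """Keep at most `capacity` lowest-cost states (hypothetical sketch)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.cost = {}   # state id -> best cost so far (stand-in for the BST)
        self.heap = []   # (-cost, state id); may contain stale entries

    def _worst(self):
        # Pop stale heap entries until the top matches the stored cost.
        while self.heap:
            neg_cost, state = self.heap[0]
            if self.cost.get(state) == -neg_cost:
                return state, -neg_cost
            heapq.heappop(self.heap)
        return None

    def offer(self, state, cost):
        """Insert or improve a state; return True if the list changed."""
        old = self.cost.get(state)
        if old is not None:
            if cost >= old:
                return False          # existing path is at least as good
            self.cost[state] = cost   # better path to a known state
            heapq.heappush(self.heap, (-cost, state))
            return True
        if len(self.cost) >= self.capacity:
            worst_state, worst_cost = self._worst()
            if cost >= worst_cost:
                return False          # not better than the worst kept state
            heapq.heappop(self.heap)  # evict the worst-cost state
            del self.cost[worst_state]
        self.cost[state] = cost
        heapq.heappush(self.heap, (-cost, state))
        return True
```

With a capacity of 3, offering a fourth state that beats the current worst cost evicts that worst state, while a state that is no better than the worst is rejected without changing the list. This is what lets the list stay at a fixed size instead of growing with the beam.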

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3510028

Cited by 1 article.

1. Locate: Low-Power Viterbi Decoder Exploration using Approximate Adders;Proceedings of the Great Lakes Symposium on VLSI 2023;2023-06-05