Online Speech Recognition Using Multichannel Parallel Acoustic Score Computation and Deep Neural Network (DNN)-Based Voice-Activity Detector

Published: 2020-06-14 Issue: 12 Volume: 10 Page: 4091
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences

Author:

Oh Yoo Rhee, Park Kiyoung, Park Jeon Gyu

Abstract

This paper aims to design an online, low-latency, high-performance speech recognition system using a bidirectional long short-term memory (BLSTM) acoustic model. To achieve this, we adopt a server-client model and a context-sensitive-chunk-based approach. The speech recognition server manages a main thread and a decoder thread for each client, plus one worker thread. The main thread communicates with the connected client, extracts speech features, and buffers them. The decoder thread performs speech recognition, which comprises the proposed multichannel parallel acoustic score computation of a BLSTM acoustic model, the proposed deep neural network-based voice activity detector, and Viterbi decoding. The proposed acoustic score computation method uses the worker thread to estimate the acoustic scores of a context-sensitive-chunk BLSTM acoustic model for the batched speech features from concurrent clients. The proposed deep neural network-based voice activity detector detects short pauses within an utterance to reduce response latency when the user utters long sentences. In Korean speech recognition experiments, the proposed acoustic score computation increases the number of concurrent clients from 22 to 44. Combined with the frame skipping method, the number rises further to 59 clients with a small degradation in accuracy. Moreover, the proposed deep neural network-based voice activity detector reduces the average user-perceived latency from 11.71 s to 3.09–5.41 s.
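The batching idea in the abstract, where per-client decoder threads hand feature chunks to one worker thread that scores them together, might be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `score_batch` is a placeholder for the BLSTM forward pass, and the names `BatchScorer`, `run_clients`, `max_batch`, and `wait_s` are assumptions introduced here.

```python
import queue
import threading


def score_batch(batch):
    """Placeholder for the BLSTM forward pass (hypothetical).

    Returns one dummy score per frame: the sum of the frame's features.
    """
    return [[sum(frame) for frame in chunk] for chunk in batch]


class BatchScorer:
    """A single worker thread that scores chunks from many decoder threads."""

    def __init__(self, max_batch=4, wait_s=0.01):
        self.requests = queue.Queue()
        self.max_batch = max_batch  # assumed cap on chunks per batch
        self.wait_s = wait_s        # assumed wait for more chunks to arrive
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def score(self, chunk):
        """Called by a decoder thread; blocks until its scores are ready."""
        reply = queue.Queue(maxsize=1)
        self.requests.put((chunk, reply))
        return reply.get()

    def close(self):
        """Ask the worker to stop and wait for it to finish."""
        self.requests.put(None)
        self.worker.join()

    def _run(self):
        while True:
            item = self.requests.get()
            if item is None:
                return
            pending, stop = [item], False
            # Collect chunks from other clients until the batch is full
            # or no new chunk arrives within wait_s.
            while len(pending) < self.max_batch:
                try:
                    nxt = self.requests.get(timeout=self.wait_s)
                except queue.Empty:
                    break
                if nxt is None:
                    stop = True
                    break
                pending.append(nxt)
            scores = score_batch([chunk for chunk, _ in pending])
            # Route each client's scores back to its own reply queue.
            for (_, reply), chunk_scores in zip(pending, scores):
                reply.put(chunk_scores)
            if stop:
                return


def run_clients(n_clients=3):
    """Simulate n_clients decoder threads sharing one BatchScorer."""
    scorer = BatchScorer()
    results = {}

    def decoder(cid):
        chunk = [[float(cid), 1.0], [float(cid), 2.0]]  # two toy frames
        results[cid] = scorer.score(chunk)

    threads = [threading.Thread(target=decoder, args=(c,))
               for c in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    scorer.close()
    return results
```

Calling `run_clients(3)` returns a dictionary mapping each simulated client ID to the scores for its own chunk, regardless of how the worker happened to group the requests into batches. In a real system the placeholder would be replaced by a batched model forward pass, which is where the throughput gain from grouping concurrent clients would come from.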

Funder

Ministry of Science and ICT, South Korea

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/10/12/4091/pdf

References: 45 articles.

1. Industrial Technology Advances: Deep Learning—From Speech Recognition to Language and Multimodal Processing;Deng,2016

2. Deep Learning for Environmentally Robust Speech Recognition

3. A Survey of Deep Learning: Platforms, Applications and Emerging Research Trends

4. Speech Recognition Using Deep Neural Networks: A Systematic Review

5. Survey on Deep Neural Networks in Speech and Vision Systems;Alam;arXiv,2019

Cited by 6 articles.

1. Speech Recognition Utilizing Deep Learning: A Systematic Review of the Latest Developments;HUM-CENT COMPUT INFO;2024

2. Research on Signal Detection of OFDM Systems Based on the LSTM Network Optimized by the Improved Chameleon Swarm Algorithm;Mathematics;2023-04-23

3. An energy-efficient voice activity detector using reconfigurable Gaussian base normalization deep neural network;Multimedia Tools and Applications;2023-02-23

4. A Reconfigurable Gaussian Base Normalization Deep Neural Network Design for an Energy-Efficient Voice Activity Detector;2021 2nd International Conference on Communication, Computing and Industry 4.0 (C2I4);2021-12-16

5. Fast offline transformer‐based end‐to‐end automatic speech recognition for real‐world applications;ETRI Journal;2021-12-08