End-to-End Mandarin Speech Recognition Combining CNN and BLSTM

Published: 2019-05-07 Issue: 5 Volume: 11 Page: 644
ISSN: 2073-8994
Container-title: Symmetry
Language: en
Short-container-title: Symmetry

Author:

Wang Dong, Wang Xiaodong, Lv Shaohe

Abstract

Conventional Automatic Speech Recognition (ASR) systems often contain many modules and draw on several kinds of expertise, which makes them hard to build and train. Recent research shows that end-to-end ASR systems can significantly simplify the speech recognition pipeline and achieve performance competitive with conventional systems. However, most end-to-end ASR systems are neither reproducible nor comparable, because they use specific language models and in-house training databases that are not freely available. This is especially common for Mandarin speech recognition. In this paper, we propose a CNN+BLSTM+CTC end-to-end Mandarin ASR system. It uses a Convolutional Neural Network (CNN) to learn local speech features, a Bidirectional Long Short-Term Memory (BLSTM) network to learn past and future contextual information, and Connectionist Temporal Classification (CTC) for decoding. Our model is trained entirely on AISHELL-1, by far the largest open-source Mandarin speech corpus, using neither in-house databases nor external language models. Experiments show that our CNN+BLSTM+CTC model achieves a WER of 19.2%, outperforming the best existing work. Because all the data corpora we used are freely available, our model is reproducible and comparable, providing a new baseline for further Mandarin ASR research.
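The abstract names CTC for decoding and reports an edit-distance-based error rate. The following is a minimal Python sketch of those two ideas only, not the authors' implementation: it assumes greedy (best-path) CTC decoding, a blank label at index 0, and a generic token-level error rate. The function names, the blank index, and the toy frame scores are all hypothetical; this record does not say how the paper decoded or tokenized its output.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank label (hypothetical choice)


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    decoded: List[int] = []
    previous = blank
    for scores in frame_scores:
        # Pick the highest-scoring label for this frame.
        label = max(range(len(scores)), key=scores.__getitem__)
        # Emit a label only when it is not blank and differs from the
        # previous frame's label (repeated frames collapse to one token).
        if label != blank and label != previous:
            decoded.append(label)
        previous = label
    return decoded


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two token sequences."""
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        row = [i]
        for j, h in enumerate(hyp, start=1):
            row.append(min(prev_row[j] + 1,               # deletion
                           row[j - 1] + 1,                # insertion
                           prev_row[j - 1] + (r != h)))   # substitution
        prev_row = row
    return prev_row[-1]


def error_rate(ref: Sequence, hyp: Sequence) -> float:
    """Edit distance normalised by reference length (WER-style metric)."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Toy per-frame scores over 3 labels (0 = blank); values are made up.
frames = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.1, 0.7],
    [0.8, 0.1, 0.1],
]
hypothesis = ctc_greedy_decode(frames)
```

In this toy input the blank frame between the two runs of label 1 is what lets the decoder emit the same label twice. Whether the error rate is computed over words or characters depends on how the reference text is tokenized, which is left to the caller here.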

Funder

Science and Technology on Parallel and Distributed Processing Laboratory Fund

Publisher

MDPI AG

Subject

Physics and Astronomy (miscellaneous), General Mathematics, Chemistry (miscellaneous), Computer Science (miscellaneous)

Link

https://www.mdpi.com/2073-8994/11/5/644/pdf

References: 29 articles.

1. DeepSpeech: Scaling up end-to-end speech recognition;Hannun;arXiv,2014

Cited by 32 articles.

1. Customized deep learning based Turkish automatic speech recognition system supported by language model;PeerJ Computer Science;2024-04-03

2. Multi-view Stacked CNN-BiLSTM (MvS CNN-BiLSTM) for urban PM2.5 concentration prediction of India’s polluted cities;Journal of Cleaner Production;2024-03

3. Tigrinya End-to-End Speech Recognition: A Hybrid Connectionist Temporal Classification-Attention Approach;Communications in Computer and Information Science;2024

4. Research on Dialect Protection: Interaction Design of Chinese Dialects Based on BLSTM-CRF and FBM Theories;IEEE Access;2024

5. Efficient Spoken Language Recognition via Multilabel Classification;INTERSPEECH 2023;2023-08-20