Speech GAU: A Single Head Attention for Mandarin Speech Recognition for Air Traffic Control

Author:

Zhang Shiyu, Kong Jianguo, Chen Chao, Li Yabin, Liang Haijun

Abstract

The rise of end-to-end (E2E) speech recognition technology in recent years has overturned the design pattern of cascading multiple subtasks in classical speech recognition and achieved direct mapping of speech input signals to text labels. In this study, a new E2E framework, ResNet–GAU–CTC, is proposed to implement Mandarin speech recognition for air traffic control (ATC). A deep residual network (ResNet) utilizes the translation invariance and local correlation of a convolutional neural network (CNN) to extract the time-frequency domain information of speech signals. A gated attention unit (GAU) utilizes a gated single-head attention mechanism to better capture the long-range dependencies of sequences, thus attaining a larger receptive field and contextual information, as well as a faster training convergence rate. The connectionist temporal classification (CTC) criterion eliminates the need for forced frame-level alignments. To address the problems of scarce data resources and unique pronunciation norms and contexts in the ATC field, transfer learning and data augmentation techniques were applied to enhance the robustness of the network and improve the generalization ability of the model. The character error rate (CER) of our model was 11.1% on the expanded Aishell corpus, and it decreased to 8.0% on the ATC corpus.
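Two ideas in the abstract lend themselves to a small illustration: the CTC collapse rule, which is what lets training proceed without frame-level alignments, and the character error rate used to report results. The following is a minimal sketch in plain Python, not the authors' implementation. The function names (`ctc_collapse`, `edit_distance`, `cer`), the choice of `0` as the blank index, and the example strings are assumptions made for illustration only.

```python
def ctc_collapse(frame_labels, blank=0):
    """Collapse a frame-level CTC path into an output label sequence.

    Consecutive repeated labels are merged first, then blanks are dropped.
    The blank index (0 here) is an assumption; it depends on the vocabulary.
    """
    output = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return output


def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (substitute/insert/delete)."""
    previous_row = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current_row = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_item == hyp_item else 1
            current_row.append(
                min(
                    previous_row[j] + 1,                      # deletion
                    current_row[j - 1] + 1,                   # insertion
                    previous_row[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous_row = current_row
    return previous_row[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    if len(reference) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For instance, the frame-level path `[0, 1, 1, 0, 1, 2, 2, 0]` collapses to `[1, 1, 2]`: the blank between the two runs of `1` is what keeps them as separate output characters. A four-character reference with one substituted character in the hypothesis gives a CER of 0.25.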

Publisher

MDPI AG

Subject

Aerospace Engineering
