LAS-Transformer: An Enhanced Transformer Based on the Local Attention Mechanism for Speech Recognition

Published:2022-05-13 Issue:5 Volume:13 Page:250
ISSN:2078-2489
Container-title:Information
language:en
Short-container-title:Information

Author:

Fu Pengbin, Liu Daxing, Yang Huirong

Abstract

Recently, Transformer-based models have shown promising results in automatic speech recognition (ASR), outperforming models based on recurrent neural networks (RNNs) and convolutional neural networks (CNNs). However, directly applying a Transformer to the ASR task does not exploit the correlation among speech frames effectively, leaving the model trapped in a sub-optimal solution. To address this, we propose a local attention Transformer model for speech recognition that exploits the high correlation among speech frames. First, we use relative positional embedding, rather than absolute positional embedding, to improve the generalization of the Transformer to speech sequences of different lengths. Second, we add local attention based on parametric positional relations to the self-attention module, explicitly incorporating prior knowledge into that module to make the training process insensitive to hyperparameters and thus improve performance. Experiments on the LibriSpeech dataset show that the proposed approach achieves a word error rate of 2.3/5.5% with language model fusion and without any external data, reducing the word error rate by 17.8/9.8% compared to the baseline. The results are also close to, or better than, those of other state-of-the-art end-to-end models.
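
The idea of biasing self-attention toward nearby frames can be illustrated with a minimal NumPy sketch. This is not the formulation used in the paper, which the abstract does not spell out. It assumes a single attention head and a Gaussian penalty on the relative offset between frames, and the function name `local_self_attention` and the width parameter `sigma` are hypothetical. Because the penalty depends only on the offset i - j, the same function applies to sequences of any length.

```python
import numpy as np

def local_self_attention(x, w_q, w_k, w_v, sigma=2.0):
    """Single-head self-attention with a Gaussian locality bias (illustrative only)."""
    t = x.shape[0]                 # number of frames
    d = w_q.shape[1]               # projection width
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = (q @ k.T) / np.sqrt(d)                 # (t, t) content-based scores
    pos = np.arange(t)
    dist = pos[:, None] - pos[None, :]              # relative offsets i - j
    scores = scores - dist**2 / (2.0 * sigma**2)    # assumed prior: penalize distant frames
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights
```

Here `sigma` is a fixed constant that sets how quickly attention decays with distance. In a trainable variant it would presumably be a learned parameter, which is one way to read "parametric positional relations", though that reading is an assumption.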

Publisher

MDPI AG

Subject

Information Systems

Link

https://www.mdpi.com/2078-2489/13/5/250/pdf


Cited by 2 articles.

1. Arabic Mispronunciation Recognition System Using LSTM Network;Information;2023-07-16

2. Embedded System Vehicle Based on Multi-Sensor Fusion;IEEE Access;2023