Improving Amharic Speech Recognition System Using Connectionist Temporal Classification with Attention Model and Phoneme-Based Byte-Pair-Encodings

Published:2021-02-03 Issue:2 Volume:12 Page:62
ISSN:2078-2489
Container-title:Information
language:en
Short-container-title:Information

Author:

Emiru Eshete Derb, Xiong Shengwu, Li Yaxing, Fesseha Awet, Diallo Moussa

Abstract

Out-of-vocabulary (OOV) words are the most challenging problem in automatic speech recognition (ASR), especially for morphologically rich languages. Most end-to-end speech recognition systems operate at the word or character level of a language. Amharic is a poorly resourced but morphologically rich language. This paper proposes a hybrid connectionist temporal classification (CTC) with attention end-to-end architecture and a syllabification algorithm for an Amharic automatic speech recognition (AASR) system using its phoneme-based subword units. This algorithm helps to insert the epenthetic vowel እ[ɨ], which is not included in our Grapheme-to-Phoneme (G2P) conversion algorithm developed using consonant–vowel (CV) representations of Amharic graphemes. The proposed end-to-end model was trained on various Amharic subword units, namely characters, phonemes, character-based subwords, and phoneme-based subwords generated by the byte-pair-encoding (BPE) segmentation algorithm. Experimental results showed that context-dependent phoneme-based subwords tend to result in more accurate speech recognition systems than the character-based, phoneme-based, and character-based subword counterparts. Further improvement was also obtained for the proposed phoneme-based subwords with the syllabification algorithm and the SpecAugment data augmentation technique. The word error rate (WER) reduction was 18.38% compared to character-based acoustic modeling with the word-based recurrent neural network language modeling (RNNLM) baseline. These phoneme-based subword models are also useful for improving machine and speech translation tasks.
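
As an illustration of the byte-pair-encoding (BPE) segmentation named in the abstract, the following is a minimal sketch of learning merges over phoneme sequences rather than characters. It is not the authors' implementation: the function names (`learn_bpe`, `merge_pair`, `segment`) and the toy corpus are assumptions made for this example, and the phoneme strings are placeholders, not output of the paper's G2P algorithm.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one joined symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merges from words given as phoneme sequences."""
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair; ties are broken by the pair itself for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(phonemes, merges):
    """Apply learned merges, in order, to a new phoneme sequence."""
    symbols = tuple(phonemes)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical toy corpus: each word is a sequence of placeholder phoneme symbols.
corpus = [("b", "e", "t")] * 3 + [("b", "e", "t", "u")] * 2 + [("s", "e", "t")]
merges = learn_bpe(corpus, num_merges=2)
print(merges)
print(segment(["b", "e", "t", "u"], merges))
```

In this sketch, a word unseen in training is still segmented into known subword units, which is the property that makes subword modeling attractive for out-of-vocabulary words.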

Publisher

MDPI AG

Subject

Information Systems

Link

https://www.mdpi.com/2078-2489/12/2/62/pdf

Reference 59 articles.

1. Uncertainty weighting and propagation in DNN–HMM-based speech recognition

2. Improving Hybrid CTC/Attention Architecture with Time-Restricted Self-Attention CTC for End-to-End Speech Recognition

Cited by 11 articles.

1. Speaker-based language identification for Ethio-Semitic languages using CRNN and hybrid features;Network: Computation in Neural Systems;2024-06-04

2. Tigrinya End-to-End Speech Recognition: A Hybrid Connectionist Temporal Classification-Attention Approach;Communications in Computer and Information Science;2024

3. Virtual Speech System Based on Sensing Technology and Teaching Management in Universities;Applied Mathematics and Nonlinear Sciences;2023-12-13

4. Multimodal Learning Analytics: An Overview of the Data Collection Methodology;2023 IEEE 18th International Conference on Computer Science and Information Technologies (CSIT);2023-10-19

5. Adapting Off-the-Shelf Speech Recognition Systems for Novel Words;Information;2023-03-13