Abstract
Mispronunciation detection and diagnosis (MDD) is a crucial component of computer-assisted pronunciation training (CAPT) systems. The provided transcriptions can act as a teacher when evaluating the pronunciation quality of a given utterance. Conventional approaches, such as forced alignment and extended recognition networks, make full use of this prior text for model development or for improving system performance. More recently, end-to-end (E2E) approaches have attempted to incorporate the prior text into model training, and preliminary results indicate that this is effective. Attention-based E2E models, however, have shown lower speech recognition performance, because their multi-pass left-to-right forward computation constrains their practical applicability in beam search. In addition, E2E neural approaches are typically data-hungry, and a lack of non-native training data frequently impairs their effectiveness in MDD. To address these problems, we propose a novel MDD technique that uses non-autoregressive (NAR) E2E neural models to greatly reduce inference time while maintaining accuracy comparable to that of conventional E2E neural models. Unlike models that compute from left to right, NAR models accept parallel inputs and generate token sequences in parallel. To further improve MDD performance, we design and build a pronunciation model on top of the NAR E2E models in our approach. We evaluate our method against several of the best-performing E2E models on the publicly available L2-ARCTIC and SpeechOcean English datasets, which are used for both training and testing; the proposed model outperforms the other existing models.
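To make the two ideas in the abstract concrete, the following toy sketch shows (1) a non-autoregressive decoding step, in which every frame is labelled independently of the others, and (2) a diagnosis step that compares the recognized phones with the canonical phones of the prompt. It is an illustration under stated assumptions, not the model described in the paper: the CTC-style blank symbol, the per-frame score dictionaries, the function names `nar_greedy_decode` and `diagnose`, and the use of `difflib` for alignment are all hypothetical choices made for this example.

```python
from difflib import SequenceMatcher

BLANK = "<b>"  # assumed CTC-style blank label; not taken from the paper


def nar_greedy_decode(frame_scores):
    """Label every frame independently, then collapse repeats and blanks.

    frame_scores is a list with one dict (label -> score) per frame.
    No frame's decision depends on another frame's output, so the
    per-frame argmax could be computed in parallel, unlike
    left-to-right autoregressive decoding.
    """
    best = [max(scores, key=scores.get) for scores in frame_scores]
    phones, prev = [], None
    for label in best:
        if label != prev and label != BLANK:
            phones.append(label)
        prev = label
    return phones


def diagnose(canonical, recognized):
    """Align canonical and recognized phones; return the non-matching spans."""
    matcher = SequenceMatcher(a=canonical, b=recognized, autojunk=False)
    return [
        (tag, canonical[i1:i2], recognized[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Toy input: the prompt is "think" (TH IH NG K), the learner says "sink".
frames = [
    {"S": 0.7, "TH": 0.2, BLANK: 0.1},
    {"S": 0.6, "TH": 0.3, BLANK: 0.1},
    {BLANK: 0.8, "IH": 0.2},
    {"IH": 0.9, BLANK: 0.1},
    {"IH": 0.8, BLANK: 0.2},
    {"NG": 0.7, BLANK: 0.3},
    {BLANK: 0.9, "K": 0.1},
    {"K": 0.8, BLANK: 0.2},
]
recognized = nar_greedy_decode(frames)
print(recognized)                                     # ['S', 'IH', 'NG', 'K']
print(diagnose(["TH", "IH", "NG", "K"], recognized))  # [('replace', ['TH'], ['S'])]
```

In this sketch the detection result is the list of non-matching spans, and the diagnosis is the recognized phone that replaced the canonical one (here, S in place of TH). A real system would obtain the frame scores from a trained acoustic model rather than from hand-written dictionaries.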
Funder
Deanship of Scientific Research at Prince Sattam Bin Abdulaziz University
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Cited by
8 articles.