Abstract
Automatic speech recognition (ASR) for children is a rapidly evolving field: children are increasingly accustomed to interacting with virtual assistants and smart speakers, such as Amazon Echo and Cortana, which have advanced human–computer interaction in recent generations. However, non-native children exhibit a diverse range of reading errors during second language (L2) acquisition, such as lexical disfluency, hesitations, intra-word switching, and word repetitions. These errors are not yet addressed, so ASR systems struggle to recognize non-native children’s speech. The main objective of this study is to develop a non-native children’s speech recognition system built on feature-space discriminative models, such as feature-space maximum mutual information (fMMI) and boosted feature-space maximum mutual information (fbMMI). Combining these models with speed perturbation-based data augmentation of the original children’s speech corpora yields effective performance. The corpus covers different speaking styles of children, including read speech and spontaneous speech, in order to investigate the impact of non-native children’s L2 speaking proficiency on speech recognition systems. The experiments show that feature-space MMI models with steadily increasing speed perturbation factors outperform traditional ASR baseline models.
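To make the speed perturbation step concrete, the following is a minimal sketch of how perturbed copies of a waveform might be produced. It is not the authors' pipeline: the abstract does not state which factors or tools were used, so the factors 0.9, 1.0, and 1.1, the function names `speed_perturb` and `augment`, and the use of linear interpolation are all assumptions made for illustration. Speech toolkits typically rely on a band-limited resampler instead.

```python
import numpy as np


def speed_perturb(signal: np.ndarray, factor: float) -> np.ndarray:
    """Return a copy of a mono waveform played `factor` times faster.

    Resampling changes duration and pitch together. Linear interpolation
    is an illustrative simplification (assumption), not a production resampler.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(round(len(signal) / factor))
    # Position in the original signal that each output sample reads from.
    src_positions = np.arange(n_out) * factor
    return np.interp(src_positions, np.arange(len(signal)), signal)


def augment(signal: np.ndarray, factors=(0.9, 1.0, 1.1)) -> dict:
    """Build one perturbed copy per factor (the factor values are assumed)."""
    return {f: speed_perturb(signal, f) for f in factors}


# Hypothetical input: one second of a 220 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 220.0 * t)

copies = augment(tone)
for factor, waveform in copies.items():
    print(factor, len(waveform))
```

A factor below 1.0 lengthens the utterance and a factor above 1.0 shortens it, so each original recording contributes several training examples that differ in tempo and pitch.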
Subject
General Physics and Astronomy